
Occurrence, divergence and evolution of intrinsic terminators across Eubacteria


Genomics 94 (2009) 110–116

Contents lists available at ScienceDirect

Genomics

journal homepage: www.elsevier.com/locate/ygeno


Anirban Mitra a, Kandavelmani Angamuthu a, Hanasoge Vasudevamurthy Jayashree a, Valakunja Nagaraja a,b,⁎
a Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore 560012, India
b Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore 560064, India

⁎ Corresponding author. Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore 560012, India. Fax: +91 80 23602697.

E-mail address: [email protected] (V. Nagaraja).

0888-7543/$ – see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2009.04.004

Article info

Article history: Received 10 January 2009; Accepted 16 April 2009; Available online 23 April 2009

Keywords: Transcription; Intrinsic termination; GeSTer; Eubacteria; Palindrome; Stem–loop; Rho

Abstract

In Escherichia coli, the canonical intrinsic terminator of transcription includes a palindrome followed by a U-trail on the transcript. The apparent underrepresentation of such terminators in eubacterial genomes led us to develop a rapid and accurate algorithm, GeSTer, to predict putative intrinsic terminators. Now, we have analyzed 378 genome sequences with an improved version of GeSTer. Our results indicate that the canonical E. coli type terminators are not overwhelmingly abundant in eubacteria. The atypical structures, having stem-loop structures but lacking ‘U’ trail, occur downstream of genes in all the analyzed genomes but different phyla show conserved preference for different types of terminators. This propensity correlates with genomic GC content and presence of the factor, Rho. 60–70% of identified terminators in all the genomes show “optimized” stem-length and ΔG. These results provide evidence that eubacteria extensively rely on the mechanism of intrinsic termination, with a considerable divergence in their structure, positioning and prevalence. The software and detailed results for individual genomes are freely available on request.

© 2009 Elsevier Inc. All rights reserved.

Introduction

Transcription of a DNA template to an RNA transcript is functionally subdivided into initiation, elongation and termination [1]. In E. coli and a few other bacteria studied, termination is achieved by two mechanisms, employing intrinsic and factor-dependent terminators. Functionally, if features of the nascent transcript itself can cause termination in vitro, it is known as intrinsic termination [2,3,4]. On the other hand, factor-dependent termination employs proteins, notably Rho [5,6].

Relying on experiments with E. coli, various models have been proposed [7,8,9,10,11,12,13,14] to explain the mechanism of intrinsic termination. A typical intrinsic terminator has been shown to consist of a GC-rich palindromic region followed by a stretch of T residues on the sense strand. When transcribed, this results in an RNA that has a hairpin followed by a U-stretch. Several studies indicate that a combinatorial action by both the hairpin and the U-tract contributes to pausing and termination [15,16,17,18,19,20,21,22,23]. The stability of the RNA–DNA hybrid is essential for function of the Ternary Elongation Complex (TEC). It has been shown that the U-trail significantly reduces the stability of the hybrid [8,9,13,18] and terminators without a relatively weaker hybrid usually have a low termination efficiency. However, terminators have also been shown to retain termination efficiency in the absence of the U-trail or when many of the U residues have been deleted [21,22,24,25].


The RNA structure where a U-trail follows the hairpin (Supplementary Fig. 1A), however, remains the canonical model for an intrinsic terminator. The large-scale sequencing of genomes has resulted in efforts to develop algorithms to detect intrinsic terminators across whole genomes [26,27,28,29]. Most of these programs identified intrinsic terminators based on both the strength of the hairpin and the weight of the U-trail. Such analysis revealed that Firmicutes and many other bacteria are highly dependent on intrinsic terminators [30]. Surprisingly, it was also concluded that several bacterial species do not seem to depend on intrinsic termination [27,31]. However, this was difficult to explain given that intrinsic termination is certainly an economical, efficient and, most likely, ancient regulatory mechanism. Conservation of the intrinsic termination mechanism across eubacteria is also supported by the observation that intrinsic terminators from E. coli function efficiently in other bacteria. To understand this apparent anomaly, namely why several groups of bacteria would select against this mechanism, and to obtain better insight into transcription termination, we developed a program called GeSTer (Genome Scanner for Terminators) to identify intrinsic terminator structures across whole genomes [24,32,33]. The exponential increase in whole-genome data has now allowed us to analyze 378 genomes, spanning several prokaryotic phyla, using the improved version of GeSTer. Our results reveal the occurrence of varied types of terminator structures downstream of many experimentally verified operons in diverse groups of bacteria. Further, different bacterial phyla show a preference for different types of intrinsic terminators, and this propensity is correlated with genomic GC content and the presence of the Rho factor.


Table 2. GeSTer-identified terminators downstream of known/experimentally identified operons/transcription units in representative eubacteria.

Species                                      Operon             ΔG (kcal/mol)  L/I-shaped  References (a)
Caulobacter crescentus                       xyl                −24.6          L           Stephens et al., 2007
                                             groEL              −24.2          L           Avedissian and Lopes Gomes, 1996
                                             hcrA-grpE          −21.9          I           Roberts et al., 1996
                                             cheE               −27.4          I           Jones et al., 2001
                                             rsaDE              −27.2          I           Toporowski et al., 2005
Helicobacter pylori (strains J99 and 26695)  jhp 1395           −14.3          I           [27]
                                             HP0207             −12.4          I           [34]
                                             HP0842             −9.7           I           [34]
Salmonella typhimurium                       sapABCDF           −13.8          I           Parra-Lopez et al., 1993
                                             opp operon         −15.9          L           Hogarth and Higgins, 1983
                                             dpp operon         −32.8          I           Elliott, 1993
                                             csg operon         −18.6          L           Romling et al., 1998
                                             fru operon         −13.3          L           Geerse et al., 1989
Streptomyces avermitilis                     nuo operon         −38.2          I           Ikeda et al., 2003
                                             rrnA3 operon       −32.3          L           Ikeda et al., 2003
Mycobacterium tuberculosis                   tuf                −49.6          L           [24]
                                             Rv 1324            −57.9          I           [24]
E. coli K12 MG1655                           mhpABCDFE operon   −31            I           Ferrandez et al., 1997
                                             flg operon         −18.6          L           Blattner et al., 1997
                                             carAB operon       −14.1          L           Blattner et al., 1997
                                             araBAD operon      −12.9          L           Schleif, 2000
                                             nagBACD operon     −20            I           Plumbridge, 1989


Results and discussion

Identification of potential terminators across Eubacteria

The GeSTer program was used to evaluate a large sample of 378 eubacterial genome sequences representing most of the eubacterial phyla. The complete non-redundant list of bacteria analyzed includes 313 species (Table 1, Supplementary Table 1), resulting in one of the largest databases of intrinsic terminators available to date. For several bacterial species, many strains have been analyzed (e.g., E. coli, Bacillus anthracis, Pseudomonas syringae, Staphylococcus aureus) (Supplementary Table 2). The large sample size and the diversity of prokaryotic genomes sequenced to date provide an opportunity to delve into the occurrence, distribution and evolution of intrinsic terminators.

Potential terminators identified by GeSTer are grouped by the program into different types based on their structural features (as described in Materials and methods and Supplementary Fig. 1). Moreover, all palindromic structures identified by GeSTer are subjected to a species-specific ΔG cut-off. The complete set of potential terminators (“All”) was compiled for each organism. From this set, the most stable structure downstream of each gene was designated the “Best” terminator (Table 1 and Supplementary Table 1). The total number of terminators identified in 313 species is 447820. Of these, 331271 are candidates for “Best” structures (Supplementary Table 1).

Table 1. Frequency of occurrence of different types of intrinsic terminators identified in representative eubacteria.

Species                       Genes  All (a)  Best (b)  Best/genes  %L (c)  %I (d)
Acinetobacter baumannii       3420   2651     2043      59.74       69.35   30.64
Agrobacterium tumefaciens     2759   1328     1002      36.32       40.42   59.58
Anabaena variabilis           5097   2999     2217      43.5        46.41   53.59
Bacteroides fragilis          4275   1620     1293      30.25       66.9    33.1
Bifidobacterium adolescentis  1699   999      724       42.61       50.28   49.72
Bordetella pertussis          3488   2444     1515      43.43       15.84   84.16
Borrelia garinii              867    245      176       20.3        67.61   32.39
Bradyrhizobium japonicum      8316   4606     3042      36.58       18.51   81.49
Burkholderia mallei           5089   4738     2553      50.17       13.47   86.53
Campylobacter fetus           1768   490      382       21.61       65.97   34.03
Clostridium perfringens       2784   1714     1288      46.26       86.88   13.12
Chromobacterium violaceum     4528   2581     1797      39.69       30.38   69.62
Corynebacterium diphtheriae   2339   1053     805       34.42       57.39   42.61
Coxiella burnetii             2060   809      557       27.04       31.78   68.22
Enterococcus faecalis         3191   1719     1290      40.43       82.33   17.67
Escherichia coli HS           4492   2558     1898      42.25       49.26   50.73
Gloeobacter violaceus         4477   1458     1128      25.2        23.4    76.6
Gluconobacter oxydans         2493   1096     810       32.49       33.83   66.17
Lactobacillus acidophilus     1937   1066     828       42.75       87.56   12.44
Legionella pneumophila        2993   1176     914       30.54       47.37   52.63
Listeria monocytogenes        2930   1537     1252      42.73       89.22   10.78
Mycobacterium ulcerans        4206   1781     1321      31.41       11.28   88.72
Mycoplasma gallisepticum      764    297      201       26.31       88.06   11.94
Neisseria gonorrhoeae         2068   1275     955       46.18       61.15   38.85
Nocardia farcinica            5744   3276     2228      38.79       8.3     91.7
Pasteurella multocida         2091   1179     923       44.14       80.72   19.28
Pseudomonas putida            5445   2378     1879      34.51       23.68   76.32
Rhodobacter sphaeroides       851    449      307       36.08       15.31   84.69
Salmonella enterica           4546   2427     1828      40.21       43.16   56.84
Shigella flexneri             4299   2425     1790      41.64       48.1    51.9
Staphylococcus aureus         2588   1459     1053      40.69       83      17
Streptococcus mutans          2039   877      668       32.76       77.25   22.75
Streptomyces avermitilis      7662   5035     3363      43.89       10.59   89.41
Synechococcus elongatus       2575   833      680       26.41       40.59   59.41
Treponema denticola           2816   1210     808       28.69       56.44   43.56
Vibrio fischeri               3900   2522     1980      50.77       82.3    17.7
Xanthomonas oryzae            4139   2010     1456      35.18       17.31   82.69
Yersinia pestis               3984   2078     1590      39.91       51.89   48.11

a “All” represents the total structures identified in a genome.
b “Best” denotes the strongest structure downstream of a coding region. Best/genes is numerically identical to the percentage of genes that have an intrinsic terminator downstream of the stop codon.
c %L and d %I are the percentages of Best structures that are L- or I-shaped.

Table 2 (continued). E. coli K12 MG1655.

Operon             ΔG (kcal/mol)  L/I-shaped  References (a)
sdhCDAB operon     −23.7          I           Blattner et al., 1997
sucABCD operon     −22.9          L           Blattner et al., 1997
tyrR               −21.1          L           Blattner et al., 1997
his operon         −14            L           Blattner et al., 1997
gyrA               −15            I           Blattner et al., 1997
cpsBG operon       −16.5          I           Blattner et al., 1997
nadB               −16.6          I           Flachmann et al., 1988
tuf                −12.8          L           Johanson and Hughes, 1992
yjjQ-bglJ operon   −15.4          L           Stratmann et al., 2008

a Details of all references provided in this table are listed in the Supplementary material.

This means that out of a total of 929730 genes studied here, 35.6% have an intrinsic terminator downstream of the coding region. This number is actually an underestimate: the actual number of genes dependent on intrinsic termination must be higher because of the operonic arrangement of genes in eubacteria. Amongst all the identified “Best” potential terminators (see Materials and methods, Supplementary Fig. 1), 46.5% are L-shaped (with a U-trail), while the remaining terminators are I-shaped and without a U-trail (Supplementary Fig. 2, Table 1 and Supplementary Tables 3A–F). The U-shaped (tandem) terminators make up 9.2% of all observed structures. The other two classes of terminators are relatively rare in all the genomes analyzed so far (Supplementary Tables 3A–3F). X-shaped structures make up around 4% of all structures identified, while the V-shaped are the rarest (0.1%).
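The selection of a “Best” structure per gene and the coverage figure quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GeSTer's actual code: the `Structure` record, the `best_per_gene` helper, the toy candidates and the cut-off of −12 kcal/mol are all hypothetical. Only the totals 331271 and 929730 come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    gene: str       # gene the structure lies downstream of
    delta_g: float  # folding free energy in kcal/mol (more negative = more stable)
    shape: str      # "L" (hairpin plus U-trail) or "I" (hairpin only)


def best_per_gene(structures, dg_cutoff):
    """Keep structures more stable than the cut-off, then pick the most stable per gene."""
    best = {}
    for s in structures:
        if s.delta_g >= dg_cutoff:  # not more negative than the cut-off: discard
            continue
        current = best.get(s.gene)
        if current is None or s.delta_g < current.delta_g:
            best[s.gene] = s
    return best


def percent(part, whole):
    return 100.0 * part / whole


# Hypothetical toy candidates, not taken from the paper.
candidates = [
    Structure("geneA", -14.0, "L"),
    Structure("geneA", -21.5, "I"),
    Structure("geneB", -9.0, "L"),
    Structure("geneC", -18.2, "L"),
]
best = best_per_gene(candidates, dg_cutoff=-12.0)  # hypothetical cut-off

# Totals reported in the text: "Best" structures and genes studied.
coverage = percent(331271, 929730)
print(sorted(best), f"{coverage:.1f}%")
```

The same `percent` helper reproduces the Best/genes column of Table 1, for example 2043 “Best” structures over 3420 genes for Acinetobacter baumannii.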

Validation of GeSTer prediction

To validate the results obtained, we analyzed the experimentally identified operons and transcription units across several eubacteria to see if GeSTer-identified structures occurred downstream of such operons. The representative results (Table 2) show that several operons from diverse species such as Caulobacter crescentus, E. coli, Helicobacter pylori, Mycobacterium tuberculosis, Salmonella typhimurium, Streptomyces avermitilis, etc. indeed have GeSTer-identified terminators at their 3′ end. We also note that a number of such operons end with an I-shaped structure. It is noteworthy that even in the prototypical E. coli, the “Best” structures identified downstream of many operons (e.g., nagBACD operon, cpsBG operon) lack U-trails. A terminator devoid of the U-trail has been shown to function in E. coli [21] and such I-shaped terminators have also been shown to be efficient in diverse species such as mycobacteria [24,25] and H. pylori [34]. These results validate that canonical and variant stem–loop structures are abundant across eubacteria and are necessary to achieve transcription termination. Apart from functioning as intrinsic terminators, some of the I-shaped structures could function as Class I pause signals and cause fraying or hypertranslocation of the elongation complex [35]. Pausing induced by such structures could cause a “slowing down” of the RNA polymerase and thus facilitate factor-dependent termination. Protection against 3′-5′ exonucleases could be another function of such structures [36].

GeSTer is as efficient as other published algorithms such as TransTerm. TransTerm achieves specificity and sensitivity of 93% and a false positive rate of 6% [29]. GeSTer gives similar values: a false positive rate of 5–10% and a false negative rate of ∼10%. These values are also on par with those obtained when predicting terminators for B. subtilis and other Firmicutes [30]. However, it is noteworthy that the predictions of other programs reach such high sensitivity only for bacteria that have the canonical, L-shaped terminators. In contrast, GeSTer efficiently predicts variant terminator structures in many genomes, which are overlooked by other algorithms.
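For readers comparing such figures, the following sketch shows the textbook definitions of these rates computed from a confusion matrix. It is an assumption that these are the definitions intended: the text does not state how specificity, sensitivity or the false positive rate were calculated for either program, and published terminator predictors sometimes define them differently. The counts are hypothetical.

```python
def rates(tp, fp, tn, fn):
    """Standard confusion-matrix rates (assumed definitions, not taken from the paper)."""
    return {
        "sensitivity": tp / (tp + fn),          # true terminators that were predicted
        "specificity": tn / (tn + fp),          # non-terminators correctly rejected
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # 1 - sensitivity
    }


# Hypothetical counts for illustration only.
r = rates(tp=90, fp=7, tn=93, fn=10)
for name, value in r.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions the false negative rate is the complement of sensitivity, so a false negative rate of about 10% corresponds to a sensitivity of about 90%.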

Intrinsic terminators are uniformly distributed over bacterial genomes

The program TERC (TErminators Represented in Circle) was used to analyze the distribution of intrinsic terminators in eubacterial genomes. TERC shows the distribution in a circular schematic of the genome, using the coordinates of the positions of terminators obtained from the GeSTer output. The start sites of GeSTer-identified “best” intrinsic terminators are plotted on a circular graph that represents the genome. The results for 6 representative species are shown (Supplementary Fig. 3). In general, genes on both strands employ intrinsic terminators and there is no strand-specific bias with respect to occurrence. In the E. coli genome, both L- and I-shaped terminators seem to be uniformly distributed (Supplementary Fig. 4) and there are no clusters where only one shape of terminator is predominant. In contrast, analysis of the genomes of S. aureus MRSA, Listeria monocytogenes and Thermoanaerobacter tengcongensis shows a concentration of the structures in half the length of each DNA strand of the genome (Supplementary Figs. 3C, E, F). This is because there is an identically skewed distribution of genes in these genomes, and the positions of terminators actually mirror those of genes, both in the gene-rich and in the gene-poor regions. All genomes analyzed with TERC show a few “gene islands” where no terminator structure is present. The genes in such regions could be dependent on Rho-mediated termination.

Bacterial phyla show preference for different shapes of intrinsic terminators

We have analyzed the preference for a particular shape of intrinsic terminator in a given eubacterial phylum. The representative eubacterial phyla studied are the Firmicutes, Actinobacteria, α-Proteobacteria and β-Proteobacteria (Fig. 1, Supplementary Tables 3A–3F). While 80% of all intrinsic terminators identified in Firmicutes are L-shaped, the Actinobacteria employ such terminators to a lower degree (only ∼25% of “best” structures). The majority of intrinsic terminators in Actinobacteria are I-shaped; for example, all 17 mycobacterial genomes [33] (Supplementary Table 3B) sequenced to date show a high propensity for I-shaped terminators. α- and β-Proteobacteria also show a high preference for I-shaped terminators. The difference in preference for L- or I-shaped terminators was statistically significant between Firmicutes and Actinobacteria and between Firmicutes and β-Proteobacteria. However, in a few genera, the preference for a particular shape of terminator is not observed. For instance, around 59% of total terminators identified in Corynebacterium diphtheriae and Corynebacterium glutamicum are L-shaped. In contrast, the closely related Corynebacterium efficiens and Corynebacterium jeikeium show a much lower prevalence (37%) of L-shaped terminators (Supplementary Table 3B). Similarly, the L- and I-shaped terminators are equally present in the genomes of Rickettsia prowazekii and Rickettsia typhi, while other Rickettsia species show a preference for the I-shape (Supplementary Table 3C). Furthermore, 8–11% of all intrinsic terminators across bacteria are U-shaped, but certain organisms such as Burkholderia and Rickettsia species show a higher preference (up to 17%).

Fig. 1. Differential preference for intrinsic terminator shapes in eubacterial phyla. The results shown here are the mean and standard deviation for L-, I- and U-shaped terminators from (A) Firmicutes (n=54), (B) Actinobacteria (n=28), (C) α-Proteobacteria (n=39), (D) β-Proteobacteria (n=20). % occurrence of L-shaped and I-shaped terminators was obtained by dividing the sum of “Best” structures (L-shaped or I-shaped) by the total number of “Best” structures in the sample. % occurrence of U-shaped (tandem) terminators was obtained by dividing the sum of “All” structures by the total number of “All” structures in the sample.
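A per-phylum mean and standard deviation of the kind summarized in Fig. 1 can be sketched with the %L values from Table 1. This is a small illustration only: it uses the handful of species listed in Table 1 rather than the full samples (n=54 and n=28) behind Fig. 1, and the assignment of species to phyla follows standard taxonomy rather than anything stated in the table itself.

```python
from statistics import mean, stdev

# %L values copied from Table 1; phylum grouping is an assumption based on
# standard taxonomy, not on the table.
PERCENT_L = {
    "Firmicutes": {
        "Clostridium perfringens": 86.88,
        "Enterococcus faecalis": 82.33,
        "Lactobacillus acidophilus": 87.56,
        "Listeria monocytogenes": 89.22,
        "Staphylococcus aureus": 83.0,
        "Streptococcus mutans": 77.25,
    },
    "Actinobacteria": {
        "Bifidobacterium adolescentis": 50.28,
        "Corynebacterium diphtheriae": 57.39,
        "Mycobacterium ulcerans": 11.28,
        "Nocardia farcinica": 8.3,
        "Streptomyces avermitilis": 10.59,
    },
}


def summarize(values):
    """Return (mean, sample standard deviation) of per-species percentages."""
    return mean(values), stdev(values)


summary = {phylum: summarize(list(rows.values())) for phylum, rows in PERCENT_L.items()}
for phylum, (m, sd) in summary.items():
    print(f"{phylum}: mean %L = {m:.1f}, SD = {sd:.1f}")
```

Even on this small subset the Firmicutes cluster tightly at high %L, while the Actinobacteria are lower and more spread out, which matches the genus-level exceptions noted for Corynebacterium.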

Correlation between the intrinsic terminator and genomic GC content

To study whether the prevalence of a certain type of terminator in an organism correlates with any other genomic parameter, the preference for I-shaped (or L-shaped) terminators was analyzed against genomic GC content. There is a correlation between the genomic GC content of a bacterium and its preference for I-shaped terminators (Supplementary Fig. 5). The fraction of I-shaped terminators increases with genomic GC content. Bacteria with high GC content, such as the genus Mycobacterium, have an overwhelming majority of I-shaped terminators. In contrast, eubacteria with low-GC genomes generally have a preponderance of L-shaped terminators, well exemplified by Firmicutes (Supplementary Table 3B). Genomes with a GC% in the intermediate range showed equal preference for both kinds of terminators.

However, besides genomic GC content, transcription elongation rate and evolution of RNA polymerase subunits [37] and auxiliary proteins are also likely to be determinants of the type of terminator selected in a given organism. An indirect relationship between GC content and terminator shape seems more plausible in the light of the fact that preference for I-shaped terminators varies widely amongst organisms with a similar GC content. Recent studies indicate multiple roles for the nascent hairpin: destabilizing the upstream end of the DNA–RNA hybrid, preventing addition of NTP by blocking translocation at the active site of RNA polymerase, and hindering interaction between the nascent transcript and the polymerase. Thus, the hairpin traps the RNA polymerase in a thermodynamically weak situation [18] and also causes conformational changes in the RNA polymerase [8]. Organisms with a higher transcription rate may show a preference for, and prevalence of, L-shaped structures, as both the stem–loop and the U-trail are needed to cause efficient pausing and dissociation of the rapidly advancing elongation complex. In contrast, in organisms like M. tuberculosis, with a slower elongation rate [38], the stem–loop structure alone would have sufficient time to form in the RNA polymerase exit channel even in the absence of a U-trail. In support of this idea, a mutant E. coli RNA polymerase which has a slower transcription rate can terminate efficiently on I-shaped terminators [12,39].

Variation in position of terminator structure with respect to the stop codon

For many bacterial genomes, terminator structures are concentrated within 50 bp of the stop codon (Supplementary Fig. 6). Furthermore, the “Best” (or strongest) structures show a more dramatic peak. The observed clustering of terminators immediately downstream of the stop codon indicates a scenario where the Ternary Elongation Complex (TEC) is halted soon after it has crossed the stop codon. The occurrence of a large number of terminators around 50 bp after the stop codon indicates that the position of terminators is not a random phenomenon. If the positioning were only a function of GC content, terminators would have been observed over the entire downstream region. This clustering also has a functional significance as it would prevent unnecessary “wastage” of NTPs, result in faster recycling of the TEC and could also aid in closer packaging of genes in the genome. We further confirmed this by shuffling six representative complete genome sequences and then determining their terminator profile (Supplementary Fig. 7, Supplementary Table 4). In all the cases, the number of terminators identified was much lower compared to the “wildtype” unshuffled genome sequence. Also, the clustering of terminators was lost in the shuffled sequences. The results show that terminator distribution is non-random and that the sequences which transcribe into terminator structures have been selectively “positioned” downstream of the genes to cause termination. To validate the observation further, a t-test was performed to compare the average around the peak with that of the −190 to −200 bp region (with respect to the stop codon), and the difference was found to be statistically significant [24].
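The shuffling control described above can be illustrated with a minimal sketch. It is not GeSTer: the hairpin search below is a deliberately simple stand-in that only looks for perfect inverted repeats, and the function names, the stem and loop limits, and the demonstration sequence are all hypothetical. What it does show is the principle of the control: shuffling preserves base composition (and hence GC content) while destroying sequence order.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def count_hairpins(seq, stem=6, min_loop=3, max_loop=8):
    """Count start positions of perfect inverted repeats (stem, loop, reverse-complement stem)."""
    hits = 0
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits += 1
                break  # count each start position once
    return hits


def shuffled(seq, seed=0):
    """Return a composition-preserving random permutation of the sequence."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


# Hypothetical sequence with one designed hairpin: GCCGCC, a 4-nt loop, GGCGGC.
demo = "ATATAT" + "GCCGCC" + "TTTT" + "GGCGGC" + "TTTTTTAA"
control = shuffled(demo)
print(count_hairpins(demo), count_hairpins(control))
```

On a real genome the comparison would be made between terminator counts (and their positions relative to stop codons) in the native and shuffled sequences, as reported in Supplementary Fig. 7 and Supplementary Table 4.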

An “optimized” hairpin structure for terminators

Several parameters of an intrinsic terminator influence its termination efficiency. The stability of the stem–loop structure (as given by its ΔG) is one such important criterion. A stronger structure (more negative ΔG value) is more likely to cause efficient pausing and termination. In contrast, a weaker terminator (less negative ΔG value) will probably result in increased read-through by the elongation complex. In the GeSTer results, a genome-specific ΔG cut-off is used. Only structures with a ΔG more negative than this cut-off value are considered to be functional terminators. When the GeSTer-identified terminators of a genome were sorted by their ΔG, it was seen that more than 70% of the identified terminators had ΔG values only slightly more negative than the ΔG cut-off. Thus, irrespective of the species, 70–85% of the identified “best” structures had ΔG≤−25 kcal/mol (Fig. 2). Moreover, most terminators identified downstream of known operons/transcription units in various eubacteria (Table 2) have a ΔG in this range. A similar analysis of the identified terminators based on the stem-length of the hairpin showed that 60–70% had a stem-length of 6–13 bp (Supplementary Fig. 8). In the case of the well-characterized λ tR2 terminator, studies on the effect of the stability of the RNA hairpin on termination efficiency (TE) have shown that the optimal hairpin for maximizing TE has a stem of 8–9 bp [17]. An optimized hairpin would fit into the main channel of RNA polymerase [8] and freeze the TEC by inducing DNA bubble collapse, resulting in disruption of the hybrid and preventing translocation at the active site [18]. Thus, it appears that shorter or longer stems would be inefficient for the sequential interaction during the termination process. Since there is likely to be a continuous selection in favour of densely packed bacterial genomes, it would be advantageous for cells to evolve intrinsic terminators which are encoded by smaller genomic regions, but nevertheless are functionally efficient. In addition, stem-length could have been constrained and optimized by its interaction with RNA polymerase. Additional factors such as Rho, NusA and NusG could contribute to increasing the efficiency of the terminators.
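A ΔG distribution of the kind plotted in Fig. 2 (fraction of structures per ΔG interval) can be tabulated as in the sketch below. The input is the small set of E. coli K12 MG1655 ΔG values listed in Table 2, used here only as a convenient sample; it is not the genome-wide set of “Best” structures behind Fig. 2, and the 5 kcal/mol bin width is an arbitrary choice for illustration.

```python
import math
from collections import Counter

# ΔG values (kcal/mol) of the E. coli K12 MG1655 entries in Table 2.
ECOLI_TABLE2_DG = [-31, -18.6, -14.1, -12.9, -20, -23.7, -22.9,
                   -21.1, -14, -15, -16.5, -16.6, -12.8, -15.4]


def dg_histogram(delta_gs, bin_width=5.0):
    """Map each bin's lower edge to the fraction of values in [edge, edge + bin_width)."""
    counts = Counter(math.floor(dg / bin_width) * bin_width for dg in delta_gs)
    total = len(delta_gs)
    return {low: counts[low] / total for low in sorted(counts)}


hist = dg_histogram(ECOLI_TABLE2_DG)
for low, fraction in hist.items():
    print(f"[{low:.0f}, {low + 5:.0f}) kcal/mol: {fraction:.2f}")
```

The same helper applies unchanged to stem lengths (for example with `bin_width=1`) if a stem-length distribution like that in Supplementary Fig. 8 is wanted.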

Relationship between intrinsic terminators and termination factor Rho

Genes that are not terminated by the intrinsic mechanism probably employ Rho-dependent transcription termination. Rho homologues have been identified in many eubacteria. However, the requirement for, and consequently the cellular levels of, Rho vary from species to species. To understand if there was any relation between Rho and intrinsic terminators, we analyzed eubacterial genomes for the presence of the rho gene. Out of a representative sample of 186 bacteria, 31 species did not have an identifiable rho homologue (Mitra et al., unpublished observations). These bacteria included all the Mollicutes as well as Lactobacilli, Cyanobacteria and Streptococcus species. Intriguingly, most of the bacteria which lack rho also show a preference for L-shaped intrinsic terminators (average 65.7% of total) and a low genomic GC content (average 38.03%) (Supplementary Fig. 9 and Supplementary Table 5). Moreover, most of these bacteria seem to have fewer genes (an average of 1893 genes compared to an average of 2998 genes for 255 species). Mollicutes, which are related to Bacillus, could have dispensed with Rho at a later date when they established an endosymbiont life cycle. Whether the functional importance of Rho becomes greater as the GC content and/or the number of genes in a genome increases is a point that needs additional experimentation and datasets. In this context, previous studies have revealed low in vivo termination efficiency in the case of a particular I-shaped terminator [21]. Although Rho requires C-rich unstructured RNAs for efficient loading, it may also load onto RNAs without these canonical features with lesser efficiency. If such “rho-loading” sequences are present in the vicinity of I-shaped terminators, they could facilitate termination.

Fig. 2. Distribution of identified intrinsic terminators as per their ΔG in representative eubacteria. The abscissa denotes the ΔG (kcal/mol). The ordinate denotes the fraction of the total number of “Best” structures in that species.

Conclusions

The analysis with GeSTer reliably focuses on the differences in structure and functional dependence on intrinsic terminators over a large and diverse sample of bacterial genomes. Our results show that the unifying feature of the identified intrinsic terminators is the stem–loop structure. Canonical and variant terminators are extensively used across eubacteria. The usage of alternate intrinsic terminators is indicative of the diversity and evolution of non-coding regulatory sequences across eubacteria. Structural variants of terminators have been selected in different bacterial species. The selection of terminator shapes and their distribution is likely to have been influenced by parallel evolution of RNA polymerase subunits and factor-dependent termination. Other factors such as transcription elongation rate, genomic GC content and growth rate have also probably dictated the preference for terminator shapes in the context of a particular genome. GeSTer-identified terminators occur downstream of operons and transcription units in several bacteria. Thus, our results can also be used to predict and validate one end of operons across bacterial species. An example of such an application is the case of the Streptomyces coelicolor genome [40].

At a whole genome level, transcription terminators function to compartmentalize gene expression. Terminators act as roadblocks for the TEC and thus modulate the expression of genes that are physically adjacent to each other on the genome sequence. For many terminators, the hairpin and the U-trail are both important for efficient pausing and termination. In cases where the U-trail is dispensable, sequences distal to the termination site can functionally complement the U-trail's role in causing termination [8,41]. However, several algorithms developed to identify intrinsic terminators recognize only E. coli type (L-shaped) terminators. Thus, analysis of genomes from a variety of species that show a lesser dependence on L-shaped terminators may have led to the conclusion that Rho-dependent termination is more active in such species. Further, in those organisms, such as the cyanobacterium Synechocystis, where a rho homologue is absent and L-shaped terminators are underrepresented, the existence of an alternate termination mechanism has been proposed [27,31]. Our data, by taking into account the various structures that can function as intrinsic terminators, reexamine the paradigm and emphasize the view that intrinsic termination is a successfully evolved mechanism and hence widely used in the eubacterial domain.

Materials and methods

Identification and computational analysis of intrinsic terminators from whole genome sequences

A typical intrinsic terminator has a double-stranded stem and a central, unpaired bulb. Symmetric and asymmetric unpaired regions in the stem, termed mismatches and gaps respectively, can also occur in a terminator. The sequence downstream of the hairpin is important in some cases. GeSTer identifies palindromic structures downstream of the genes, and calculates their stability, distribution, and the nature of trailing sequences and adjacent structures. Terminators are identified using both the ΔG and the permitted lengths of the structural features. Qualitative data from experimentally characterized intrinsic terminators have been used to set the default parameters incorporated into GeSTer.
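As an illustration only, the idea of scanning a downstream region for palindromic (stem–loop forming) sequences can be sketched in Python as follows. This is not the GeSTer implementation: it assumes perfect Watson–Crick stems, ignores mismatches, gaps and the ΔG calculation, and the function name `find_perfect_hairpins` and its parameters are hypothetical. The stem and loop limits mirror the default values reported for GeSTer in this section.

```python
# Watson-Crick pairs only; G-U wobble pairs are ignored in this simplified sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_perfect_hairpins(rna, min_stem=4, max_stem=30, min_loop=3, max_loop=9):
    """Return (start, stem_len, loop_len) for every perfect stem-loop found.

    start is the 0-based index of the first paired base of the 5' arm.
    Hypothetical helper: mismatches, gaps and free energy are not modelled.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    n = len(rna)
    for loop_start in range(min_stem, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # last base of the 5' arm
            right = loop_start + loop_len  # first base of the 3' arm
            stem = 0
            # Extend the stem outwards while bases pair, up to max_stem.
            while (left >= 0 and right < n and stem < max_stem
                   and (rna[left], rna[right]) in PAIRS):
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, stem, loop_len))
    return hits


if __name__ == "__main__":
    # Made-up sequence: 6-bp stem, 4-nt loop, followed by a U-trail.
    for hit in find_perfect_hairpins("CCGGCCGCUUUUGCGGCCUUUUUU"):
        print(hit)
```

A real predictor would additionally score each candidate thermodynamically and keep only the most stable structure, as the text describes.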

The mode of operation of GeSTer is as follows. First, whole genome sequences in the GenBank format [42] are segregated into coding, upstream and downstream regions. Next, palindromic sequences in the −20 to +270 nucleotide region with respect to the stop codon are identified. The search is, however, stopped before entering the coding region of the downstream gene. Subsequently, all potential structures are computed, and the one with the lowest ΔG is considered. The search is then repeated. Finally, a ΔGcut off filter is used to select the final set of structures. This filter is based on the genomic GC content of bacteria and the characteristics of structures in the upstream region. It must be noted that the basal ΔG of non-coding regions of genomes is species-specific. It is also strongly correlated with the genomic GC content. The best linear regression fit corresponds to Eq. (1),

$$\Delta G_{\mathrm{downstream}} = -0.294 \times (\%\mathrm{GC}) + 4.411 \qquad (1)$$

Selection by GeSTer has been optimized such that there is minimum identification of structures that are present upstream of genes. Pure upstream regions, which occur between two divergently transcribed genes, were identified for this purpose. The optimized ΔGcut off is derived by iteratively weighting ΔGdownstream such that the separation between upstream and downstream structures is maximized. The weight parameter is given by dividing the optimal ΔG for E. coli by ΔGdownstream for E. coli.

Thus, for any bacterial genome, the final ΔGcut off is given by,

$$\Delta G_{\mathrm{cut\,off}} = (12/10.5) \times \left[-0.294 \times (\%\mathrm{GC}) + 4.411\right] \qquad (2)$$
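The two equations can be evaluated directly, as in the following minimal Python sketch. The function names are hypothetical and the E. coli GC content of about 50.8% used in the example is an assumption not stated in the text; with that value, Eq. (1) gives roughly −10.5 kcal/mol and Eq. (2) roughly −12 kcal/mol, consistent with the 12/10.5 weight.

```python
def delta_g_downstream(gc_percent):
    """Eq. (1): basal downstream delta-G (kcal/mol) from genomic GC content (%)."""
    return -0.294 * gc_percent + 4.411


# Weight from Eq. (2): optimal E. coli delta-G (12) over its Eq. (1) value (10.5).
WEIGHT = 12 / 10.5


def delta_g_cutoff(gc_percent):
    """Eq. (2): genome-specific delta-G cut off (kcal/mol)."""
    return WEIGHT * delta_g_downstream(gc_percent)


def passes_cutoff(delta_g, gc_percent):
    """A structure is retained only if it is more stable than the cut off."""
    return delta_g < delta_g_cutoff(gc_percent)


if __name__ == "__main__":
    # Illustrative GC contents (%); 50.8 is an assumed value for E. coli.
    for gc in (35.0, 50.8, 65.0):
        print(f"GC {gc:5.1f}%  cut off {delta_g_cutoff(gc):7.2f} kcal/mol")
```

Because the cut off scales linearly with GC content, GC-rich genomes are held to a more negative threshold than AT-rich ones.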

The specificity of GeSTer was determined by analyzing all the experimentally tested terminators. To elaborate further, using the latest criteria [43], we analyzed whether GeSTer could generate a structure identical or similar to that deduced from genetic or biochemical studies. The terminators used for this analysis are from references [25,26,44,45]. With these parameters, GeSTer could identify 85–92% of known terminators in various sequences. The use of a species-specific ΔGcut off resulted in <10% false negatives, but importantly prevented identification of many false positives across genomes. After optimization of the ΔGcut off, there were >10-fold more terminators identified in the downstream region compared to the pure upstream regions. Assuming that all the “pure” upstream structures are incorrect (which may not be the case), we can then assume that the algorithm might also be detecting a similar number of incorrect structures in the downstream region. This gave us a false positive rate of 5–10% depending on the genome under consideration.
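The reasoning behind this estimate can be written out as simple arithmetic. The Python sketch below assumes, as the text does, that every structure found in a pure upstream region is spurious and that a similar number of spurious structures occurs downstream; the function name and the example counts are hypothetical and are not results from this study.

```python
def estimated_false_positive_rate(n_upstream, n_downstream):
    """Estimate the fraction of downstream hits that are spurious.

    Assumes all pure-upstream structures are incorrect and that the same
    number of incorrect structures is present among the downstream hits.
    """
    if n_downstream <= 0:
        raise ValueError("need at least one downstream structure")
    return n_upstream / n_downstream


if __name__ == "__main__":
    # Made-up counts: 60 upstream hits versus 900 downstream hits.
    rate = estimated_false_positive_rate(60, 900)
    print(f"estimated false positive rate: {rate:.1%}")
```

A downstream-to-upstream ratio above 10 therefore corresponds to an estimated false positive rate below 10%.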

The genomic location, sequence, structural parameters, distance from stop codon and genomic distribution of the potential terminator structures can be obtained from the output. GeSTer classifies the identified terminator structures based on the sequence of the trail following the hairpin, and the position of adjacent structures (Supplementary Fig. 1). Terminators identified using these criteria are classified into different types depending on whether a U-trail is present or absent and on the presence and spatial arrangement of adjacent stem–loop structures. They are thus classified into 1) L-shaped/E. coli type, where >3 Us follow the stem–loop structure, 2) I-shaped (Mycobacterium type), in which case <3 Us in the 10-nt stretch follow the stem–loop, 3) Tandem/U-shaped, when 2 or more structures occur within 50 nucleotides of each other downstream of the same gene, 4) V-shaped, when a stem–loop structure immediately precedes another with no intervening sequence, and 5) X-shaped/Convergent, which occur between convergently oriented genes on the two strands. Most identified terminators are symmetrical and could potentially work in either orientation.
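The distinction between L-shaped and I-shaped terminators can be sketched in Python as a count of U residues in the trail. This is an illustrative reading of the criteria above, not GeSTer's code: the function name is hypothetical, Us are counted anywhere in the 10-nt stretch (the text does not say whether they must be contiguous), and a trail with exactly 3 Us is left unclassified because the stated thresholds do not cover it.

```python
def classify_trail(trail, window=10):
    """Classify a terminator by the U content of the trail after the hairpin.

    trail: sequence immediately 3' of the stem-loop (RNA or DNA letters).
    Returns "L-shaped", "I-shaped" or "unclassified".
    """
    stretch = trail.upper().replace("T", "U")[:window]
    n_u = stretch.count("U")
    if n_u > 3:
        return "L-shaped"      # E. coli type: hairpin followed by a U-trail
    if n_u < 3:
        return "I-shaped"      # Mycobacterium type: hairpin without a U-trail
    return "unclassified"      # exactly 3 Us: not covered by the stated criteria


if __name__ == "__main__":
    # Made-up trails for illustration.
    for trail in ("UUUUUUAGCA", "GCCGAUCGGA", "UUUGCGCGCG"):
        print(trail, classify_trail(trail))
```

The Tandem, V-shaped and X-shaped classes depend on the positions of neighbouring structures and genes, so they would need coordinate information in addition to the trail sequence.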

The source code is written in Visual Basic. The program runs in all available Windows environments (Windows 97, 98, 2000, XP). The front end is user-friendly and most parameters (such as stem-length, loop, number of mismatches and distance from stop codon), although set to default values, can be modified by the user. The default values are 4–30 bp for the stem-length, 3–9 nucleotides for the loop and a maximum of 3 nucleotides in a mismatch or gap. Outputs are generated as graphs and tab-delimited text files. Individual structures can also be viewed graphically. In addition, the new version of GeSTer (GeSTer 2.3) is flexible enough to accommodate minor variations in the GenBank format of input genome sequences. Besides, some details about the genes (such as gene name, coordinates of genes and terminators) are now available in the output file, making it more user-friendly and facilitating analysis of their terminator profiles. Two separate output files (called “weakpalinsreg.dat” and “weakpalinscomp.dat”) are now created, which include all the palindromic structures that were computed but did not meet the genomic ΔGcut off. Nevertheless, they could be important in the analysis.
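The default limits quoted above can be captured as a small, user-adjustable configuration. The following Python sketch is only a way of picturing those defaults; the class and field names are hypothetical and do not correspond to GeSTer's Visual Basic front end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HairpinLimits:
    """Permitted structural ranges; defaults follow the values given in the text."""
    min_stem: int = 4       # bp
    max_stem: int = 30      # bp
    min_loop: int = 3       # nucleotides
    max_loop: int = 9       # nucleotides
    max_unpaired: int = 3   # nucleotides in a single mismatch or gap

    def permits(self, stem_len, loop_len, largest_unpaired=0):
        """Return True if a candidate structure falls within all limits."""
        return (self.min_stem <= stem_len <= self.max_stem
                and self.min_loop <= loop_len <= self.max_loop
                and 0 <= largest_unpaired <= self.max_unpaired)


if __name__ == "__main__":
    defaults = HairpinLimits()
    print(defaults.permits(stem_len=8, loop_len=4))      # within defaults
    relaxed = HairpinLimits(max_loop=12)                 # user-modified limit
    print(relaxed.permits(stem_len=8, loop_len=10))
```

Keeping the limits in one immutable object makes it easy to rerun an analysis with modified parameters while leaving the defaults untouched.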

For studying the distribution of identified intrinsic terminators in the whole genome, a PERL-based program, TERC (Terminators Represented in Circle), was constructed. On a circular template representing the genome, the coordinates of the positions of all of the identified L- and I-shaped terminators, as given by the GeSTer output, were plotted, followed by the positions of genes using the nucleotide coordinates of start codons from the GenBank file.
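The geometric step behind such a circular plot is the conversion of a genome coordinate into a position on a circle. The sketch below shows one way to do this in Python; it is not the TERC program (which is PERL-based), and the function name, the clockwise orientation and the choice of the 12 o'clock position for coordinate 1 are all assumptions made for illustration.

```python
import math


def circle_position(coordinate, genome_length, radius=1.0):
    """Map a 1-based genome coordinate to (x, y) on a circle.

    Coordinate 1 is placed at 12 o'clock and positions advance clockwise.
    """
    if not 1 <= coordinate <= genome_length:
        raise ValueError("coordinate outside the genome")
    angle = 2 * math.pi * (coordinate - 1) / genome_length
    return (radius * math.sin(angle), radius * math.cos(angle))


if __name__ == "__main__":
    # Made-up terminator coordinates on a 1,000,000-bp genome.
    for pos in (1, 250_001, 500_001, 750_001):
        x, y = circle_position(pos, 1_000_000)
        print(f"{pos:>9d}  x={x:+.3f}  y={y:+.3f}")
```

Terminators and start codons could then be drawn at different radii so that the two tracks do not overlap.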

All genomic sequences used in the present study were downloaded from the genome database of the National Center for Biotechnology Information, ftp://ftp.ncbi.nih.gov/genomes/Bacteria [42]. Shuffling of entire genome sequences was carried out using a PERL-based program. Shuffled genome sequences were then analyzed by GeSTer. Analysis of ΔG and stem-length of terminators was done using Microsoft Excel 2003. Student's two-tailed t-test was carried out to ascertain the significance of the differential preference for different types of terminators by bacterial phyla. Graphs were plotted using GraphPad Prism software.
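A shuffled control genome of the kind described above can be generated as in the following Python sketch. The authors used a PERL-based program whose shuffling scheme is not specified here; this example assumes a simple mononucleotide shuffle, which preserves length and base composition (and hence GC content) while destroying sequence order. The function names are hypothetical.

```python
import random
from collections import Counter


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of seq (mononucleotide shuffle)."""
    rng = random.Random(seed)   # a seed makes the control reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def gc_percent(seq):
    """Percentage of G and C residues in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    genome = "ATGCGCGATATTTACGGCTAGCTAGGCTTTTTTTT"  # made-up sequence
    control = shuffle_sequence(genome, seed=1)
    print(Counter(genome) == Counter(control))      # composition is preserved
    print(gc_percent(genome), gc_percent(control))  # GC content is unchanged
```

Because the genome-specific ΔGcut off depends on GC content, a composition-preserving shuffle keeps the same cut off for the real and the shuffled sequence.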

Conflict of interest statement

None.

Acknowledgments

The authors would like to thank O. Krishnadev for algorithm development, and Natasha Mhatre and Anil Kesarwani for statistical analysis and discussion. The work in VN's laboratory is supported by a grant under the Indo-Swiss Joint Research Programme of the Department of Science and Technology, Government of India.



Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ygeno.2009.04.004.

References

[1] P.H. von Hippel, An integrated model of the transcription complex in elongation, termination, and editing, Science 281 (1998) 660–665.

[2] J.P. Richardson, J. Greenblatt, Control of RNA chain elongation and termination, in: F.C. Neidhardt (Ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology, ASM Press, Washington, D.C., pp. 822–848.

[3] T. Platt, Transcription termination and the regulation of gene expression, Annu. Rev. Biochem. 55 (1986) 339–372.

[4] T.M. Henkin, Control of transcription termination in prokaryotes, Annu. Rev. Genet. 30 (1996) 35–57.

[5] S. Banerjee, J. Chalissery, I. Bandey, R. Sen, Rho-dependent transcription termination: more questions than answers, J. Microbiol. 44 (2006) 11–22.

[6] M.S. Ciampi, Rho-dependent terminators and transcription termination, Microbiology 152 (2006) 2515–2528.

[7] T.J. Santangelo, J.W. Roberts, Forward translocation is the natural pathway of RNA release at an intrinsic terminator, Mol. Cell 14 (2004) 117–126.

[8] V. Epshtein, C.J. Cardinale, A.E. Ruckenstein, S. Borukhov, E. Nudler, An allosteric path to transcription termination, Mol. Cell 28 (2007) 991–1001.

[9] I. Gusarov, E. Nudler, The mechanism of intrinsic transcription termination, Mol. Cell 3 (1999) 495–504.

[10] R.A. King, D. Markov, R. Sen, K. Severinov, R.A. Weisberg, A conserved zinc binding domain in the largest subunit of DNA-dependent RNA polymerase modulates intrinsic transcription termination and antitermination but does not stabilize the elongation complex, J. Mol. Biol. 342 (2004) 1143–1154.

[11] I. Toulokhonov, I. Artsimovitch, R. Landick, Allosteric control of RNA polymerase by a site that contacts nascent RNA hairpins, Science 292 (2001) 730–733.

[12] W.S. Yarnell, J.W. Roberts, Mechanism of intrinsic transcription termination and antitermination, Science 284 (1999) 611–615.

[13] N. Komissarova, J. Becker, S. Solter, M. Kireeva, M. Kashlev, Shortening of RNA:DNA hybrid in the elongation complex of RNA polymerase is a prerequisite for transcription termination, Mol. Cell 10 (2002) 1151–1162.

[14] I. Toulokhonov, R. Landick, The flap domain is required for pause RNA hairpin inhibition of catalysis by RNA polymerase and can modulate intrinsic termination, Mol. Cell 12 (2003) 1125–1136.

[15] P.J. Farnham, T. Platt, Rho-independent termination: dyad symmetry in DNA causes RNA polymerase to pause during transcription in vitro, Nucleic Acids Res. 9 (1981) 563–577.

[16] I. Artsimovitch, R. Landick, Interaction of a nascent RNA structure with RNA polymerase is required for hairpin-dependent transcriptional pausing but not for transcript release, Genes Dev. 12 (1998) 3110–3122.

[17] K.S. Wilson, P.H. von Hippel, Transcription termination at intrinsic terminators: the role of the RNA hairpin, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 8793–8797.

[18] K. Datta, P.H. von Hippel, Direct spectroscopic study of reconstituted transcription complexes reveals that intrinsic termination is driven primarily by thermodynamic destabilization of the nucleic acid framework, J. Biol. Chem. 283 (2008) 3537–3549.

[19] I. Gusarov, E. Nudler, Control of intrinsic transcription termination by N and NusA: the basic mechanisms, Cell 107 (2001) 437–449.

[20] S.P. Lynn, L.M. Kasper, J.F. Gardner, Contributions of RNA secondary structure and length of the thymidine tract to transcription termination at the thr operon attenuator, J. Biol. Chem. 263 (1988) 472–479.

[21] H. Abe, H. Aiba, Differential contributions of two elements of rho-independent terminator to transcription termination and mRNA stabilization, Biochimie 78 (1996) 1035–1042.

[22] R. Reynolds, M.J. Chamberlin, Parameters affecting transcription termination by Escherichia coli RNA. II. Construction and analysis of hybrid terminators, J. Mol. Biol. 224 (1992) 53–63.

[23] S.J. Greive, S.E. Weitzel, J.P. Goodarzi, L.J. Main, Z. Pasman, P.H. von Hippel, Monitoring RNA transcription in real time by using surface plasmon resonance, Proc. Natl. Acad. Sci. U. S. A. 105 (2008) 3315–3320.

[24] S. Unniraman, R. Prakash, V. Nagaraja, Alternate paradigm for intrinsic transcription termination in eubacteria, J. Biol. Chem. 276 (2001) 41850–41855.

[25] C.J. Ingham, I.S. Hunter, M.C. Smith, Rho-independent terminators without 3′ poly-U tails from the early region of actinophage φC31, Nucleic Acids Res. 23 (1995) 370–376.

[26] Y. d'Aubenton Carafa, E. Brody, C. Thermes, Prediction of rho-independent Escherichia coli transcription terminators. A statistical analysis of their RNA stem–loop structures, J. Mol. Biol. 216 (1990) 835–858.

[27] M.D. Ermolaeva, H.G. Khalak, O. White, H.O. Smith, S.L. Salzberg, Prediction of transcription terminators in bacterial genomes, J. Mol. Biol. 301 (2000) 27–33.

[28] E.A. Lesnik, R. Sampath, H.B. Levene, T.J. Henderson, J.A. McNeil, D.J. Ecker, Prediction of rho-independent transcriptional terminators in Escherichia coli, Nucleic Acids Res. 29 (2001) 3583–3594.

[29] C.L. Kingsford, K. Ayanbule, S.L. Salzberg, Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake, Genome Biol. 8 (2007) R22.

[30] M.J. de Hoon, Y. Makita, K. Nakai, S. Miyano, Prediction of transcriptional terminators in Bacillus subtilis and related species, PLoS Comput. Biol. 1 (2005) e25.

[31] T. Washio, J. Sasayama, M. Tomita, Analysis of complete genomes suggests that many prokaryotes do not rely on hairpin formation in transcription termination, Nucleic Acids Res. 26 (1998) 5456–5463.

[32] S. Unniraman, R. Prakash, V. Nagaraja, Conserved economics of transcription termination in eubacteria, Nucleic Acids Res. 30 (2002) 675–684.

[33] A. Mitra, K. Angamuthu, V. Nagaraja, Genome-wide analysis of the intrinsic terminators of transcription across the genus Mycobacterium, Tuberculosis (Edinb) 88 (2008) 566–575.

[34] A.R. Castillo, S.S. Arevalo, A.J. Woodruff, K.M. Ottemann, Experimental analysis of Helicobacter pylori transcriptional terminators suggests this microbe uses both intrinsic and factor-dependent termination, Mol. Microbiol. 67 (2008) 155–170.

[35] I. Artsimovitch, R. Landick, Pausing by bacterial RNA polymerase is mediated by mechanistically distinct classes of signals, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 7090–7095.

[36] J.E. Mott, J.L. Galloway, T. Platt, Maturation of Escherichia coli tryptophan operon mRNA: evidence for 3′ exonucleolytic processing after rho-dependent termination, EMBO J. 4 (1985) 1887–1891.

[37] L.M. Iyer, E.V. Koonin, L. Aravind, Evolution of bacterial RNA polymerase: implications for large-scale bacterial phylogeny, domain accretion, and horizontal gene transfer, Gene 335 (2004) 73–88.

[38] R.M. Harshey, T. Ramakrishnan, Rate of ribonucleic acid chain growth in Mycobacterium tuberculosis H37Rv, J. Bacteriol. 129 (1977) 616–622.

[39] J.C. McDowell, J.W. Roberts, D.J. Jin, C. Gross, Determination of intrinsic transcription termination efficiency by RNA polymerase elongation rate, Science 266 (1994) 822–825.

[40] E. Laing, V. Mersinias, C.P. Smith, S.J. Hubbard, Analysis of gene expression in operons of Streptomyces coelicolor, Genome Biol. 7 (2006) R46.

[41] A.M. Ryder, J.W. Roberts, Role of the non-template strand of the elongation bubble in intrinsic transcription termination, J. Mol. Biol. 334 (2003) 205–213.

[42] NCBI GenBank FTP Site [ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria].

[43] D.H. Mathews, J. Sabina, M. Zuker, D.H. Turner, Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure, J. Mol. Biol. 288 (1999) 911–940.

[44] R.K. Hartmann, V.A. Erdmann, Analysis of the gene encoding the RNA subunit of ribonuclease P from T. thermophilus HB8, Nucleic Acids Res. 19 (1991) 5957–5964.

[45] K. Steiner, H. Malke, Transcription termination of the streptokinase gene of Streptococcus equisimilis H46A: bidirectionality and efficiency in homologous and heterologous hosts, Mol. Gen. Genet. 246 (1995) 374–380.

