    FEBS Letters 587 (2013) 444–451

    journal homepage: www.FEBSLetters.org

    Genome-wide characterization of the relationship between essential and TATA-containing genes

    Hyun Wook Han a,b, Sang Hun Bae b, Yun-Hwa Jeong c, Jisook Moon a,b,c,*

    a College of Medicine, CHA University, CHA General Hospital, Seoul, Republic of Korea
    b College of Life Science, Department of Applied Bioscience, CHA University, Seoul, Republic of Korea
    c Clinical Statistics Center, CHA University, Seoul, Republic of Korea

    * Corresponding author. Address: CHA University, Department of Applied Bioscience, 606-16 Yeoksam-1 dong, Gangnam-gu, Seoul, Republic of Korea. Fax: +82 2 538 4102.
    E-mail address: [email protected] (J. Moon).

    Abbreviations: NENT, non-essential non-TATA; NET, non-essential TATA; ENT, essential non-TATA; ET, essential TATA; CAI, codon adaptation index; Fop, frequency of optimal codon; EL, mRNA expression level; Degree, degree in protein interaction network; TFBS, the number of transcription factor binding sites; ORF, open reading frame; PIN, protein interaction network

    0014-5793/$36.00 © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
    http://dx.doi.org/10.1016/j.febslet.2012.12.030

    Article info

    Article history:
    Received 19 October 2012
    Revised 18 December 2012
    Accepted 26 December 2012
    Available online 18 January 2013

    Edited by Takashi Gojobori

    Keywords: Essential gene; TATA-containing gene; Codon bias; Expression; Degree of protein interaction network; The number of transcription factor binding sites; The pattern of amino acid usage; Saccharomyces cerevisiae

    Abstract

    Essential genes are involved in most survival-related housekeeping functions. TATA-containing genes encode proteins involved in various stress–response functions. However, because essential and TATA-containing genes have been researched independently, their relationship remains unclear. The present study classified Saccharomyces cerevisiae genes into four groups: non-essential non-TATA, non-essential TATA, essential non-TATA, and essential TATA genes. The results showed that essential TATA genes have the most significant codon bias, the highest level of expression, and unique characteristics, including a large number of transcription factor binding sites, a higher degree in protein interaction networks, and significantly different amino acid usage patterns compared with the other gene groups. Notably, essential TATA genes were uniquely involved in functions such as unfolded protein binding, glycolysis, and alcohol and steroid-related processes.

    © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

    1. Introduction

    Genes can be categorized as essential or non-essential depending on their indispensability to life in rich medium [1,2]. According to this definition, approximately 20% of Saccharomyces cerevisiae genes are essential [3]. Essential genes are involved in most survival-related housekeeping functions and tend to be highly expressed in all cells [4–6]. Essential genes evolve more slowly, show higher codon bias and tend to encode more hubs in the protein interaction network (PIN) than their non-essential counterparts [3,7–11]. Genes can also be classified as TATA (TATA-containing) and non-TATA (TATA-less) genes based on the presence or absence of a TATA box in the promoter region [12]. Approximately 20% of genes are TATA genes, and 80% are non-TATA genes [12,13]. TATA genes encode proteins involved in various stress–response functions for cellular defense, and the expression of these proteins tends to be "noisy" [13]. The TATA box is a universal element and is highly conserved [14]. TATA genes differ from non-TATA genes in that the regulation of TATA genes involves many transcription factors [15].

    Although both essential genes and TATA genes are clearly important in the evolution and function of biological systems, their relationship is unknown because they are typically researched independently. The present study classified S. cerevisiae genes into four groups, NENT, NET, ENT, and ET genes, and subsequently characterized each group. The results not only show the importance and uniqueness of ET genes but also shed light on the relationship between ENT and NET genes based on the codon adaptation index (CAI), expression level (EL), number of transcription factor binding sites (TFBSs), amino acid usage patterns and degree in the protein interaction network (Degree). Finally, the functional uniqueness of each of the four groups of S. cerevisiae genes was investigated using gene ontology (GO) enrichment analysis.



    2. Materials and methods

    2.1. S. cerevisiae genes, amino acid sequences, CAI and Fop

    The ORF names, amino acid sequences, CAI and Fop of 6717 genes were retrieved from the yeast genome database (SGD, http://downloads.yeastgenome.org/curation/calculated_protein_info/protein_properties.tab).

    2.2. Essential genes

    Information regarding the essentiality (or lethality) of 5640 S. cerevisiae genes was retrieved from the MIPS database (http://mips.helmholtz-muenchen.de/genre/proj/yeast/Search/Catalogs/searchCatfirstDisruption.html). Of these, 1109 genes were essential and 4531 were non-essential.

    2.3. TATA genes

    Information regarding the TATA box of 5671 S. cerevisiae genes was obtained from the raw data of Basehoar et al. [12]. The analysis identified 1090 TATA and 4581 non-TATA genes.

    2.4. EL

    The mRNA expression values of 6250 S. cerevisiae genes, as reported by Greenbaum et al. [16], were used as comprehensive reference values (http://bioinfo.mbb.yale.edu/genome/expression/translatome/ref.txt). These reference values were constructed by merging and scaling the results of several previously published gene chip and serial analysis of gene expression experiments.

    2.5. TFBS

    TFBSs for 6496 S. cerevisiae genes were obtained by querying 6717 ORFs retrieved from the SGD using the default setting of "Search for TFs" in the YEASTRACT database (http://www.yeastract.co/). The number of TFBSs per gene ranged from 0 to 58.

    2.6. Degree

    The degree indicates the number of protein interaction partners of a certain protein. Interaction data were retrieved from the yeast genome database (http://downloads.yeastgenome.org/curation/literature/interaction_data.tab) and then filtered for physical interactions. igraph, an R-package for network analysis, was used to obtain the degree of each protein.
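The degree calculation described above can be illustrated with a minimal Python sketch. It is not the authors' igraph workflow: the tuple layout, the toy identifiers, and the choices to ignore self-interactions and to count a repeated pair once are all assumptions made for illustration.

```python
from collections import defaultdict


def protein_degrees(interactions):
    """Return {protein: number of distinct physical interaction partners}.

    `interactions` is an iterable of (protein_a, protein_b, interaction_type)
    tuples. This layout is hypothetical and is not the SGD file format.
    """
    partners = defaultdict(set)
    for a, b, kind in interactions:
        if kind != "physical":   # keep physical interactions only
            continue
        if a == b:               # assumption: self-interactions are ignored
            continue
        partners[a].add(b)       # sets collapse repeated reports of a pair
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}


# Toy records with made-up identifiers.
toy_interactions = [
    ("G1", "G2", "physical"),
    ("G2", "G1", "physical"),    # same pair reported twice, counted once
    ("G1", "G3", "physical"),
    ("G3", "G4", "genetic"),     # not a physical interaction, filtered out
    ("G2", "G2", "physical"),    # self-interaction, ignored
]
degrees = protein_degrees(toy_interactions)
print(degrees)
```

Proteins that appear only in filtered-out records (G4 in the toy data) receive no entry, which corresponds to a degree of zero.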

    Fig. 1. (A) The relationship between essential and TATA genes, (B) the proportion of NENT, NET, ENT, and ET genes.

    2.7. Data for analysis

    Of the data obtained for 6717 S. cerevisiae genes from the yeast genome database, complete information regarding essentiality, the TATA box, EL, the CAI, Fop, and TFBS was available for 5362 genes; therefore, these 5362 genes comprised the total data set for analysis. The source data are available in Dataset S1.

    2.8. Classification of genes

    Based on the relationship between the essential and TATA genes, S. cerevisiae genes were classified into four groups: NENT, NET, ENT, and ET genes (Fig. 1A and B).
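The four-way classification amounts to crossing two binary labels. The sketch below shows one way to express it in Python; the function names and the toy gene identifiers are hypothetical and are not taken from Dataset S1.

```python
from collections import Counter


def classify_gene(is_essential, has_tata):
    """Map the two binary labels onto the four group names used in the text."""
    if is_essential:
        return "ET" if has_tata else "ENT"
    return "NET" if has_tata else "NENT"


def classify_genes(genes, essential, tata):
    """Return {gene: group} given sets of essential and TATA-containing genes."""
    return {g: classify_gene(g in essential, g in tata) for g in genes}


# Toy input with made-up identifiers.
genes = ["G1", "G2", "G3", "G4"]
essential = {"G1", "G2"}
tata = {"G2", "G3"}

groups = classify_genes(genes, essential, tata)
group_sizes = Counter(groups.values())
print(groups)
print(group_sizes)
```

Only genes with both labels available should be passed in, mirroring the restriction to the complete data set described in Section 2.7.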

    2.9. k-core and excess retention (ER)

    The characteristics of the central vertices within a network were determined according to the "k-core", in which a sub-network obtained by a recursive pruning strategy is identified. "Excess retention (ER)" is defined as follows [17]:

    $$ER_k^A = \frac{N_k^A / N_k}{N^A / N}$$

    where $N$, $N^A$, $N_k$, and $N_k^A$ are the number of whole genes; the number of genes with a certain property, $A$, within the whole genes; the total number of genes within the k-core; and the number of genes with the property $A$ within the k-core, respectively. Of the 5362 S. cerevisiae genes, only the 5210 with a degree of >1 in the PIN were used for plotting ER in a 100-core or less.
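As a rough illustration of the two definitions above, the following Python sketch prunes a small undirected graph to its k-core and then evaluates the ER ratio. The adjacency dictionary, the node names and the property set are toy assumptions; the authors' own computation used the yeast PIN and is not reproduced here.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core of an undirected graph.

    Nodes with fewer than k surviving neighbours are removed repeatedly
    until every remaining node has at least k remaining neighbours.
    """
    alive = set(adjacency)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if sum(1 for n in adjacency[node] if n in alive) < k:
                alive.discard(node)
                changed = True
    return alive


def excess_retention(all_genes, genes_with_a, core_genes):
    """ER = (N_k^A / N_k) / (N^A / N) for property A and a given k-core."""
    n = len(all_genes)
    n_a = len(genes_with_a & all_genes)
    n_k = len(core_genes)
    n_a_k = len(genes_with_a & core_genes)
    if n_k == 0 or n_a == 0:
        raise ValueError("ER is undefined for an empty core or empty property set")
    return (n_a_k / n_k) / (n_a / n)


# Toy network: a triangle (A, B, C) with a chain C - D - E attached.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
core_2 = k_core(toy_network, 2)
er = excess_retention(set(toy_network), {"A", "B", "D"}, core_2)
print(sorted(core_2), er)
```

An ER above 1 means that genes with property A are over-represented in the k-core relative to the whole network, and an ER below 1 means they are depleted.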

    2.10. Statistical analyses

    The two-tailed Fisher's exact test (Fisher's test) was used for the enrichment analysis of essential and TATA genes. The Shapiro–Wilk test was used for testing the normality of the distributions of CAI, Fop, EL, TFBS and Degree. In the present study, because these variables did not follow a normal distribution in any of the gene groups (Tables S1 and S2), the Kruskal–Wallis test served as a non-parametric test with which to compare the four gene groups. The Wilcoxon rank sum test (Wilcoxon test) was also used for non-parametric comparisons, and Bonferroni's correction was used to correct for multiple hypothesis testing. For the analysis of amino acid usage, a two-tailed proportion test and a two-tailed Fisher's test were used. For the GO enrichment analysis, we used on-line tools (http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl) within the SGD database to test for significant GO enrichment of a given gene set in certain functional categories compared to the set of whole genomes (6607 default background genomes) under the assumption of hypergeometric distribution. A P-value

    Fig. 3. The cumulative frequency distribution for (A) CAI and (B) EL for each of the four gene groups. Boxplots on the natural logarithm scale for (C) the CAI and (D) the EL for each of the four gene groups. Values within parentheses indicate the median. (E) The proportion of the total EL made up by the four gene groups.


    contrast analyses showed that the cumulative frequency distribution of CAIs for the ET genes stood out from that of the other groups (Fig. 3A), as ET genes had the highest average CAI (P = 9.637 × 10⁻⁵, P = 1.534 × 10⁻⁶, and P = 6.554 × 10⁻¹⁴ vs. NET, ENT, and NENT genes, respectively; Wilcoxon test) (Fig. 3C). Another interesting finding was the lack of difference between the median CAIs of NET and ENT genes (P > 0.05; Wilcoxon test). Previous studies demonstrated that the CAIs of essential genes tend to be higher than those of non-essential genes [6,11]. However, in the present work, the difference between the median CAIs of ENT genes and NET genes was not significant. In addition, NENT genes had a significantly lower median CAI than either NET or ENT genes (P < 2.2 × 10⁻¹⁶ and P < 2.2 × 10⁻¹⁶, respectively; Wilcoxon test). Similar results were obtained in an analysis of the Fop, which is another measure of codon bias (Fig. S1).

    Based on our analysis of the CAI and Fop, we expected that ET, NET and ENT genes would be highly expressed compared with NENT genes. The pattern of the cumulative frequency distributions of the EL for each group was similar to that of the CAI (Fig. 3B). There was a significant EL difference between groups (P < 2.2 × 10⁻¹⁶; Kruskal–Wallis test). ET genes had the highest average EL (P = 3.283 × 10⁻⁹, P = 2.789 × 10⁻⁷, and P = 2.395 × 10⁻¹⁶ vs. NET, ENT, and NENT genes, respectively; Wilcoxon test) (Fig. 3D). The median EL of ENT genes was higher than that of NET genes (P = 0.0003; Wilcoxon test), whereas the median EL of the NENT genes was significantly lower than the median EL of either the NET or the ENT genes (P = 1.285 × 10⁻⁹ and P < 2.2 × 10⁻¹⁶, respectively; Wilcoxon test) (Fig. 3D). Notably, ET genes explained approximately 9% of total gene expression, despite comprising only 2% of all the S. cerevisiae genes examined (Figs. 1B and 3E).

    In summary, the shared characteristics of ENT genes and NET genes and the importance of ET genes were determined based on analyses of the CAIs and ELs of the four gene groups. Moreover, ET genes were shown to be under the highest selection pressure.

    3.3. The uniqueness of ET genes determined by investigating TFBS, Degree, and amino acid usage patterns

    Previous research has shown that TATA genes tend to be regulated by a larger number of transcription factors than non-TATA genes [15]. Overall, essential genes are regulated by fewer transcription factors than non-essential genes [4]. In PINs, essential genes tend to encode hub proteins [10], whereas genes with highly variable expression levels tend to encode peripheral proteins [20]. These observations provide indirect evidence that TATA genes are likely to be on the periphery in PINs.

    The four gene groups were also subjected to a genomic characterization in which the TFBS for each group was determined (Fig. 4A, Table S2). The results showed that there is a significant group difference in the TFBS (P < 2.2 × 10⁻¹⁶; Kruskal–Wallis test). NET genes had the highest median TFBS (P = 4.838 × 10⁻⁶, P < 2.2 × 10⁻¹⁶, and P < 2.2 × 10⁻¹⁶ compared with ET, NENT and ENT genes, respectively; Wilcoxon test) (Fig. 4A), and the median TFBS was higher in ET genes than in NENT or ENT genes (P = 1.376 × 10⁻⁷ and P = 1.376 × 10⁻⁷, respectively; Wilcoxon test). Although the median TFBSs of ENT and NENT genes were equal, a statistical test showed that the TFBS of ENT genes was slightly lower than that of NENT genes (P = 0.0005; Wilcoxon test).

    Investigation of the Degree showed that there is a difference in Degree between groups (Table S2; P < 2.2 × 10⁻¹⁶; Kruskal–Wallis test). The median Degree was higher for ENT-encoded proteins than for proteins encoded by NET and NENT genes (P < 2.2 × 10⁻¹⁶ and P < 2.2 × 10⁻¹⁶, respectively; Wilcoxon test). The same was true for ET proteins (P < 2.2 × 10⁻¹⁶ and P < 2.2 × 10⁻¹⁶, respectively; Wilcoxon test). By contrast, there was no difference in Degree between ENT and ET proteins (P > 0.91; Wilcoxon test). The median Degree of NET-encoded proteins was lower than that of proteins encoded by any of the other gene groups (P < 2.2 × 10⁻¹⁶, P < 2.2 × 10⁻¹⁶, and P = 3.621 × 10⁻⁸ vs. ENT, ET and NENT proteins, respectively; Wilcoxon test). Plots of the excess retention of each gene group with k-core in the PIN supported the view that essential genes tend to encode hub proteins, whereas non-essential genes tend to encode proteins at the periphery, regardless of the presence of a TATA box (Fig. 4C).

    Fig. 4. Boxplots on the natural logarithm scale showing the (A) TFBS and (B) Degree for each of the four gene groups. (C) The ER of the four gene groups according to k-core in the PIN. The figure illustrates that the ENT and ET genes tend to encode hub proteins, whereas NET and NENT genes tend to encode peripheral proteins.

    Gong et al. proposed that the amino acid usage patterns of essential genes and those of non-essential genes would be significantly different [1]. This hypothesis was tested by plotting the usage patterns for the genes in the four gene groups based on the overall frequency of use (%) of each of the 20 amino acids (Fig. 5A, Table S3). The highest usage frequencies in all gene groups were determined for Ala, Val, Thr, Asp, Asn, Ile, Glu, Lys, Ser, and Leu (frequency, ≥5%). ET and NET genes also utilized Gly with high frequency. The proportions of each amino acid were significantly different in each of the four gene groups (Table S3; proportion test).

    Fig. 5. Amino acid usage patterns for the four gene groups. (A) The percentage usage of each amino acid by the four gene groups. (B–F) Plots of the odds ratios obtained from a Fisher's test comparing the usage of each amino acid by each gene group with that of the background genome (5365 genes). Each plot is sorted with respect to the odds ratios for each amino acid in NENT genes. (B) Odds ratio plots for the use of each amino acid in the important genes (ET, NET, and ENT genes) and the trivial genes (NENT genes), (C) for each of the four gene groups, (D) for NENT and ET genes, (E) for NET and ENT genes, and (F) for both ENT and NET genes, and ET genes. Error bars indicate the 95% confidence interval. The star, diamond, and triangle indicate P < 0.001, P < 0.01, and P < 0.05, respectively.

    For the amino acid enrichment analysis for each group, a Fisher's test was used to examine the difference in amino acid usage between each gene group and the total genes (5365 genes) for each of the 20 amino acids (Fig. 5B–F, Table S4). Based on the CAI and EL values, NET, ENT, and ET genes can be considered to be biologically important gene groups, whereas NENT genes are trivial. Accordingly, the difference in the amino acid usage patterns was determined for important vs. trivial gene groups. Each plot was sorted with respect to the odds ratios for each amino acid in the trivial genes. A significant difference in the usage pattern between the two gene groups was found (Fig. 5B). Odds ratio plots for the four gene groups are shown in Fig. 5C, but they were too complex to discern a pattern. Fig. 5D and E show odds ratio plots for NENT and ET genes, and for NET and ENT genes, respectively. The amino acid usage pattern of ET genes followed the general trend of the important gene groups, with a few exceptions (Fig. 5D). By contrast, the amino acid usage patterns of NET and ENT genes did not follow the general trend of the important genes. Furthermore, NET and ENT genes showed opposing usage patterns for 14 amino acids (Gly, Glu, Thr, Asp, Lys, Leu, Tyr, Gln, Phe, Arg, Trp, Pro, Ser and Cys) (Fig. 5E). Thus, NET genes predominantly used Ala, Gly, Val, Thr, Tyr, Phe, Trp, Pro, Ser and Cys with relatively scarce usage of Glu, Asp, Lys, Leu, Gln, Arg, His, and Asn, whereas ENT genes predominantly used Glu, Asp, Lys, Leu, Gln, and Arg with relatively scarce usage of Gly, Thr, Tyr, Phe, Trp, Pro, Ser, His, Cys, and Asn. Both Ala and Val were the preferred amino acids of NET genes, but their usage in ENT genes was not remarkable. His was the preferred amino acid in ENT genes, but not in NET genes. Asn showed depletion in both NET and ENT genes. Neither of these gene groups showed preference or depletion for Met or Ile (Fig. 5E). Notably, most of the amino acids preferred by ENT genes were polar, with a net charge (except for Leu and Gln), whereas most of the amino acids preferred by NET genes were non-polar or polar, with no net charge. The combined amino acid usage pattern by NET and ENT genes was similar to that of ET genes (Fig. 5F).
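The odds ratios plotted in Fig. 5 compare how often one amino acid is used in a gene group with how often it is used in the background. The Python sketch below shows the arithmetic for a single 2x2 table. The table layout and the counts are invented for illustration, and the confidence interval uses Woolf's log method as a stand-in, since the text does not state how the 95% intervals were derived.

```python
import math


def odds_ratio_with_ci(group_aa, group_other, background_aa, background_other, z=1.96):
    """Odds ratio of a 2x2 table with an approximate 95% confidence interval.

    Rows: gene group vs. background. Columns: the amino acid of interest vs.
    all other amino acids. The interval is Woolf's log-based approximation,
    an assumption of this sketch rather than the method used in the paper.
    """
    odds_ratio = (group_aa * background_other) / (group_other * background_aa)
    standard_error = math.sqrt(
        1 / group_aa + 1 / group_other + 1 / background_aa + 1 / background_other
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Invented counts: 120 of 1000 residues in the group vs. 1000 of 10000 in the background.
or_value, ci_low, ci_high = odds_ratio_with_ci(120, 880, 1000, 9000)
print(or_value, ci_low, ci_high)
```

An odds ratio above 1 indicates preferential use of the amino acid in the group, and a value below 1 indicates depletion, which is how the preferred and scarce amino acids of NET and ENT genes are read from Fig. 5E.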

    In summary, based on TFBS, Degree, and amino acid usage pattern, an opposing relationship between NET and ENT genes was identified. ET genes, however, are unique in that they displayed characteristics of both ENT and NET genes.

    3.4. Identification of the functions of the genes involved in essential stress response

    As their name implies, essential genes are indispensable for the survival of the organism. These genes are typically involved in fundamental biological processes, so-called "housekeeping functions", such as cell wall and membrane biogenesis, ribosome biosynthesis, and DNA replication [21]. TATA genes are involved in functions related to wound healing, inflammatory response, and response to external stimuli [22]. However, the functions of ET genes in terms of survival and the stress response are unclear.

    We therefore performed a GO-enrichment analysis of molecular function and biological processes to investigate the functions encoded by ET, NET, and ENT genes (Fig. 6, Table 1, Table 2, Dataset S2–3). For ENT, NET, and ET genes, 88, 31, and 4 enriched GO terms were obtained, respectively. Of the four GO terms for molecular functions encoded by ET genes, two overlapped with those of ENT genes, and one overlapped with one of the NET functions. A unique GO term in ET genes is "unfolded protein binding", which is related to chaperone activity and the binding of unfolded proteins of the endoplasmic reticulum (ER) in a process called ER stress. GO enrichment analysis of biological processes yielded 235, 68, and 31 enriched GO terms for ENT, NET, and ET genes, respectively. Of the 31 GO terms for biological processes encoded by ET genes, nine overlapped with those of ENT genes, and eleven overlapped with those of NET genes. Eleven GO terms were unique to ET genes; these were related to "glycolysis", "alcohol-related process" and "steroid-related process".
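The GO enrichment reported here rests on the hypergeometric assumption stated in Section 2.10. A minimal Python sketch of that upper-tail probability follows. It is not the SGD GO Term Finder implementation, it applies no multiple-testing correction, and the numbers in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: number of background genes
    annotated:  background genes that carry the GO term
    sample:     size of the gene set being tested
    hits:       genes in the gene set that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, sample - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


# Invented example: 20 background genes, 5 annotated, a set of 4 genes with 2 hits.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
p_at_least_zero = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=0)
print(p_value, p_at_least_zero)
```

Exact integer binomials keep the sketch free of rounding problems for small inputs; for genome-scale backgrounds a statistics library would normally be used instead.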

    Fig. 6. GO enrichment analysis of ENT, NET, and ET genes. The number of GO terms enriched in the enrichment analysis of the (A) molecular function and (B) biological process of GO.

    Table 1. GO enrichment analysis of molecular function for ET genes.

    GOID   GO term                                                          P-value   Co*
    16772  Transferase activity, transferring phosphorus-containing groups  0.00108   ENT
    51082  Unfolded protein binding                                         0.00639   –
    16491  Oxidoreductase activity                                          0.03227   NET
    5515   Protein binding                                                  0.03268   NET

    Co* indicates co-occurrence.

    Table 2. GO enrichment analysis of biological process for ET genes.

    GOID   GO term                                       P-value   Co*
    9987   Cellular process                              1.39E-14  ENT
    44237  Cellular metabolic process                    4.63E-11  ENT
    8152   Metabolic process                             1.95E-10  ENT
    44238  Primary metabolic process                     4.91E-09  ENT
    44249  Cellular biosynthetic process                 3.56E-08  ENT
    9058   Biosynthetic process                          6.33E-08  ENT
    44283  Small molecule biosynthetic process           2.73E-06  NET
    44281  Small molecule metabolic process              1.15E-05  NET
    6066   Alcohol metabolic process                     5.22E-05  NET
    46165  Alcohol biosynthetic process                  5.92E-05  NET
    6096   Glycolysis                                    0.0005    –
    6007   Glucose catabolic process                     0.00111   NET
    16129  Phytosteroid biosynthetic process             0.00174   –
    44108  Cellular alcohol biosynthetic process         0.00174   –
    6696   Ergosterol biosynthetic process               0.00174   –
    16128  Phytosteroid metabolic process                0.00267   –
    8204   Ergosterol metabolic process                  0.00267   –
    19320  Hexose catabolic process                      0.00294   NET
    8610   Lipid biosynthetic process                    0.00407   ENT
    16126  Sterol biosynthetic process                   0.0048    –
    44107  Cellular alcohol metabolic process            0.0048    –
    6694   Steroid biosynthetic process                  0.0048    –
    46365  Monosaccharide catabolic process              0.00521   NET
    19318  Hexose metabolic process                      0.00682   NET
    34641  Cellular nitrogen compound metabolic process  0.0076    ENT
    6807   Nitrogen compound metabolic process           0.01095   ENT
    46164  Alcohol catabolic process                     0.0126    NET
    6006   Glucose metabolic process                     0.01663   NET
    5996   Monosaccharide metabolic process              0.02249   NET
    16125  Sterol metabolic process                      0.0303    –
    8202   Steroid metabolic process                     0.0303    –

    Co* indicates co-occurrence.

    4. Discussion

    Over the last few decades, knockout techniques and computational methods have been used extensively to characterize essential genes and TATA genes. However, research that is focused solely on the essential genes required for survival or the role of TATA genes in the stress response is not sufficient to fully understand the global evolutionary mechanisms of biological systems. In this study, we characterized the relationship between essential and TATA genes, identified ET genes as essential stress response genes, and discovered the potential functions of each group of genes based on the analyses of CAI, Fop, EL, TFBS, Degree and GO functions. The present investigation clearly supports the importance and uniqueness of ET genes. We were also able to investigate the shared CAI and EL and distinct TFBS, Degree and amino acid usage patterns between ET and NET/ENT genes. The unique ET GO function "unfolded protein binding" is related to chaperone activity and ER stress. "Glycolysis" is an essential process that extracts energy from glucose in both aerobic and anaerobic organisms. Both "unfolded protein binding" and "glycolysis" are conserved functions among all eukaryotic organisms [23,24]. In humans, a collapse of the ER stress response or glycolytic pathway has been implicated in various diseases, such as diabetes, neurodegenerative diseases, cancer and heart disease [25–27]. Notably, S. cerevisiae appears to have developed an alcohol-related process as an essential stress response for alcoholic fermentation [28]. A steroid-related process has been associated with triggering a general stress response [29].

    However, some questions remain unanswered. It is important to investigate the relationship between each of the identified functions and the essential stress response. Another possibility for future work is to identify the differences within the core promoter elements of the 63% non-essential and 18% essential TATA-less genes. Additionally, although the present study addressed the evolutionary pressure of essential and TATA genes through parameters such as CAI and EL, other parameters, such as the number of physical and genetic protein interactions, the fitness consequences of gene knockout, the sequence length and the "age of the gene" [30] are important evolutionary determinants. We anticipate that these parameters will become important future topics in genomics.

    Finally, these findings should contribute to elucidating the evolutionary and functional mechanisms of a biological system, including the genesis of various diseases, such as diabetes, cancers, and neurodegenerative diseases.

    Acknowledgments

    We thank Dr. Hyun-Seob Lee, Dr. Chul Kim, and Dr. Ki Won Seo, who participated in discussions of this work and offered comments as members of the Moon group. This work was supported by The Korea Science and Engineering Foundation (Grant numbers 2011-0029342 and 2011-0013280).

    Appendix A. Supplementary data

    Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.febslet.2012.12.030.



    References

    [1] Gong, X., Fan, S., Bilderbeck, A., Li, M., Pang, H. and Tao, S. (2008) Comparativeanalysis of essential genes and nonessential genes in Escherichia coli K12. Mol.Genet. Genomics 279, 87–94.

    [2] Hillenmeyer, M.E. et al. (2008) The chemical genomic portrait of yeast:uncovering a phenotype for all genes. Science 320, 362–365.

    [3] Gustafson, A.M., Snitkin, E.S., Parker, S.C., DeLisi, C. and Kasif, S. (2006)Towards the identification of essential genes using targeted genomesequencing and comparative analysis. BMC Genomics 7, 265.

    [4] Acencio, M.L. and Lemke, N. (2009) Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information. BMC Bioinformatics 10, 290.

    [5] Wang, G.Z., Lercher, M.J. and Hurst, L.D. (2011) Transcriptional coupling of neighboring genes and gene expression noise: evidence that gene orientation and noncoding transcripts are modulators of noise. Genome Biol. Evol. 3, 320–331.

    [6] Fang, G., Rocha, E. and Danchin, A. (2005) How essential are nonessential genes? Mol. Biol. Evol. 22, 2147–2156.

    [7] Jordan, I.K., Rogozin, I.B., Wolf, Y.I. and Koonin, E.V. (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12, 962–968.

    [8] Koonin, E.V. (2005) Systemic determinants of gene evolution and function. Mol. Syst. Biol. 1 (2005), 0021.

    [9] Wu, X. et al. (2010) Computational identification of rare codons of Escherichia coli based on codon pairs preference. BMC Bioinformatics 11, 61.

    [10] Jeong, H., Mason, S.P., Barabasi, A.L. and Oltvai, Z.N. (2001) Lethality and centrality in protein networks. Nature 411, 41–42.

    [11] Theis, F.J., Latif, N., Wong, P. and Frishman, D. (2011) Complex principal component and correlation structure of 16 yeast genomic variables. Mol. Biol. Evol. 28, 2501–2512.

    [12] Basehoar, A.D., Zanton, S.J. and Pugh, B.F. (2004) Identification and distinct regulation of yeast TATA box-containing genes. Cell 116, 699–709.

    [13] Lopez-Maury, L., Marguerat, S. and Bahler, J. (2008) Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat. Rev. Genet. 9, 583–593.

    [14] Dikstein, R. (2011) The unexpected traits associated with core promoter elements. Transcription 2, 201–206.

    [15] Tirosh, I., Weinberger, A., Carmi, M. and Barkai, N. (2006) A genetic signature of interspecies variations in gene expression. Nat. Genet. 38, 830–834.

    [16] Greenbaum, D., Jansen, R. and Gerstein, M. (2002) Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts. Bioinformatics 18, 585–596.

    [17] Wuchty, S. and Almaas, E. (2005) Peeling the yeast protein network. Proteomics 5, 444–449.

    [18] Hershberg, R. and Petrov, D.A. (2008) Selection on codon bias. Annu. Rev. Genet. 42, 287–299.

    [19] Sharp, P.M. and Li, W.H. (1987) The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.

    [20] Zhou, L., Ma, X. and Sun, F. (2008) The effects of protein interactions, gene essentiality and regulatory regions on expression variation. BMC Syst. Biol. 2, 54.

    [21] Giaever, G. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391.

    [22] Moshonov, S., Elfakess, R., Golan-Mashiach, M., Sinvani, H. and Dikstein, R. (2008) Links between core promoter and basic gene features influence gene expression. BMC Genomics 9, 92.

    [23] Ron, D. and Walter, P. (2007) Signal integration in the endoplasmic reticulum unfolded protein response. Nat. Rev. Mol. Cell Biol. 8, 519–529.

    [24] Chandra, F.A., Buzi, G. and Doyle, J.C. (2011) Glycolytic oscillations and limits on robust efficiency. Science 333, 187–192.

    [25] Yoshida, H. (2007) ER stress and diseases. FEBS J. 274, 630–658.

    [26] Yeh, C.S., Wang, J.Y., Chung, F.Y., Lee, S.C., Huang, M.Y., Kuo, C.W., Yang, M.J. and Lin, S.R. (2008) Significance of the glycolytic pathway and glycolysis related-genes in tumorigenesis of human colorectal cancers. Oncol. Rep. 19, 81–91.

    [27] Leyva, F., Wingrove, C.S., Godsland, I.F. and Stevenson, J.C. (1998) The glycolytic pathway to coronary heart disease: a hypothesis. Metabolism 47, 657–662.

    [28] Ding, J., Huang, X., Zhang, L., Zhao, N., Yang, D. and Zhang, K. (2009) Tolerance and stress response to ethanol in the yeast Saccharomyces cerevisiae. Appl. Microbiol. Biotechnol. 85, 253–263.

    [29] Prasad, R., Devaux, F., Dhamgaye, S. and Banerjee, D. (2012) Response of pathogenic and non-pathogenic yeasts to steroids. J. Steroid Biochem. Mol. Biol. 129, 61–69.

    [30] Vishnoi, A., Kryazhimskiy, S., Bazykin, G.A., Hannenhalli, S. and Plotkin, J.B. (2010) Young proteins experience more variable selection pressures than old proteins. Genome Res. 20, 1574–1581.


