GO-2D: identifying 2-dimensional cellular-localized functional modules in Gene Ontology

BMC Genomics (BioMed Central)

Open Access

Software

Jing Zhu1, Jing Wang1, Zheng Guo*1,2, Min Zhang1, Da Yang1, Yanhui Li1, Dong Wang1 and Guohua Xiao1

Address: 1Department of Bioinformatics, Harbin Medical University, Harbin 150086, China and 2Department of Pharmacology and Bio-pharmaceutical Key Laboratory of Heilongjiang Province and State, Harbin Medical University, Harbin 150086, China

* Corresponding author

Abstract

Background: Rapid progress in high-throughput biotechnologies (e.g., microarrays) and the exponential accumulation of gene functional knowledge make it promising to systematically understand complex human diseases at the level of functional modules. Based on Gene Ontology, a large number of automatic tools have been developed for the functional analysis and biological interpretation of high-throughput microarray data.

Results: Unlike existing tools such as Onto-Express and FatiGO, the tool we have developed, GO-2D, identifies 2-dimensional functional modules based on combined GO categories. For example, it refines biological process categories by sorting their genes into different cellular component categories, and then extracts those combined categories enriched with the interesting genes (e.g., the differentially expressed genes) to identify cellular-localized functional modules. Applications of GO-2D to the analyses of two human cancer datasets show that very specific disease-relevant processes can be identified by using cellular location information.

Conclusion: For studying complex human diseases, GO-2D can extract functionally compact and detailed modules such as the cellular-localized ones, characterizing disease-relevant modules in terms of both biological processes and cellular locations. The application results demonstrate that the 2-dimensional approach, complementary to the current 1-dimensional approach, is powerful for finding modules highly relevant to diseases.

Published: 24 January 2007

BMC Genomics 2007, 8:30 doi:10.1186/1471-2164-8-30

Received: 4 August 2006
Accepted: 24 January 2007

This article is available from: http://www.biomedcentral.com/1471-2164/8/30

© 2007 Zhu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

It is widely accepted that functionally related genes tend to be expressed and to perform their highly concerted cellular functions in an isolated and interactive modular fashion [1,2]. Global gene expression data have provided an opportunity for understanding the transcriptional modularity characterizing complex diseases [3-6]. For example, Mootha et al. [6] showed that the coordinated disease-associated changes of a set of functionally related genes could be identified even when the expression of individual genes changes only modestly. Segal et al. [3] defined 'modules' as gene sets that are conditionally activated or repressed across a wide variety of cancer types, and identified some modules deregulated in cancer. Our recent study demonstrated that, based on functional modules, i.e., GO categories enriched with differentially expressed genes (DEGs), cancer types can be precisely and robustly classified by supervised classification analysis [5] or discovered by clustering analysis [7].

For high-throughput microarray data analysis, translating lists of interesting genes (e.g., DEGs) into functional modules for understanding the biological phenomena has become an important routine task. Based on Gene Ontology, a large number of tools such as Onto-Express [8], FatiGO [9], GoMiner [10] and GOstat [11] have been developed for this purpose. However, most existing approaches interpret the interesting genes using categories from the three ontologies "biological process" (BP), "molecular function" (MF) and "cellular component" (CC) separately, which may be inefficient for mapping some specific modular activities in cells. For example, a GO BP category usually encompasses genes involved in distinct processes occurring in different cellular compartments [12], and genes even within the same process may show a clear expression distinction with respect to their cellular localizations [13]. Therefore, in this paper, by combining categories from BP, CC, and MF, we propose GO-2D as a tool for finding 2-dimensional functional modules (e.g., the cellular-localized modules) for studying complex human diseases.

We use two cancer datasets for numerical analysis. The results show that, with the same FDR (false discovery rate) criterion, many specific disease-relevant processes cannot be found until cellular location information is additionally used. The results demonstrate the insufficiency of current 1-dimensional approaches and highlight the importance of using 2-dimensional modules for disease analysis.

Implementation

GO-2D is implemented in Java and connected to a relational database system: MS Access 2000 in the Windows version and SQLite in the Linux version.

Database

In GO-2D, associations of gene IDs from different organisms (including Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae) to GO terms are based on the databases Gene, SGD, FlyBase, and WormBase. Tables relating GO terms to gene IDs can be found on the NCBI web page [14] and the GO Consortium web page [15]. UniGene build #190 is used in GO-2D.

Analysis and visualization

Data analysis is made flexible by subdividing the procedure into sequential steps:

(1) Import data: GO-2D starts by reading the input files containing the reference and interesting gene lists (see Figure 1). Genes are identified by Entrez Gene and UniGene IDs for human, and by the organism-specific IDs used in GO for the other three species (Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae).

(2) Cross annotation: GO-2D refines a BP category by sorting its genes into different CC categories, forming combined categories for finding cellular-localized modules enriched with the interesting genes (see Figure 2). It also provides the other 2-dimensional combinations of categories from the three ontologies (BP, MF, and CC).

(3) Filter data: GO-2D provides options for finding general or specific combined categories by restricting their sizes (the minimum/maximum numbers of included genes) and/or depths in GO.

(4) Statistical test: GO-2D calculates the probability that a combined category contains the annotated number of interesting genes by random chance, based on the hypergeometric or binomial statistical model [8]. This probability is named "the observed p value".

(5) Multiple test correction: GO-2D offers Bonferroni correction and FDR control [16] for multiple statistical tests; the results are shown as "the corrected p value". When a total of n combined categories are tested, the Bonferroni-corrected p value is p·n, where p is the observed p value. For FDR control, let p(k) denote the k-th smallest observed p value among the n combined categories; then the FDR f_k for hypothesis k is bounded by n·p(k)/k. If an FDR of f is required for the entire experiment, all hypotheses that satisfy p(k) ≤ f·k/n are declared significant. The corrected p value for the k-th smallest observed p value is n·p(k)/k. GO-2D can also output all the observed p values, which can be used for more sophisticated multiple test corrections by other existing tools such as the program for Storey's Q value [17].

(6) Results: GO-2D allows users to save the results for detailed examination of the identified modules. The tabular results collect the following information for each combined category: GO IDs, names and depths of the categories (e.g., both BP and CC), the numbers of genes and interesting genes annotated in it, the observed p value, and the corrected p value for multiple tests of the combined categories.

(7) Results visualization: GO-2D also provides a tree view to visualize the 2-dimensional modules (e.g., BP and CC). GO-2D first displays the primary categories (e.g., BP, user defined) in the primary tree and then, in the secondary tree, shows the sub-hierarchical structure of the secondary categories (e.g., CC) within each primary category (see Figure 3). The user can select either BP or CC as the "primary tree" for visualization. The selection has no effect on the calculation of over-represented categories.

(8) Redundancy treatment: GO-2D suggests an empirical way to reduce the redundancy among the resulting 2-dimensional modules identified in the hierarchical structure of GO. When several modules share the same primary category in the primary tree (e.g., BP), GO-2D focuses on the combined category containing the most specific secondary category in the secondary tree (e.g., CC).

Figure 1. A snapshot of GO-2D: the main user interface.
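As a rough illustration of steps (2), (3) and (8), the following Python sketch builds combined BP/CC categories by intersecting gene sets, filters them by size, and keeps one module per BP category. It is not the GO-2D source code (GO-2D itself is written in Java). The function names, the toy gene IDs and the depth values are hypothetical, and using CC depth as the measure of "most specific" is an assumption.

```python
def cross_annotate(bp_genes, cc_genes):
    """Step (2): intersect every BP gene set with every CC gene set."""
    combined = {}
    for bp, genes_bp in bp_genes.items():
        for cc, genes_cc in cc_genes.items():
            shared = genes_bp & genes_cc
            if shared:  # keep only non-empty combined categories
                combined[(bp, cc)] = shared
    return combined


def filter_by_size(combined, min_genes, max_genes):
    """Step (3): keep combined categories within the size limits."""
    return {key: genes for key, genes in combined.items()
            if min_genes <= len(genes) <= max_genes}


def reduce_redundancy(modules, cc_depth):
    """Step (8): per BP category, keep the module with the deepest CC.

    Assumption: depth in the CC ontology stands in for specificity.
    Ties keep the module encountered first.
    """
    best = {}
    for bp, cc in modules:
        if bp not in best or cc_depth[cc] > cc_depth[best[bp]]:
            best[bp] = cc
    return sorted(best.items())


# Toy annotations (made-up gene IDs and depths, for illustration only).
bp_genes = {
    "lipid biosynthesis": {"g1", "g2", "g3", "g4"},
    "rRNA processing": {"g5", "g6", "g7"},
}
cc_genes = {
    "intracellular": {"g1", "g2", "g3", "g4", "g5", "g6", "g7"},
    "mitochondrion": {"g1", "g2", "g3"},
    "nucleolus": {"g5", "g6"},
}
cc_depth = {"intracellular": 2, "mitochondrion": 5, "nucleolus": 6}

combined = cross_annotate(bp_genes, cc_genes)
kept = filter_by_size(combined, min_genes=3, max_genes=150)
# In GO-2D the redundancy step is applied to the modules that pass the
# statistical test; here it is applied to the filtered set for brevity.
modules = reduce_redundancy(kept, cc_depth)
print(modules)
```

The size limits 3 and 150 mirror the MIN/MAX Gene Num parameters used in the Results section; everything else in the example is invented data.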

Figure 2. A snapshot of GO-2D: the processing page.

Figure 3. A snapshot of GO-2D: the results page.

Details are described in Additional file 1 (Figures 6–16). Furthermore, GO-2D provides an additional standalone program, GODAG, for visualizing user-selected groups of GO categories as a Directed Acyclic Graph (DAG). In the same DAG, it can show several groups of GO categories marked with different colours, which facilitates visual comparison of the modules identified by different methods (see details in Additional file 2, Figures 17–23). GO-2D also provides another standalone tool, ConfusionMatrix, for comparing the categories identified by the 1- and 2-dimensional approaches in GO-2D (see details in Additional file 3, Figures 24 and 25).
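The kind of comparison ConfusionMatrix performs can be sketched with plain set operations. The snippet below is only an illustration under stated assumptions: the function name and input shapes are hypothetical, and the real tool's file formats are not described here. It counts BP categories found by both approaches, by the 1-dimensional approach only, and by the 2-dimensional approach only.

```python
def bp_confusion(one_dim_bps, two_dim_modules):
    """Compare BP categories found by the 1-D and 2-D approaches.

    one_dim_bps: iterable of BP category names (1-dimensional modules).
    two_dim_modules: iterable of (BP, CC) pairs (2-dimensional modules).
    """
    bps_1d = set(one_dim_bps)
    bps_2d = {bp for bp, _cc in two_dim_modules}
    return {
        "both": len(bps_1d & bps_2d),
        "only_1d": len(bps_1d - bps_2d),
        "only_2d": len(bps_2d - bps_1d),
    }


# Toy input (category names chosen for illustration only).
one_dim = ["cell cycle arrest", "rRNA processing"]
two_dim = [
    ("rRNA processing", "nucleolus"),
    ("rRNA processing", "protein complex"),
    ("lipid biosynthesis", "mitochondrion"),
]
counts = bp_confusion(one_dim, two_dim)
print(counts)
```

A BP category that appears in several 2-dimensional modules is counted once, since the comparison is made at the level of BP categories.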

Related software comparison

A recent study [18] has made a detailed comparison of 14 tools for ontological analysis of microarray data. Table 1 compares GO-2D to some typical ones. We highlight that using combined categories for analysis is unique to GO-2D.

Results

Like other tools, GO-2D can find 1-dimensional modules enriched with interesting genes based on each of the three GO ontologies (Biological_Process, Cellular_Component and Molecular_Function) separately. Because of the multiple testing problem, the observed p value is not a justified criterion for comparison; we therefore use the same FDR criterion [16] to compare the power of the approaches for finding 1- and 2-dimensional modules.

Figure 4. Comparison between 1-dimensional and 2-dimensional modules in breast cancer. (A) Horizontal axis represents BP categories ranked by their depths in the BP ontology. Vertical axis represents CC categories ranked by their depths in the CC ontology. The thick blue lines represent the 1-dimensional modules and the red squares represent the 2-dimensional modules. (B) In the confusion matrix, we show the numbers of BP categories which are present in both 1-dimensional and 2-dimensional modules; present in 1-dimensional but absent in 2-dimensional modules; absent in 1-dimensional but present in 2-dimensional modules.

Datasets

The breast cancer dataset contains 20849 genes measured on 21 invasive lobular carcinoma (ILC) and 38 invasive ductal carcinoma (IDC) samples [19]. The gastric cancer dataset contains 20152 genes measured for 103 gastric tumours and 29 normal gastric specimens [20]. Following the pre-processing protocol proposed by Dudoit et al. [21], we eliminate the genes with missing data in more than 5% of the arrays, apply a base-2 logarithmic transformation to the remaining expression values, and impute the missing values with zeros. Each experiment is normalized to zero median across the genes. The breast and gastric cancer data finally comprise 8575 and 13037 genes (Entrez Gene) respectively, of which 318 and 3388 are differentially expressed genes (DEGs) identified by t-test with FDR 1%, calculated by BRB ArrayTools [22].
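The pre-processing steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used for the paper: the function name and data layout are hypothetical, missing values are represented as None, expression values are assumed to be positive, and the steps are applied in the order in which the text lists them.

```python
import math
from statistics import median


def preprocess(matrix, max_missing_fraction=0.05):
    """matrix maps gene -> list of values, one per array (None = missing)."""
    # 1. Drop genes with missing data in more than 5% of the arrays.
    kept = {
        gene: values for gene, values in matrix.items()
        if sum(v is None for v in values) / len(values) <= max_missing_fraction
    }
    if not kept:
        return {}
    # 2. Log2-transform observed values; 3. impute missing values with zeros.
    logged = {
        gene: [0.0 if v is None else math.log2(v) for v in values]
        for gene, values in kept.items()
    }
    # 4. Normalize each array (experiment) to zero median across genes.
    n_arrays = len(next(iter(logged.values())))
    medians = [median(values[j] for values in logged.values())
               for j in range(n_arrays)]
    return {gene: [v - medians[j] for j, v in enumerate(values)]
            for gene, values in logged.items()}


# Toy data with two arrays; the threshold is relaxed to 50% so that the
# effect of each step is visible on such a small example.
raw = {
    "A": [1.0, 4.0],
    "B": [2.0, None],
    "C": [4.0, 16.0],
    "D": [None, None],
}
clean = preprocess(raw, max_missing_fraction=0.5)
print(clean)
```

With the default threshold of 0.05, a gene measured on 59 arrays (as in the breast cancer dataset) is kept only if at most 2 of its values are missing.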

Parameters

The parameters are set as follows:

(1) Hypergeometric distribution

(2) FDR = 0.1

(3) MIN Gene Num 3; MAX Gene Num 150

(4) BP depth = Leaf Categories

(5) CC depth = 1 (for finding 1-dimensional modules based only on BP), or CC depth >= 1 (for finding 2-dimensional cellular-localized functional modules)

(6) Reduce redundancy

Figure 5. Comparison between 1-dimensional and 2-dimensional modules in gastric cancer. (A) Horizontal axis represents BP categories ranked by their depths in the BP ontology. Vertical axis represents CC categories ranked by their depths in the CC ontology. The thick blue lines represent the 1-dimensional modules and the red squares represent the 2-dimensional modules. (B) In the confusion matrix, we show the numbers of BP categories which are present in both 1-dimensional and 2-dimensional modules; present in 1-dimensional but absent in 2-dimensional modules; absent in 1-dimensional but present in 2-dimensional modules.

Figure 6. A snapshot of GO-2D: depth selection.

For breast cancer (318 interesting genes and 8575 reference genes), the 1-dimensional and 2-dimensional analyses take about 9 min and 12 min respectively on the same computer (CPU: 2.8 GHz, memory: 1 GB). For gastric cancer (3388 interesting genes and 13037 reference genes), they take about 16 min and 22 min, respectively.

Figure 7. A snapshot of GO-2D: depth filter.

Figure 8. A snapshot of GO-2D: primary tree view.

Figure 9. A snapshot of GO-2D: secondary tree view.

Figure 10. A snapshot of GO-2D: gene information (Entrez gene).

Figure 11. A snapshot of GO-2D: gene information (UniGene).

Figure 12. A snapshot of GO-2D: unknown gene (Entrez gene).

Figure 13. A snapshot of GO-2D: unknown gene (UniGene).

Figure 14. A snapshot of GO-2D: FDR control.

Figure 15. A snapshot of GO-2D: corrected p value (Bonferroni).

Figure 16. A snapshot of GO-2D: observed p value (no correction is selected).

Figure 17. A snapshot of GODAG: the main user interface.

Comparison of modules for breast cancer

With the statistical criterion FDR ≤ 0.1, we find eight cellular-localized modules, and two 1-dimensional modules based on BP only. As shown in Figure 4 and described in Table 2 (the details of genes in each module are shown in Additional file 4), in addition to the biological processes that appear in the two 1-dimensional modules, the 2-dimensional approach discovers some new specific processes relevant to the disease. For example, the biological process 'lipid biosynthesis' is discovered in the cellular-localized module 'lipid biosynthesis & mitochondrion'. Using cellular location information, we find three DEGs among the 10 measured genes annotated in this cellular-localized module, and the observed p-value is 0.005 (FDR = 4.8%). However, without cellular location information, we find four DEGs among the 108 measured genes annotated in 'lipid biosynthesis', and the observed p-value is 0.57 (FDR = 65.6%). This example demonstrates that finding cellular-localized modules is a useful approach to detecting additional disease-relevant modules.
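The two observed p-values in the lipid biosynthesis example can be checked with a short hypergeometric calculation. The sketch below assumes that the observed p value is the upper tail, i.e. the probability of seeing at least the observed number of DEGs; the text does not state which tail GO-2D uses, and the function name is hypothetical. The reference set is the 8575 measured breast cancer genes, 318 of which are DEGs.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a category of n genes drawn from N genes, K of them DEGs.

    Exact integer arithmetic is used for the binomial coefficients, so the
    only rounding happens in the final division.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total


# 'lipid biosynthesis & mitochondrion': 3 DEGs among 10 measured genes.
p_2d = hypergeom_upper_tail(k=3, n=10, K=318, N=8575)
# 'lipid biosynthesis' alone: 4 DEGs among 108 measured genes.
p_1d = hypergeom_upper_tail(k=4, n=108, K=318, N=8575)
print(p_2d, p_1d)
```

Under the upper-tail assumption, the two values come out close to the reported 0.005 and 0.57. The contrast arises because four DEGs is about what chance predicts for 108 genes (108 × 318/8575 ≈ 4), whereas three DEGs among 10 genes is far above the chance expectation of about 0.4.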

One cellular-localized module identified is "BP: oxygen and reactive oxygen species metabolism" in "CC: extracellular region part". Oxidative stress (generating reactive oxygen species) has been linked to cancer initiation and progression. It has been suggested [23] that G. lucidum inhibits the oxidative stress-induced invasive behavior of breast cancer cells by modulating extracellular signal-regulated protein kinase signalling.

For the cellular-localized module "BP: lipid biosynthesis" in "CC: mitochondrion", Zhao et al. [19] suggested that lipid/fatty acid metabolism may be partially responsible for the different proliferation rates of tumor cells in ILCs and IDCs. In addition, mtDNA polymorphisms may be underappreciated factors in breast carcinogenesis [24].

The third example is "BP: G-protein coupled receptor protein signalling pathway" in "CC: integral to plasma membrane". Holland et al. [25] showed that CXCR4 is subject to controlled regulation in breast cancer cells via differential G protein-receptor complex formation, and this regulation may play a role in the transition from non-metastatic to malignant tumors.

The last example is the module "BP: complement activation" in "CC: extracellular region". Caragine et al. provided direct in vivo evidence that an inhibitor of complement activation can facilitate breast tumor growth by modulating C3 deposition [26].

Comparison of modules for gastric cancer

With the statistical criterion FDR ≤ 0.1, we find four 1-dimensional modules based on BP only, and thirteen cellular-localized modules. In addition, as shown in Figure 5 and described in Table 3 (the details of genes in each module are shown in Additional file 4), the 2-dimensional approach detects new disease-relevant biological processes combined with cellular localization information.

For example, for the cellular-localized functional module "BP: negative regulation of cell proliferation" in "CC: cytoplasm", Li et al. [27] suggested that TGF-beta1 affects both proliferation and apoptosis of gastric cancer cells through the regulation of p15 and p21, and induces transient expression of Smad 7 as a negative feedback modulation of the TGF-beta1 signal.

Another example is the module "BP: cell cycle arrest" in "CC: cytoplasm". Zheng et al. showed that p27 (KIP1) can lead to apoptosis in gastric carcinoma cells [28].

Furthermore, for the same BP, the cellular-localized functional modules are described by additional localization information. For example, for the two cellular-localized modules "BP: rRNA processing" in "CC: protein complex" and "BP: rRNA processing" in "CC: nucleolus", it has been shown [29] that the tumor suppressor PTEN controls the transcription of rRNAs by reducing the occupancy of the SL1 complex subunits on the rRNA gene promoter and inducing dissociation of the SL1 complex subunits. In addition, a strong correlation has been observed between Nucleolar Organizer Region (loops of DNA encoding ribosomal RNA) counts and metastasis as well as the microscopic type of the gastric carcinoma [30].

Figure 18. A snapshot of GODAG: input page.

Discussion

When selecting modules from thousands of categories hierarchically structured in GO, the main difficulty is setting a statistical significance threshold that accounts for the multiplicity of testing. For the multiple testing problem, GO-2D adopts the standard methods of Bonferroni correction and FDR control [16,31], which are usually conservative for the non-independent categories organized in ontologies. It has been suggested that re-sampling simulations might be the most reliable way of selecting the significant modules from thousands of categories organized in GO [32]. However, numerical simulations usually suffer from a heavy computational burden, and more efficient and feasible re-sampling algorithms deserve further study [32]. GO-2D outputs the observed p values for the combined categories, which can be used as input for more sophisticated multiple-comparison procedures in existing tools, e.g., the program for Storey's Q value [17].
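For readers who want to post-process the observed p values themselves, the two corrections GO-2D adopts can be sketched as below, using the formulas given in the Implementation section (n·p for Bonferroni and n·p(k)/k for the k-th smallest p value under FDR control). This is an illustrative sketch, not GO-2D's code: the function names are hypothetical, and capping corrected values at 1 is an added convention that the text does not mention. The standard Benjamini-Hochberg step-up procedure additionally makes the corrected values monotone in k; that refinement is omitted here because the text does not describe it.

```python
def bonferroni(p_values):
    """Bonferroni-corrected p values: p * n, capped at 1 (cap is assumed)."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def fdr_corrected(p_values):
    """Corrected value n * p(k) / k for the k-th smallest observed p value."""
    n = len(p_values)
    # Indices of the p values in increasing order; rank k starts at 1.
    order = sorted(range(n), key=lambda i: p_values[i])
    corrected = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        corrected[idx] = min(1.0, n * p_values[idx] / rank)
    return corrected


def significant(p_values, fdr=0.1):
    """Flags for hypotheses whose corrected value is at most the FDR level,
    which is equivalent to p(k) <= f * k / n."""
    return [c <= fdr for c in fdr_corrected(p_values)]


# Made-up observed p values for four combined categories.
observed = [0.005, 0.57, 0.01, 0.04]
bonf = bonferroni(observed)
fdr_vals = fdr_corrected(observed)
flags = significant(observed, fdr=0.1)
print(bonf, fdr_vals, flags)
```

In this toy example the category with p = 0.04 passes the FDR criterion at 0.1 but not a Bonferroni threshold of 0.1, which illustrates why the two corrections can select different sets of modules.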

Since a BP category usually encompasses genes involved in distinct processes occurring in different cellular compartments [12], and genes even within the same process may show a clear expression distinction with respect to their cellular localizations [13], the current 1-dimensional approaches are not sufficient for identifying disease-relevant modules. The 2-dimensional approach finds which parts of a BP category, occurring in particular cellular compartments, are significantly relevant to disease. As demonstrated by its applications to two cancers in this study, the cellular-localized modules reveal some new biological processes relevant to the diseases in both datasets, in addition to the BPs identified in the 1-dimensional modules. We note that, conceptually, the 2-dimensional approach should cover all BPs identifiable by the 1-dimensional approach, but this might not always be so because of the approximation procedure in the multiple test corrections. Therefore, GO-2D provides both 1- and 2-dimensional approaches for identifying interesting modules of possible disease relevance. When CC Depth is set to one, GO-2D finds only 1-dimensional modules, as other software does. Additionally, GO-2D provides the numbers of genes (from the gene expression dataset) annotated in the original BP and CC categories of each 2-dimensional module, so users can filter the results (e.g., according to the overlap of the original BP and CC categories) to choose the subsets that interest them. We conclude that GO-2D is a useful tool for detecting disease-relevant modules, addressing one of the most important routine tasks in the functional analysis and biological interpretation of high-throughput microarray data.

Figure 19. A snapshot of GODAG: result page.

Figure 20. A snapshot of GODAG: edit page.

Figure 21. A snapshot of GODAG: search page.

Figure 22. A snapshot of GODAG: the DAG of resulting categories.

Figure 23. A snapshot of GODAG: save page.

In a recent study, we have also shown the power of the 2-dimensional cellular-localized modules for dissecting the heterogeneity of complex cancers, i.e., discovering disease subtypes by unsupervised clustering analysis [7]. However, there is still room for improving the module-based analysis approaches. For example, because changes in gene expression patterns can take various forms, different statistical measures (and their thresholds) for finding DEGs, and thus the corresponding functional modules, should be further explored. Furthermore, we will integrate GO-2D with more data resources in a future version.

Conclusion
In summary, we have developed a novel tool for identifying well-characterized 2-dimensional modules, e.g., in terms of both biological processes and cellular locations. The numerical analyses demonstrate that the 2-dimensional functional modules identified in two cancer datasets show explicit relevance to cancer biology, thus suggesting hints for further experiments to confirm the novel modular mechanisms.

Availability and requirements

Project Name: GO-2D

Project home page:

For Windows version: http://www.systembiology.cn/go-2d/

For both Windows and Linux version: http://www.hrbmu.edu.cn/go-2d/index.htm

Operating system(s): Windows 2000 (XP) or Linux

Programming language: Java

Other requirements: Java 1.5

License: GNU General Public License

Restrictions to use by non-academics: Contact corresponding author

Abbreviations
Gene Ontology (GO), biological process (BP), molecular function (MF), cellular component (CC), differentially expressed genes (DEGs).

Figure 24. A snapshot of ConfusionMatrix: input page.


Figure 25. A snapshot of ConfusionMatrix: results page.


Table 1: Comparison of GO-2D with related software

| | Onto-Express | FatiGO | GoMiner | GOstat | GO-2D |
|---|---|---|---|---|---|
| Analysis scope | 3 single categories | 3 single categories | 3 single categories | 3 single categories | Combined categories |
| Correction for multiple tests | Šidák, Holm, Bonferroni, FDR | Step-down minP, FDR [16, 31] | Relative enrichment | Holm, FDR [16] | Bonferroni, FDR [16] |
| Statistical analysis | Hypergeometric, Binomial, χ2, Fisher's exact test | Fisher's exact test | Fisher's exact test | χ2, Fisher's exact test | Hypergeometric, Binomial |
| Visualization | Flat^a, Tree | Flat^a, Tree | Tree, DAG | Flat^a | Tree, DAG |
| Application | Web | Web | Stand-alone | Web | Stand-alone |

^a Flat: The results are shown without hierarchical structure.
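Table 1 lists Bonferroni and FDR [16] as the multiple-test corrections available in GO-2D. As a rough illustration of how the two differ, the sketch below computes Bonferroni-adjusted p-values and Benjamini-Hochberg adjusted p-values for a small list of raw p-values. The input values are made up for the example, and the function names are illustrative rather than part of GO-2D.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjustment (FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five candidate modules.
raw = [0.001, 0.008, 0.039, 0.041, 0.2]
for p, b, q in zip(raw, bonferroni(raw), benjamini_hochberg(raw)):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  bh={q:.5f}")
```

Bonferroni controls the family-wise error rate and is therefore the stricter of the two; with many combined BP and CC categories under test, an FDR threshold such as the one used in Tables 2 and 3 typically retains more modules.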

Table 2: Functional modules for breast cancer (FDR ≤ 0.1)

| Dimension | Biological Process Name (# of genes^d in BP) | Cellular Component Name (# of genes^d in CC) | # of genes^d in 2D module | Observed P Value |
|---|---|---|---|---|
| 1D^b | nucleosome assembly (31) | --- | --- | 8.33E-04 |
| 1D^b | learning and/or memory (7) | --- | --- | 1.58E-03 |
| 2D^c | nucleosome assembly (31) | nucleosome (21) | 20 | 6.19E-05 |
| 2D^c | nucleosome assembly (31) | nucleus (1779) | 31 | 8.33E-04 |
| 2D^c | learning and/or memory (7) | cell part (4794) | 5 | 4.78E-04 |
| 2D^c | oxygen and reactive oxygen species metabolism (36) | extracellular region part (257) | 5 | 4.78E-04 |
| 2D^c | lipid biosynthesis (108) | mitochondrion (376) | 10 | 4.99E-03 |
| 2D^c | complement activation (14) | extracellular region (388) | 11 | 6.68E-03 |
| 2D^c | transmembrane receptor protein tyrosine kinase signalling pathway (73) | cytoplasmic part (1297) | 11 | 6.68E-03 |
| 2D^c | G-protein coupled receptor protein signalling pathway (172) | integral to plasma membrane (459) | 61 | 7.07E-03 |

^b 1D: Functional modules based on biological process (one-dimensional modules).
^c 2D: Cellular-localized functional modules (two-dimensional modules).
^d # of genes: the numbers of genes from the gene expression dataset (annotated in the original BP/CC categories or 2-dimensional modules).
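The discussion notes that users can filter results by the overlap of the original BP and CC categories, using the gene counts that GO-2D reports. One possible filter, sketched below in Python, keeps only those 2D modules that cover at most a chosen fraction of their BP category, so that modules identical to the 1D result (such as nucleosome assembly in nucleus, 31 of 31 genes) are set aside. The three rows are copied from Table 2; the class name, the function name and the 0.9 threshold are assumptions made for this illustration and are not part of GO-2D.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module2D:
    bp: str
    bp_genes: int       # genes annotated in the original BP category
    cc: str
    cc_genes: int       # genes annotated in the original CC category
    module_genes: int   # genes in the combined 2D module
    p_value: float

    @property
    def bp_fraction(self):
        """Share of the BP category that falls inside the 2D module."""
        return self.module_genes / self.bp_genes


# Three rows taken from Table 2 (breast cancer dataset).
ROWS = [
    Module2D("nucleosome assembly", 31, "nucleosome", 21, 20, 6.19e-05),
    Module2D("nucleosome assembly", 31, "nucleus", 1779, 31, 8.33e-04),
    Module2D("lipid biosynthesis", 108, "mitochondrion", 376, 10, 4.99e-03),
]


def refined_only(rows, max_bp_fraction=0.9):
    """Keep modules that genuinely narrow down their BP category.

    The 0.9 cut-off is an arbitrary, illustrative choice.
    """
    return [r for r in rows if r.bp_fraction <= max_bp_fraction]


for row in refined_only(ROWS):
    print(f"{row.bp} / {row.cc}: {row.module_genes} of {row.bp_genes} BP genes")
```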

Table 3: Functional modules for gastric cancer (FDR ≤ 0.1)

| Dimension | Biological Process Name (# of genes^d in BP) | Cellular Component Name (# of genes^d in CC) | # of genes^d in 2D module | Observed P Value |
|---|---|---|---|---|
| 1D^b | rRNA processing (46) | --- | --- | 1.07E-05 |
| 1D^b | regulation of cyclin dependent protein kinase activity (28) | --- | --- | 2.26E-05 |
| 1D^b | Digestion (26) | --- | --- | 6.04E-04 |
| 1D^b | traversing start control point of mitotic cell cycle (5) | --- | --- | 1.18E-03 |
| 2D^c | rRNA processing (46) | protein complex (1135) | 22 | 2.22E-04 |
| 2D^c | rRNA processing (46) | nucleolus (73) | 20 | 1.34E-03 |
| 2D^c | regulation of cyclin dependent protein kinase activity (28) | nucleus (2425) | 18 | 5.27E-05 |
| 2D^c | Digestion (26) | cell (6889) | 16 | 4.09E-04 |
| 2D^c | traversing start control point of mitotic cell cycle (5) | nucleus (2425) | 5 | 1.18E-03 |
| 2D^c | electron transport (229) | endoplasmic reticulum (378) | 49 | 1.53E-04 |
| 2D^c | electron transport (229) | membrane (2978) | 102 | 2.66E-04 |
| 2D^c | electron transport (229) | microsome (76) | 29 | 7.80E-04 |
| 2D^c | cell cycle arrest (50) | cytoplasm (2362) | 14 | 4.77E-04 |
| 2D^c | negative regulation of cell proliferation (107) | cytoplasm (2362) | 37 | 8.41E-04 |
| 2D^c | NLS-bearing substrate import into nucleus (12) | nuclear part (465) | 5 | 1.18E-03 |
| 2D^c | one-carbon compound metabolism (24) | intracellular membrane-bound organelle (3675) | 7 | 1.67E-03 |

Note: See the footnotes of Table 2.


Authors' contributions
ZG and JZ described and specified the features of, and the problems to be solved by, GO-2D; JW implemented the software; MZ, DY, YL, DW and GX participated in testing the program and applied the data mining strategy to the field datasets; all authors participated in reading, revising and approving the manuscript.

Additional material

Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 30170515, 30370388) and the National High Tech Development Project of China (Grant Nos. 2003AA2Z2051 and 2002AA2Z2052).

References
1. Rives AW, Galitski T: Modular organization of cellular networks. Proc Natl Acad Sci USA 2003, 100(3):1128-1133.
2. Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular cell biology. Nature 1999, 402(6761 Suppl):C47-52.
3. Segal E, Friedman N, Koller D, Regev A: A module map showing conditional activity of expression modules in cancer. Nat Genet 2004, 36(10):1090-1098.
4. Lamb J, Ramaswamy S, Ford HL, Contreras B, Martinez RV, Kittrell FS, Zahnow CA, Patterson N, Golub TR, Ewen ME: A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer. Cell 2003, 114(3):323-334.
5. Guo Z, Zhang T, Li X, Wang Q, Xu J, Yu H, Zhu J, Wang H, Wang C, Topol EJ, et al.: Towards precise classification of cancers based on robust gene functional expression profiles. BMC Bioinformatics 2005, 6(1):58.
6. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, et al.: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003, 34(3):267-273.
7. Xu JZ, Guo Z, Zhang M, Li X, Li YJ, Rao SQ: Peeling off the hidden genetic heterogeneities of cancers based on disease-relevant functional modules. Mol Med 2006, 12(1–3):25-33.
8. Khatri P, Bhavsar P, Bawa G, Draghici S: Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Res 2004:W449-456.
9. Al-Shahrour F, Diaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 2004, 20(4):578-580.
10. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, et al.: GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 2003, 4(4):R28.
11. Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 2004, 20(9):1464-1465.
12. Zhou X, Kao MC, Wong WH: Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci USA 2002, 99(20):12783-12788.
13. Jimenez JL, Mitchell MP, Sgouros JG: Microarray analysis of orthologous genes: conservation of the translational machinery across species at the sequence and expression level. Genome Biol 2003, 4(1):R4.
14. NCBI Web Page [ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/]
15. GO Consortium [http://www.geneontology.org/]
16. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological) 1995, 57(1):289-300.
17. Storey's Q Value [http://faculty.washington.edu/jstorey/qvalue]
18. Khatri P, Draghici S: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005, 21(18):3587-3595.
19. Zhao H, Langerod A, Ji Y, Nowels KW, Nesland JM, Tibshirani R, Bukholm IK, Karesen R, Botstein D, Borresen-Dale AL, et al.: Different gene expression patterns in invasive lobular and ductal carcinomas of the breast. Mol Biol Cell 2004, 15(6):2523-2536.
20. Chen X, Leung SY, Yuen ST, Chu KM, Ji J, Li R, Chan AS, Law S, Troyanskaya OG, Wong J, et al.: Variation in gene expression patterns in human gastric cancers. Mol Biol Cell 2003, 14(8):3208-3215.
21. Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 2002, 97(457):77-87.
22. BRB ArrayTools [http://linus.nci.nih.gov/BRB-ArrayTools.html]
23. Thyagarajan A, Jiang J, Hopf A, Adamec J, Sliva D: Inhibition of oxidative stress-induced invasiveness of cancer cells by Ganoderma lucidum is mediated through the suppression of interleukin-8 secretion. Int J Mol Med 2006, 18(4):657-664.
24. Canter JA, Kallianpur AR, Parl FF, Millikan RC: Mitochondrial DNA G10398A polymorphism and invasive breast cancer in African-American women. Cancer Res 2005, 65(7):8028-8033.
25. Holland JD, Kochetkova M, Akekawatchai C, Dottore M, Lopez A, McColl SR: Differential functional activation of chemokine receptor CXCR4 is mediated by G proteins in breast cancer cells. Cancer Res 2006, 66(8):4117-4124.
26. Caragine TA, Okada N, Frey AB, Tomlinson S: A tumor-expressed inhibitor of the early but not late complement lytic pathway enhances tumor growth in a rat model of human breast cancer. Cancer Res 2002, 62(4):1110-1115.
27. Li X, Zhang YY, Wang Q, Fu SB: Association between endogenous gene expression and growth regulation induced by TGF-beta1 in human gastric cancer cells. World J Gastroenterol 2005, 11(1):61-68.

Additional File 1
Manual of GO-2D. Contains the manual of GO-2D, which is a stand-alone tool that identifies 2-dimensional functional modules enriched with interesting genes.
[http://www.biomedcentral.com/content/supplementary/1471-2164-8-30-S1.pdf]

Additional File 2
Manual of GODAG. Contains the manual of GODAG, which is a stand-alone tool that allows users to visualize their GO categories of interest as a directed acyclic graph.
[http://www.biomedcentral.com/content/supplementary/1471-2164-8-30-S2.pdf]

Additional File 3
Manual of ConfusionMatrix. Contains the manual of ConfusionMatrix, which is a tool for comparing the resulting categories identified by the 1- and 2-dimensional approaches in GO-2D.
[http://www.biomedcentral.com/content/supplementary/1471-2164-8-30-S3.pdf]

Additional File 4
Gene information. Spreadsheet containing the names of all the genes in the modules shown in Table 2 and Table 3.
[http://www.biomedcentral.com/content/supplementary/1471-2164-8-30-S4.xls]


28. Zheng JY, Wang WZ, Li KZ, Guan WX, Yan W: Effect of p27(KIP1) on cell cycle and apoptosis in gastric cancer cells. World J Gastroenterol 2005, 11(45):7072-7077.
29. Zhang C, Comai L, Johnson DL: PTEN represses RNA Polymerase I transcription by disrupting the SL1 complex. Mol Cell Biol 2005, 25(16):6899-6911.
30. Prakash I, Mathur RP, Kar P, Ranga S, Talib VH: Comparative evaluation of cell proliferative indices and epidermal growth factor receptor expression in gastric carcinoma. Indian J Pathol Microbiol 1997, 40(4):481-490.
31. Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I: Controlling the false discovery rate in behavior genetics research. Behav Brain Res 2001, 125(1–2):279-284.
32. Osier MV, Zhao H, Cheung KH: Handling multiple testing while interpreting microarrays with the Gene Ontology Database. BMC Bioinformatics 2004, 5(1):124.
