7/30/2019 Distinct Patterns in the Regulation and Evolution of Human Cancer Genes
In Silico Biology 8, 0004 (2007); 2007, Bioinformation Systems e.V.
Distinct patterns in the regulation and evolution of human cancer genes
Simon J. Furney1,2#, Stephen F. Madden2#, Tomasz A. Kisiel1, Desmond G. Higgins2 and Nuria Lopez-Bigas1*
1 Research Unit on Biomedical Informatics, Experimental and Health Science Department, Universitat Pompeu Fabra, Barcelona, 08080, Spain
2 Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin 4, Ireland
# These authors contributed equally to the work.
* Corresponding author. Email: [email protected]; Tel: +34-93-3160507; Fax: +34-93-2240875
Edited by H. Michael; received July 27, 2007; revised November 12, 2007; accepted November 14, 2007; published December 13, 2007
Abstract
Understanding the mechanism of regulation of cancer genes and the constraints on their coding
sequences is of fundamental importance in understanding the process of tumour development. Here
we test the hypothesis that tumour suppressor genes and proto-oncogenes, due to their involvement
in tumourigenesis, have distinct patterns of regulation and coding selective constraints compared to
non-cancer genes.
Indeed, we found significantly greater conservation in the promoter regions of proto-oncogenes,
suggesting that these genes are more tightly regulated, i.e. they are more likely to contain a higher
density of cis-regulatory elements. Furthermore, proto-oncogenes appear to be preferentially
targeted by microRNAs and have longer 3' UTRs. In addition, proto-oncogene evolution appears to
be highly constrained, compared to tumour suppressor genes and non-cancer genes. A number of
these trends are confirmed in breast and colon cancer gene sets recently identified by mutational
screening.
Keywords: oncogenes, tumour suppressor genes, gene regulation, sequence conservation,
microRNAs, breast cancer, colon cancer
Introduction
The advent of large-scale transcriptomic and genomic analysis technologies has allowed the study of
many genes or even entire genomes simultaneously. Unfortunately, the determination of the
causative genes of cancer is hampered by the complexity of the disease, in that it may take a
number of mutated genes acting in concert to result in oncogenesis [1]. Furthermore, different genes
are mutated in different types or even subtypes of cancer [2]. Nevertheless, some genes would
appear to be innately prone to cancer induction either by aberrant expression (which can be caused
by mutation, chromosomal abnormality or misregulation) or by gain-of-function mutations.
The complexity of multicellular organisms requires an intricate mechanism to ensure that each gene
is expressed at the right time and place. This is achieved by a complex, and yet not well understood,
process of gene expression regulation, orchestrated by several layers of molecular genetic signals
(i.e. specific interactions between transcription factors and cis-regulatory motifs, post-transcriptional
regulation by non-coding RNA, epigenetic signals). Aberrant expression of key genes by diverse
mechanisms is known to be the cause of many cases of cancer. Understanding the mechanism
of regulation of genes involved in cancer is of fundamental importance for understanding the process of
tumour development.
The importance of strictly-controlled gene expression in cancer is evidenced by many translocation
events such as a rearrangement placing the coding sequence of one gene downstream of the
regulatory region of another gene (e.g. T cell receptor or immunoglobulin genes) [3].
The proximal promoter regions of genes contain many transcription factor-binding sites (TFBSs)
necessary for the appropriate transcription of a gene. Mutations in the promoter regions of genes could result in improper transcription of a gene. Over 800 mutations in promoter regions are listed in
the Human Gene Mutation Database [4]. Mutations in the promoter regions of the retinoblastoma
gene have been shown to result in reduced expression leading to haploinsufficiency [5]. Furthermore,
the importance of epigenetic modifications in cancer is evident from studies of the aberrant DNA
methylation of tumour suppressor genes [6]. In addition, the role of splicing mutations is becoming
more acknowledged as an important cause of oncogenesis [7].
The discovery of microRNAs (miRNAs) as negative regulators of genes has unearthed another
mechanism by which cancer genes may be susceptible to misregulation [8]. miRNAs are short (~22
nucleotide), non-coding, single-stranded RNAs which, after transcription and processing, bind to
sequences that are significantly, though not completely, complementary, primarily in the 3' UTR of messenger RNAs, and prevent translation [9]. Calin et al. reported the first association of cancer with
miRNAs when they identified frequent deletions and down-regulation of two miRNA genes in chronic
lymphocytic leukaemia [10]. Recently it has been shown that miRNA expression profiles can be used
as diagnostic and prognostic markers of cancer [11,12].
Cancer-causing genes have been traditionally classified as either proto-oncogenes or tumour
suppressor genes. Proto-oncogenes are usually phenotypically dominant, requiring a pertinent
mutation or chromosomal alteration of one allele to become oncogenic. Conversely, tumour
suppressor genes generally require inactivation of both alleles to induce cancer. Futreal et al. noted
that 90% of cancer-causing germline mutations are recessive, which has been attributed to
the probability of dominant germline mutations resulting in embryonic lethality [3,13].
Previously, global analyses of human disease genes by computational methods have yielded
important advances in the understanding of human diseases [14-18]. Recently, a number of studies
have undertaken this genomic approach for the study of cancer [19-22]. Two independent studies
have previously shown that cancer genes have higher conservation in their coding sequences
[21,22]. Also it has been shown that cancer genes are longer on average than other genes [22],
following a similar trend to hereditary disease genes [15]. This has been attributed to a higher
propensity of longer sequences to suffer mutations.
In this study we analyse patterns of regulation, coding selective constraints and gene structure in
human cancer genes by focusing on the differences between proto-oncogenes, tumour suppressor
genes, and non-cancer genes based on Cancer Gene Census annotation [3]. In addition, we extend
our analysis to a set of breast cancer and colon cancer genes recently identified by mutational
screening [23].
To allow cancer researchers to browse regulatory, evolutionary and gene structure data of cancer
genes, we have created a web-based, searchable database through which sequence conservation,
gene structure and splicing, and gene regulation information for individual cancer genes and cancer
genes as a group can be accessed (http://bg.upf.edu/cgprop/).
Methods
Data
The list of genes involved in cancer was obtained from the Cancer Gene Census [3]. Using the NCBI
LocusLink database [24] and the Ensembl version 37 database [25], we located the corresponding
gene sequence records. The result was a list of 325 genes associated with human cancer (C). All
other Ensembl protein-coding genes were classified as non-cancer (NC; n = 16,815). Therefore, the
non-cancer gene dataset potentially contains cancer genes that are not included in the Cancer Gene
Census as yet. Cancer genes were classified as Cancer Dominant (CD; n = 265) or Cancer
Recessive (CR; n = 60) according to the Cancer Gene Census [3]. Genes in which both types of
mutation (dominant and recessive) have been found were not used for the study. The colon cancer
(CC; n = 68) and breast cancer (BC; n = 122) gene sets were obtained from Sjoblom et al. [23].
Housekeeping genes are defined as those genes expressed in more than 75 tissues in Gene Expression Atlas version 2 [26] at an intensity above 200 standard
Affymetrix units [27]. After removing Cancer Census Genes, 2505 housekeeping genes remained.
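The housekeeping criterion described above can be sketched in Python as follows. This is a minimal illustration and not the code used in the study: the function names, the dictionary layout and the toy intensity values are assumptions, and only the two thresholds (more than 75 tissues, intensity above 200) come from the text.

```python
def is_housekeeping(intensities, min_tissues=75, min_intensity=200):
    """Return True if the gene exceeds `min_intensity`
    in more than `min_tissues` tissues."""
    expressed = sum(1 for value in intensities if value > min_intensity)
    return expressed > min_tissues


def housekeeping_genes(expression, cancer_genes):
    """Select housekeeping genes, excluding Cancer Gene Census genes."""
    return sorted(
        gene
        for gene, intensities in expression.items()
        if gene not in cancer_genes and is_housekeeping(intensities)
    )


# Toy data (hypothetical gene names and values): one intensity per tissue.
expression = {
    "GENE_A": [350.0] * 79,                # broadly expressed
    "GENE_B": [350.0] * 40 + [50.0] * 39,  # tissue-restricted
    "GENE_C": [900.0] * 79,                # broadly expressed, but a census gene
}
selected = housekeeping_genes(expression, cancer_genes={"GENE_C"})
print(selected)
```

Removing the census genes after applying the expression filter, as in the text, keeps the housekeeping set disjoint from the cancer sets it is compared against.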
Promoter conservation length
Promoter sequences were taken from the 4-way conserved genome sequence (human, mouse, rat
and dog) as generated by Xie et al. [28]. This dataset uses 2 kb upstream and 2 kb downstream of
the annotated transcription start site (TSS) for 17,455 genes. The information was based on the
NCBI RefSeq project [24]. It should be pointed out that for some genes, the precise location of the
TSS is unclear and is an estimate based on computer predictions and EST data. If the translation
start site was within 2 kb of the annotated TSS a shorter promoter region was taken that did not
include the protein coding sequence. Approximately 51% of the bases could be aligned across all
four species. The promoter conservation length was the number of bases within the human promoter that could be aligned with mouse, rat and dog. To account for differences in the lengths of the
promoter regions analysed, results are reported in terms of proportion of bases aligned.
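The normalisation described above amounts to dividing the number of aligned bases by the length of the promoter region analysed. A short Python sketch is given below; the function name and the toy counts are hypothetical and serve only to illustrate the calculation.

```python
from statistics import median


def aligned_proportion(aligned_bases, promoter_length):
    """Proportion of human promoter bases aligned across all four species."""
    if promoter_length <= 0:
        raise ValueError("promoter_length must be positive")
    if not 0 <= aligned_bases <= promoter_length:
        raise ValueError("aligned_bases must lie between 0 and promoter_length")
    return aligned_bases / promoter_length


# Toy data (hypothetical): gene -> (aligned bases, promoter length in bp).
# GENE_B has a shortened promoter, as when the translation start site
# lies within 2 kb of the annotated TSS.
promoters = {
    "GENE_A": (3988, 4000),
    "GENE_B": (1200, 3000),
    "GENE_C": (2000, 4000),
}
proportions = {
    gene: aligned_proportion(aligned, length)
    for gene, (aligned, length) in promoters.items()
}
median_proportion = median(proportions.values())
print(proportions, median_proportion)
```

Working with proportions rather than raw aligned lengths means that genes with shortened promoter regions are compared on the same scale as genes with the full 4 kb region.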
Where possible, a single reference RefSeq promoter was chosen for each of the 325 genes
associated with human cancer. Genes with alternatively spliced first exons were excluded, as the
individual genes had multiple different promoter alignments. This gave a set of 236 cancer dominant
genes, and 56 cancer recessive genes (292 cancer genes in total).
CpG Islands
The cpgplot program from the EMBOSS package [29] was used to detect CpG islands in the aligned segments of the human promoter sequences, based on the alignments generated by Xie et al. [28].
Cpgplot was run using its default parameters (at least 50% GC content and with CpG content of >0.6
times the expected frequency) with the exception that the window length was raised to 200 bp. This
follows the standard CpG island definitions [30,31].
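The two thresholds quoted above can be illustrated with a simplified sliding-window check in Python. This sketch is not a reimplementation of cpgplot, whose algorithm differs in detail; the function names are hypothetical, and the observed/expected ratio is computed with the commonly used formula (number of CpG x window length) / (number of C x number of G), which is an assumption about the definition intended here.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp


def has_cpg_window(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """True if any window of `window` bp meets both CpG island thresholds."""
    for start in range(len(seq) - window + 1):
        gc_content, obs_exp = cpg_stats(seq[start:start + window])
        if gc_content >= min_gc and obs_exp > min_obs_exp:
            return True
    return False


# Toy sequences (hypothetical): one CpG-rich, one AT-rich.
cpg_rich = "CG" * 150
at_rich = "AT" * 150
print(has_cpg_window(cpg_rich), has_cpg_window(at_rich))
```

Sequences shorter than the window yield no windows at all and are therefore reported as lacking a CpG island in this sketch.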
MicroRNAs
Xie et al. [28] identified a set of 540 8-mer motifs from 3' UTRs of human genes that are potential
miRNA targets. From these 8-mers we generated a list of 4988 genes that are putatively regulated by
miRNAs, i.e. they contain one or more 8-mers in their 3' UTR. The UTR information was obtained
from http://www.broad.mit.edu/seq/HumanMotifs/. If there were multiple transcripts for a single gene,
each containing an 8-mer motif, the gene was only counted once, i.e. 6888 RefSeq transcripts
contain 8-mer motifs in their 3' UTR, representing 4988 genes. Where there were multiple different
UTRs for the same gene, the median UTR length was used to calculate the UTR length.
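The counting rules above (each gene counted once regardless of how many of its transcripts contain a motif, and the median used where a gene has several UTRs) can be sketched in Python as follows. The function names, gene identifiers and sequences are hypothetical; the two motifs are taken from Table 3 purely as examples.

```python
from statistics import median


def genes_with_motifs(transcripts, motifs):
    """Return the set of genes with at least one motif in any 3' UTR.

    `transcripts` is an iterable of (gene_id, utr_sequence) pairs, so a
    gene with several matching transcripts is still counted only once.
    """
    hits = set()
    for gene, utr in transcripts:
        utr = utr.upper()
        if any(motif in utr for motif in motifs):
            hits.add(gene)
    return hits


def median_utr_length(transcripts):
    """Return gene -> median 3' UTR length over all of its transcripts."""
    lengths = {}
    for gene, utr in transcripts:
        lengths.setdefault(gene, []).append(len(utr))
    return {gene: median(values) for gene, values in lengths.items()}


# Toy data (hypothetical sequences).
motifs = {"AGCACTTT", "TGACCAAA"}
transcripts = [
    ("GENE_A", "AAAAGCACTTTAAAA"),  # contains AGCACTTT
    ("GENE_A", "CCTGACCAAACC"),     # contains TGACCAAA
    ("GENE_B", "GGGGGGGGGGGG"),     # no motif
]
targeted = genes_with_motifs(transcripts, motifs)
utr_lengths = median_utr_length(transcripts)
print(targeted, utr_lengths)
```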
Calculation of evolutionary rates at DNA level
The set of 13,454 human-chimp orthologues used for gene evolution analysis by the Chimpanzee
Sequencing and Analysis Consortium [32] was used to create a non-redundant dataset of 250 cancer
genes, consisting of 202 dominant (CD) and 48 recessive (CR) genes, and 11,881 "non-cancer" (NC)
genes. KA, KS and KI data for the genes were taken from the Chimpanzee Sequencing and Analysis Consortium supplementary data [32]. The set of human-mouse-dog orthologues used for gene
evolution analysis of the domestic dog genome [33] was used to create a non-redundant dataset of
224 cancer genes, consisting of 183 dominant (CD) and 41 recessive (CR) genes, and 11,775 "non-cancer" (NC)
genes. KA, KS and KA/KS data for the genes were downloaded from
www.broad.mit.edu/ftp/pub/papers/dog_genome/suppinfo/.
Gene structure analysis
Gene length, coding sequence length and number of introns per gene were obtained from the
Ensembl database [25].
Protein-protein interaction analysis
Human protein-protein interaction data from Jonsson and Bates [34] were retrieved and analysed
independently.
Statistical analysis
To assess the statistical significance of the results we used the two-tailed Mann-Whitney test, using
the R statistics package [35], for conservation and gene structure data. A Fisher exact test was used
to determine whether datasets were more likely to contain CpG Islands or miRNA targets than
expected and to test for over-representation in the cancer set of individual 8-mer motifs. Results are
reported as significant at the p < 0.001 level.
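To illustrate the Fisher exact test used for the CpG island and miRNA target comparisons, the following Python sketch computes a one-sided p-value for over-representation from a 2 x 2 table using only the standard library. It is not the R code used in the study. The function name, the table layout and the toy counts are assumptions, as is the choice of a one-sided test, since the text does not state which alternative was used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for over-representation.

    Table layout (rows: gene set, columns: feature present / absent):
        cancer:      a  b
        non-cancer:  c  d
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1 = a + b           # genes in the cancer set
    col1 = a + c           # genes carrying the feature
    total = a + b + c + d  # all genes
    denominator = comb(total, row1)
    numerator = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / denominator


# Toy table (hypothetical counts, not data from the study).
p_value = fisher_exact_greater(3, 1, 1, 3)
significant = p_value < 0.001  # significance threshold stated in the text
print(f"p = {p_value:.4f}, significant: {significant}")
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that factorial-based floating-point formulas can show for gene sets of several thousand members.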
Database
We have created a web-based database in which the results of all our comparisons are stored. This
searchable site is available at http://bg.upf.edu/cgprop and presents information on individual cancer
genes and groups of cancer genes from our study.
Results
Gene regulation
Aberrant expression of key genes by diverse mechanisms is known to be a frequent cause of cancer.
In order to understand better the mechanisms of regulation of genes involved in cancer we have
studied regulatory properties of known cancer genes such as promoter conservation, 3' UTR
microRNA targets and CpG islands.
Promoter conservation
Our analysis of promoter conservation focused upon the proximal promoter regions of cancer genes
(4 kb centred on the TSS). Here we found a significantly greater degree of conservation in 292 cancer
genes when compared to ~15,000 non-cancer genes (Tab. 1, p-value M-W test = 1.89 x 10^-7). This
was based on the proportion of bases within the human proximal promoter that could be aligned with
mouse, rat and dog (median conservation length per bp = 0.545 bp). Moreover, the degree of
promoter conservation was significantly different between dominant and recessive cancer genes
(p-value M-W test = 1.32 x 10^-6). Overall the regions of conservation are far shorter in recessive cancer genes
(median length = 0.434 bp) than in dominant cancer genes (median length = 0.689 bp). Examples of
cancer genes with very high promoter alignment between the four species include the proto-
oncogenes MTG8 (0.997 bp) and B-cell lymphoma 6 protein (0.997 bp).
Table 1: Median of the proportion of aligned promoter, percentages of genes with CpG islands and 3' UTR putative miRNA targets, and 3' UTR lengths of cancer and non-cancer datasets

                                 Non-cancer   Housekeeping   Cancer   Cancer dom   Cancer rec
Proportion of aligned promoter   0.543        0.531          0.636    0.689        0.434
CpG islands (%)                  59           73             73       73           71
miRNA targets (%)                34           38             50       55           26
3' UTR length                    690          629            920      958          668
CpG islands
Epigenetic alterations, such as abnormal DNA-methylation patterns, are associated with many
human tumour types. DNA methylation typically occurs at the CpG dinucleotide sequence. This
dinucleotide often occurs in clusters, called CpG islands, close to or overlapping the TSS. We found
that cancer genes were significantly more likely to contain CpG islands than non-cancer genes
(p-value Fisher exact (FE) test = 1.07 x 10^-6). There was no significant difference between dominant
and recessive cancer genes.
MicroRNAs
Several articles have now shown that small non-coding RNAs play an important role in cancer
[36,37]. Using the conserved 3' UTR 8-mer motifs identified by Xie et al. from human-mouse-rat-dog
comparisons [28], we were able to identify a list of ~4700 genes that are potentially targeted by
miRNAs. A significantly larger number of cancer genes contain these 8-mers within their 3' UTRs,
compared to the non-cancer genes (p-value FE test = 5.59 x 10^-8). This increase was only seen in
dominant cancer genes (55%). Interestingly, a lower percentage of recessive cancer genes (26%)
contain these 8-mer motifs in their 3' UTRs compared to the non-cancer genes (34%). In addition,
when normalised by 3' UTR length, dominant cancer genes contain a motif every 191 nucleotides
compared to every 216 and 212 nucleotides for recessive cancer genes and non-cancer genes,
respectively.
Moreover, the length of 3' UTR is significantly greater in cancer genes than in non-cancer genes
(Tab. 1). Overall, cancer genes have a median 3' UTR length of 920 bp compared to 690 bp for non-
cancer genes (p-value M-W test = 6.02 x 10^-5). Again this difference is only evident in dominant
(median 3' UTR length = 958 bp) and not in recessive cancer genes (668 bp). It has recently been
shown that genes that are targeted by miRNAs have longer 3' UTRs [38]. It would therefore stand to
reason that, as proto-oncogenes are preferentially targeted by miRNAs, they also have longer
3' UTRs.
A number of miRNAs have been implicated as oncogenes, in that they target and hence reduce the
production of specific proto-oncoproteins or tumour suppressor proteins [8]. We examined our cancer
dataset for the presence of conserved 8-mer motifs corresponding to the putative tumour suppressor
miRNAs hsa-miR-15a, hsa-miR-16, hsa-miR-143, hsa-miR-145 and the hsa-let7 family, and the
oncogenic hsa-miR-21 and hsa-miR-155. Among the 16 known cancer genes that contain potential
targets for these miRNAs are RET (hsa-miR-16), Homeobox protein PHOX1 (hsa-let7), and Transcriptional regulator ERG (hsa-miR-145) (Tab. 2). In addition, we found targets in two genes for
which there is experimental evidence of oncogenesis: hsa-let7 in N-RAS [39] and hsa-miR-15 / hsa-
miR-16 in BCL-2 [10]. The seven miRNAs listed have potential targets in a further 426 human genes.
Using the Fisher exact test we were able to identify six 8-mer motifs as being significantly
over-represented in the dominant cancer set (p-value FE test < 1 x 10^-3, Tab. 3). This includes a motif that
is targeted by the mir-17-5p miRNA. Although this miRNA has recently been implicated in B-cell
lymphomas, the study reported the miRNA as being over-expressed [40]. There were no over- or
under-represented motifs in the recessive cancer set.
Table 2: Cancer Gene Census genes containing putative targets for oncogenic miRNAs.

Ensembl Gene     Description                                   miRNA (hsa-)
ENSG00000007237  Growth-arrest-specific protein 7              let7
ENSG00000108091  Coiled-coil domain-containing protein 6       miR-15, miR-16
ENSG00000108821  Collagen alpha-1(I) chain                     let7
ENSG00000114999  Tubulin--tyrosine ligase                      let7
ENSG00000116132  Paired mesoderm homeobox protein 1            let7
ENSG00000119537  3-ketodihydrosphingosine reductase            miR-15, miR-16
ENSG00000129993  myeloid translocation gene-related protein 2  miR-15, miR-16
ENSG00000137309  High mobility group protein HMG-I/HMG-Y       let7
ENSG00000151702  Friend leukemia integration 1 TF              miR-145
ENSG00000157554  Transcriptional regulator ERG                 miR-145
ENSG00000164985  PC4 and SFRS1 interacting protein             miR-155
ENSG00000165731  Tyrosine-protein kinase receptor ret          miR-15, miR-16
ENSG00000168638  GTPase NRas                                   let7
ENSG00000171791  Apoptosis regulator Bcl-2                     miR-15, miR-16
ENSG00000181690  pleiomorphic adenoma gene 1                   miR-15, miR-16
ENSG00000184012  Transmembrane protease, serine 2              let7
Table 3: miRNA targets over-represented in Cancer Gene Census dominant cancer genes.

Motif      Description   # Total   # Cancer genes   P-value
TGACCAAA   unknown       69        9                0.00029
TTTGGTGC   unknown       42        7                0.00038
GTTGCACT   unknown       21        5                0.00065
AGCACTTT   mir-17-5p     135       12               0.00078
AGTGCCTT   unknown       155       13               0.00078
AAGTGCCT   mir-65        136       12               0.00083
Gene structure of cancer genes
It has been reported that cancer genes involved in translocations have longer genic sequences whilst
genes whose cancer-inducing effects are due to point mutations possess longer coding sequences
[22]. Our analysis shows that both dominant and recessive cancer genes have significantly greater
total gene length and coding sequence length than non-cancer genes (Tab. 4 and Fig. 1). For
example, the promiscuous translocation proto-oncogenes ETV6 (>245 Kb) and MLL (>88 Kb)
are much longer than the median gene length of non-cancer genes (~17 Kb). Furthermore,
three of the longest human coding sequences belong to the tumour suppressor genes BRCA2, ATM
and APC (all >8.5 Kb; NC median = 1.1 Kb). As these genes have long coding sequences they are
more prone to the type of chance mutations that result in loss of function (i.e. missense, nonsense
and splicing mutations).
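The length argument above can be made concrete with a small calculation. The sketch below assumes that every coding base mutates independently with the same probability; this is a simplification for illustration, and the rate `RATE` is a hypothetical value, not one taken from the study. Only the two sequence lengths come from the text.

```python
def prob_at_least_one_mutation(cds_length_bp, per_base_rate):
    """Probability that at least one of cds_length_bp independent sites mutates."""
    return 1.0 - (1.0 - per_base_rate) ** cds_length_bp


# Hypothetical per-base mutation probability, chosen only for illustration.
RATE = 1e-6

# Median non-cancer CDS length (Tab. 4) versus the 8.5 Kb lower bound
# quoted for BRCA2, ATM and APC.
p_median_nc = prob_at_least_one_mutation(1101, RATE)
p_long_cds = prob_at_least_one_mutation(8500, RATE)

relative_risk = p_long_cds / p_median_nc
print(f"non-cancer median CDS: {p_median_nc:.6f}")
print(f"8.5 Kb CDS:            {p_long_cds:.6f}")
print(f"relative risk:         {relative_risk:.2f}")
```

For small rates the relative risk is close to the ratio of the two lengths, which is the intuition behind the statement that long coding sequences are more exposed to chance mutations.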
Defects in the splicing process have been implicated in oncogenesis [41-43] and mutations affecting
splicing have been reported to be a frequent cause of hereditary disease [44]. Interestingly, only
1.2% of cancer genes are single exon genes compared to 13.7% of non-cancer genes. Both
dominant cancer (CD) and recessive cancer (CR) genes have a greater median number of exons (CD
= 11, CR = 14.5) compared to non-cancer genes (NC = 6).
Table 4: Median length properties and exon number of cancer and non-cancer genes.
                Non-cancer  Housekeeping  Cancer  Cancer dom  Cancer rec
Gene length     16841       16716         52096   52573       49171
CDS length      1101        1073          1779    1734        2045
Exon length     133         130           134     134         133
Intron length   1473        1135          1594    1602        1574
Exon number     6           8             12      11          14.5
Figure 1: Statistical comparisons for Cancer (C), Non-cancer (NC),
Cancer Dominant (CD), Cancer Recessive (CR) genes: (a) gene
regulation, (b) gene structure, and (c) coding sequence conservation (3
Ka/Ks denotes comparison of human-mouse-dog orthologues). All
comparisons are two-tailed Mann-Whitney tests with the exception of
Fisher exact tests for CpG islands and miRNA targets. Colours in the
red spectrum indicate that the first group of genes in a comparison has a
value that is significantly greater than that of the second group of genes.
Colours in the blue spectrum indicate that the first group of genes in a
comparison has a value that is significantly smaller than that of the
second group of genes.
Selective pressures acting on cancer genes at the coding sequence level
Many mutations in coding sequences of cancer genes are known to be involved in the development
of malignant tumours [3]. In addition, previous analyses have shown that cancer genes are on
average more conserved than other genes [21,22]. Thus, we decided to analyse the selective
pressures acting on coding sequences of proto-oncogenes and tumour suppressor genes.
We examined non-synonymous (KA) and synonymous (KS) coding sequence nucleotide substitution
rates, as well as intron sequence substitution rates (KI), from human-chimpanzee orthologues [32].
Our analysis has revealed a significantly higher value of KA in non-cancer genes, in general,
compared to cancer genes (p-value Mann-Whitney (M-W) test for KA = 2.69 × 10^-7). This result is in
congruence with a previous smaller-scale study and with results at the protein level [21,22].
However, the main difference between cancer and non-cancer genes appears to be due to dominant
cancer genes, which exhibit significantly lower non-synonymous coding sequence nucleotide
substitution rates, indicating greater selective pressure on these genes, compared to non-cancer
genes (p-value M-W test for KA = 6.21 × 10^-8). There are no significant differences in KI (the intronic
substitution rate) between any of the groups, indicating that other differences between the groups, in
terms of selective pressures on coding sequence, cannot be simply ascribed to varying mutation
rates in the genome. The trend of lower non-synonymous coding sequence nucleotide substitution
rates in dominant cancer genes compared to both non-cancer genes and recessive cancer genes is
also evident from the analysis of human-mouse-dog orthologues, therefore showing that the human-
chimpanzee results hold at greater evolutionary distances (see web database for details). In addition,
protein sequence conservation information is included in the web database for 17 Ensembl
eukaryotic proteomes from human to yeast.
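The group comparisons in this section rely on two-tailed Mann-Whitney tests. The following is a minimal sketch of such a test in plain Python, using the normal approximation with a tie correction and no continuity correction. It is not the authors' implementation, and the KA values are invented for illustration only.

```python
import math


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of x, two-tailed p-value). Tied values receive
    the average of the ranks they span.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical KA values, for illustration only (not data from the study).
ka_dominant = [0.001, 0.002, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007]
ka_non_cancer = [0.004, 0.006, 0.008, 0.009, 0.010, 0.012, 0.015, 0.020]

u_stat, p_value = mann_whitney_u(ka_dominant, ka_non_cancer)
print(f"U = {u_stat}, two-tailed p = {p_value:.4f}")
```

A U statistic well below n1 × n2 / 2 for the first group indicates that its values tend to be smaller, which is the direction reported here for KA in dominant cancer genes.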
Division of genes by functional annotation
Previously it has been found that the proportions of different functional annotations in cancer and
non-cancer genes are not the same [22]. Therefore, it could be that the differences between cancer
and non-cancer genes are due to the dissimilarities between the different types of functional genes.
In order to rule out this possibility each human gene was classified according to the molecular
function of its protein product as determined by Gene Ontology (GO) 'slim' terms [45]. The patterns
for the seven GO slim terms with ten or more cancer genes (enzyme regulation, transcription
regulation, nucleic acid binding, catalysis, signal transduction, and structural molecule) are the same
as for the analysis of the cancer genes and non-cancer genes as whole groups for gene regulation,
gene structure and sequence conservation (data not shown). The low number of recessive cancer
genes prevented a functional analysis between dominant and recessive cancer genes.
Analysis of housekeeping genes
It has been shown previously that housekeeping genes have different properties to other genes
[16,27]. Tables 1 and 4 show that housekeeping genes, as a group, display different regulatory and
gene structure properties to cancer genes and in particular proto-oncogenes, with the exception of
both groups of genes having similar proportions of genes with CpG islands. However, in terms of
coding sequence conservation cancer genes and housekeeping genes are not significantly different.
Analysis of protein-protein interaction data
A previous study has shown that, on average, Cancer Gene Census proteins have more interaction
partners than proteins from other genes [34]. We see that this holds when the dominant / recessive
division is made. Dominant cancer proteins (mean interactions = 15.9) and recessive cancer proteins
(mean interactions = 16.8) have more interaction partners than non-cancer proteins (mean
interactions = 6.2) or housekeeping proteins (mean interactions = 9.8).
Analysis of breast and colon cancer genes
Recently, two groups of candidate cancer genes involved in breast cancers and in colon cancers
have been identified by mutational screens [23]. We analysed the two groups of genes, breast
cancer (BC; n = 122 genes) and colon cancer (CC; n = 68 genes), in the same way as the Cancer
Gene Census (CGC) genes (Tab. 5). In total, 2 genes are common to the breast and colon cancer
genes, 7 to the breast cancer and CGC genes, and 9 to the colon cancer and CGC genes. A
fundamental difference between the Cancer Gene Census set and the BC and CC sets is that for the
latter two we do not possess definite information on the nature of the mutation (dominant or
recessive), which prevents an analysis of these sets divided into proto-oncogenes and tumour
suppressor genes. Consequently, features that differ essentially between proto-oncogenes and
tumour suppressor genes (e.g. some regulatory properties and coding constraints) cannot be
properly assessed in these groups, because we do not know the proportion of cancer genes in each
category. Properties that differ between cancer and non-cancer genes as a whole (e.g. gene
structure) are not affected by this limitation.
Table 5: Comparison of Non-cancer, Cancer Gene Census, Breast cancer and Colon cancer gene sets (median
values).
                   Non-cancer  CGC     Colon   Breast
Regulation
  Promoter         0.542       0.636   0.670   0.626
  CpG islands (%)  0.59        0.73    0.54    0.57
  miRNAs (%)       0.33        0.50    0.51    0.44
  3' UTR           689         920     1032    853
Gene structure
  CDS length       1095        1779    2292    2728.5
  Gene length      16649       52096   80880   63955
  Exon number      6           12      14      16.5
Paralogy
  CS paralogues    0.466       0.398   0.392   0.327
Coding constraints
  KA/KS hs-pt      0.148       0.083   0.136   0.204
  3 KA/KS hs       0.112       0.068   0.092   0.110
  3 KA/KS mm       0.088       0.062   0.055   0.083
  3 KA/KS cf       0.095       0.063   0.061   0.107
Indeed, the gene structure of the sets of both breast and colon cancer genes follows a similar pattern
to the CGC set, with even greater differences compared to non-cancer genes (Tab. 5). Both breast
and colon cancer genes have significantly longer coding (p < 2.2 × 10^-16 for both) and gene
sequences (p < 2.2 × 10^-16 for BC; p = 1.04 × 10^-4 for CC) and a higher number of exons (p < 2.2
× 10^-16 for BC; p = 1.08 × 10^-11 for CC).
In addition, the colon cancer set has greater promoter conservation (p = 0.004), a higher proportion of
genes containing putative miRNA targets (p = 0.006) and longer 3' UTRs (p = 0.008). The set of
genes mutated in breast cancer also has higher median values for these three regulatory properties
(Tab. 5), although the differences are not statistically significant. However, neither of these two sets
appears to be more likely to contain CpG islands than non-cancer genes.
In general the breast and colon cancer genes are not significantly more conserved at the coding
sequence level compared to non-cancer genes, although the median KA/KS value is consistently
lower in colon cancer genes for all comparisons.
Previously, we ascertained that tumour suppressor genes and genes involved in Mendelian
recessive diseases generally do not possess close paralogues (as determined by conservation score
(CS) of the closest paralogue) [15,22]. This pattern has been attributed to the fact that highly similar
paralogues can potentially compensate for a mutated protein in which case this gene will not be
involved in the disease. We have studied the pattern of paralogy in the sets of genes in BC and CC
finding that breast cancer genes have significantly less conserved paralogues than non-cancer
genes (p-value M-W test = 1 × 10^-6, Tab. 5). Colon cancer genes also have a lower median CS for
paralogues than non-cancer genes, but the difference is only marginally significant (p-value M-W test = 0.016, Tab. 5).
Discussion
Our analysis has focused on sequence differences between proto-oncogenes, tumour suppressor
genes and non-cancer genes that are likely to be important for appropriate gene regulation and function.
We have shown that in general proto-oncogenes appear to be more tightly regulated, as is
evident from their long conserved promoters, regulation by miRNA and presence of CpG islands. In
addition, we show that proto-oncogenes have more selective constraints in their coding sequences
than non-cancer genes or tumour suppressor genes. Both sets of cancer genes also have longer
genes and coding sequences and more exons.
The observation that dominant cancer genes are more strictly regulated than either recessive cancer
genes or non-cancer genes is consistent with the biological characteristics of these genes, where
any deviation from the appropriate levels of these proteins is potentially oncogenic. Thus, these
genes appear to require complex regulatory control at both the transcriptional and
post-transcriptional levels (i.e. by microRNAs).
Epigenetic effects, particularly DNA hypermethylation, have long been associated with many human
tumour types. Unlike genetic mutations, the nucleotide sequence is not altered as a result of this
epigenetic event. When CpG island hypermethylation occurs within the regulatory regions of genes, it
may result in silencing of the corresponding genes. Fully methylated CpG islands are known to
naturally occur in the promoters of genes inactivated on the female X chromosome [46]. Aberrant
CpG island methylation is associated with tumourigenesis [6,47]. We found a disproportionately large
number of CpG islands in the promoters of proto-oncogenes compared to non-cancer genes.
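For readers unfamiliar with how CpG islands are usually recognised, the sketch below computes the two quantities most commonly used for this purpose: GC content and the observed/expected CpG ratio. The thresholds (length of at least 200 bp, GC content above 0.5, observed/expected ratio above 0.6) are the widely used Gardiner-Garden and Frommer criteria and are an assumption here; this section does not state which criteria were applied in the study. The function names and the toy sequence are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides cannot overlap one another
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply commonly used thresholds (assumed, not taken from the study)."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc_content > min_gc and obs_exp > min_obs_exp


# Toy sequence for illustration only.
toy = "CGCGCGATCG"
gc, ratio = cpg_stats(toy)
print(f"GC content = {gc:.2f}, CpG obs/exp = {ratio:.2f}")
```

In practice such a check is applied in sliding windows along a promoter region rather than to a single short string.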
We searched our dataset for the presence of conserved octamers that are potentially seed sites for
microRNA binding. MicroRNAs are small non-coding RNAs that suppress translation via non-perfect
pairing with target mRNAs [48]. Recent estimates have calculated that between 20% and 30% of all
human genes are targeted by miRNAs [28]. They have also been implicated in a wide range of
diseases including cancer [36,37]. We found an over-representation of cancer genes that are
potentially targeted by miRNAs (Tab. 1). Interestingly, this enrichment of miRNA targets is specific to
dominant cancer genes. Recessive cancer genes show no difference to background levels of miRNA
targets. Since miRNAs are negative regulators at the post-transcriptional level, it is conceivable that
proto-oncogenes, whose over-expression could cause inappropriate cell proliferation, are
preferentially targeted by miRNAs, allowing a further level of the control of protein expression.
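The octamer search described above can be sketched as a simple scan of 3' UTR sequences for a given 8-mer. This is an illustration under simplifying assumptions: it ignores the cross-species conservation requirement, the gene names and sequences are invented, and only the motif AGCACTTT (listed for mir-17-5p in the motif table) comes from the text.

```python
def motif_positions(seq, motif):
    """Return the 0-based start positions of every occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # step by one to allow overlaps
    return positions


def genes_with_motif(utrs, motif):
    """Return the sorted names of genes whose 3' UTR contains the motif."""
    return sorted(name for name, seq in utrs.items() if motif_positions(seq, motif))


# Hypothetical 3' UTR fragments, for illustration only.
toy_utrs = {
    "GENE_A": "TTAGCACTTTGGCA",
    "GENE_B": "CCGGAATTCCGGAATT",
    "GENE_C": "AGCACTTTAAAGCACTTT",
}

MOTIF = "AGCACTTT"  # 8-mer listed for mir-17-5p in the motif table
hits = genes_with_motif(toy_utrs, MOTIF)
print(hits)
print(motif_positions(toy_utrs["GENE_C"], MOTIF))
```

Counting the genes with at least one site, per gene group, yields the kind of contingency table on which an over-representation test can then be run.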
Of the 540 8-mer motifs that we used to identify miRNA targets, none was over-represented in
recessive cancer genes (the lowest p-value was 0.32). The six motifs that were over-represented in
cancer dominant genes (p-value < 0.001) included the motif bound by the mir-17-5p miRNA, which
has previously been implicated in B-cell lymphomas [40].
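Over-representation of a motif in a gene set can be assessed with a one-sided Fisher exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below is one possible way to compute such a p-value with the standard library; it is not claimed to be the exact procedure behind the motif p-values reported here, and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_set, set_size, hits_total, population):
    """One-sided Fisher exact (hypergeometric upper tail) p-value.

    Probability of drawing at least hits_in_set motif-containing genes when
    set_size genes are sampled without replacement from population genes,
    of which hits_total contain the motif.
    """
    max_hits = min(set_size, hits_total)
    favourable = sum(
        comb(hits_total, k) * comb(population - hits_total, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    return favourable / comb(population, set_size)


# Hypothetical counts: 1000 genes in total, 100 carrying the motif,
# and a gene set of 50 in which 12 carry it (5 expected by chance).
p_enriched = enrichment_p_value(12, 50, 100, 1000)
print(f"one-sided p = {p_enriched:.4g}")
```

Because several hundred motifs are tested, a real analysis would also need to account for multiple testing when interpreting individual p-values.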
Our study also demonstrates that the coding sequences of proto-oncogenes are significantly more
constrained by selection than tumour suppressor genes and non-cancer genes. We recently studied
the evolutionary history of hereditary disease genes, reporting significantly greater selective
constraints on the molecular evolution of dominant disease genes compared to recessive disease
and non-disease genes [49]. We found that the mode of inheritance of a gene may be an important
determinant of its rate of evolution. These results have been used to differentiate between candidate
genes for dominant and recessive disorders [50].
In terms of coding sequence evolution, much of the difference between non-cancer and cancer
genes would appear to be attributable to a higher level of purifying selection evident in the molecular
evolution of dominant cancer genes (proto-oncogenes). Point mutations that are activating in
dominant cancer genes are predominantly somatic mutations, which has been ascribed to the
potential embryonic lethality of germline mutations in these genes [13]. However, according to the data
in the Cancer Gene Census only approximately 10% of dominant cancer genes are prone to
activation by missense mutations, with the majority being activated by translocation [3].
Recessive cancer genes appear to be under no greater selective pressure than non-cancer genes.
An individual may harbour a germline mutation in one allele of a recessive cancer gene without a
significant reduction in fitness in the absence of a further somatic mutation to the remaining
functional allele. In this way, the mutant allele is invisible to selection. Although both cancer genes
and disease genes display similar conservation and sequence length properties, they
differ in terms of their functional properties [14,22].
The dataset of cancer genes used for analysis is from the Cancer Gene Census [3], which is
predominantly comprised of proto-oncogenes involved in translocations and is hence biased by
traditional cancer gene discovery techniques. It is important to examine if the traits identified in this
set of genes are unique to this group or are generally indicative of further cancer genes. Recently,
two groups of cancer genes potentially involved in breast cancers and in colon cancers have been
identified by mutational screens [23], thus affording us the opportunity to test this.
The most striking similarity between all three groups of cancer genes is the greater coding sequence
length compared to non-cancer genes (Tab. 5). This trend is also clear in both dominant and
recessive cancer genes (Tab. 4). The property may contribute to a gene's susceptibility to become
cancer-inducing. Genes with longer coding sequences are presumably more likely to suffer mutations
by chance. Indeed, the breast and colon cancer genes have greater median coding sequence
lengths than the CGC genes.
Median values for promoter conservation and 3' UTR length are also consistently higher for colon and
breast cancer genes compared to non-cancer genes (Tab. 5), and higher proportions of genes with
microRNA targets are found in these sets. However, the differences are not always statistically
significant.
Although the breast and colon cancer genes are not significantly different to non-cancer genes in
sequence conservation and some regulatory properties, we have seen from the CGC dataset that
treating proto-oncogenes and tumour suppressor genes as a single dataset can dilute the signals of
the proto-oncogenes. It is possible that the breast and colon cancer genes contain a higher
proportion of tumour suppressor genes than the CGC genes as tumour suppressor genes are more
susceptible to loss-of-function mutations, which are more likely to occur by chance than specific gain-
of-function mutations in proto-oncogenes.
As with global analyses of this nature, our study has certain limitations both methodologically, due to
the small sample size of the cancer genes and the much larger "non-cancer" gene dataset, and
biologically. We are presenting a view of the Sanger Cancer Gene Census, the genes in which
(particularly the proto-oncogenes) have mainly been discovered in haematological tumours resulting
from translocations. As the involvement of further genes in cancer may be elucidated by different
methods such as mutational screens or epigenetic studies, the trends we have discovered may
change or become insignificant. Indeed, we see some different properties in the breast and colon
candidate cancer genes from the Sjöblom study [23] compared to those in the Cancer Gene Census.
We also acknowledge that high evolutionary sequence conservation and signals of complex
regulatory control in a gene are not inevitable signs of involvement in cancer. However, it is clear that
some genes are more susceptible to becoming oncogenic than others and it is hoped that global
studies such as the present one will in some way contribute to unearthing why this is the case.
In summary, our results demonstrate that proto-oncogenes display greater proximal promoter
conservation, have a greater tendency to be associated with CpG islands, have longer 3' UTRs
and are more likely to contain potential miRNA targets than non-cancer genes. These traits are likely
indicators of genes that are strictly regulated in normal cells, and aberrant expression of these genes
can result in tumour formation. Furthermore, we demonstrate that proto-oncogenes are subject to
greater selective evolutionary pressures than both tumour suppressor and non-cancer genes, and
that both subsets of cancer genes have more complex gene structure. Finally, we show that the gene
structure and a number of regulatory properties identified in Cancer Gene Census genes are shared by genes
identified in mutational screens of breast and colon tumours.
Acknowledgements
We thank Professor F.X. Real for valuable comments and discussion. N. L-B is recipient of a Ramón
y Cajal contract of the Spanish Ministerio de Educación y Ciencia (MEC). We acknowledge funding
from the International Human Frontier Science Program Organization (HFSPO) and from the Spanish
Ministerio de Educación y Ciencia grant number SAF2006-0459.
References
1. Hanahan, D. and Weinberg, R.A. (2000). The hallmarks of cancer. Cell 100, 57-70.
2. Dyrskjøt, L., Thykjaer, T., Kruhøffer, M., Jensen, J. L., Marcussen, N., Hamilton-Dutoit, S., Wolf, H. and Ørntoft, T. F. (2003). Identifying distinct classes of bladder carcinoma using microarrays. Nat. Genet. 33, 90-96.
3. Futreal, P. A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N. and Stratton, M. R. (2004). A census of human cancer genes. Nat. Rev. Cancer 4, 177-183.
4. Stenson, P. D., Ball, E. V., Mort, M., Phillips, A. D., Shiel, J. A., Thomas, N. S. T., Abeysinghe, S., Krawczak, M. and Cooper, D. N. (2003). Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577-581.
5. Sakai, T., Ohtani, N., McGee, T. L., Robbins, P. D. and Dryja, T. P. (1991). Oncogenic germ-line mutations in Sp1 and ATF sites in the human retinoblastoma gene. Nature 353, 83-86.
6. Esteller, M., Corn, P. G., Baylin, S. B. and Herman, J. G. (2001). A gene hypermethylation profile of human cancer. Cancer Res. 61, 3225-3229.
7. Venables, J. P. (2006). Unbalanced alternative splicing and its significance in cancer. Bioessays 28, 378-386.
8. Esquela-Kerscher, A. and Slack, F. J. (2006). Oncomirs - microRNAs with a role in cancer. Nat. Rev. Cancer 6, 259-269.
9. Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
10. Calin, G. A., Dumitru, C. D., Shimizu, M., Bichi, R., Zupo, S., Noch, E., Aldler, H., Rattan, S., Keating, M., Rai, K., Rassenti, L., Kipps, T., Negrini, M., Bullrich, F. and Croce, C. M. (2002). Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc. Natl. Acad. Sci. USA 99, 15524-15529.
11. Lu, J., Getz, G., Miska, E. A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B. L., Mak, R. H., Ferrando, A. A., Downing, J. R., Jacks, T., Horvitz, H. R. and Golub, T. R. (2005). MicroRNA expression profiles classify human cancers. Nature 435, 834-838.
12. Yanaihara, N., Caplen, N., Bowman, E., Seike, M., Kumamoto, K., Yi, M., Stephens, R. M., Okamoto, A., Yokota, J., Tanaka, T., Calin, G. A., Liu, C.-G., Croce, C. M. and Harris, C. C. (2006). Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell 9, 189-198.
13. Ponder, B. A. (2001). Cancer genetics. Nature 411, 336-341.
14. López-Bigas, N., Blencowe, B. J. and Ouzounis, C. A. (2006). Highly consistent patterns for inherited human diseases at the molecular level. Bioinformatics 22, 269-277.
15. López-Bigas, N. and Ouzounis, C. A. (2004). Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res. 32, 3108-3114.
16. Tu, Z., Wang, L., Xu, M., Zhou, X., Chen, T. and Sun, F. (2006). Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC Genomics 7, 31.
17. Kondrashov, F. A., Ogurtsov, A. Y. and Kondrashov, A. S. (2004). Bioinformatical assay of human gene morbidity. Nucleic Acids Res. 32, 1731-1737.
18. Adie, E. A., Adams, R. R., Evans, K. L., Porteous, D. J. and Pickard, B. S. (2005). Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics 6, 55.
19. Fortini, M. E., Skupski, M. P., Boguski, M. S. and Hariharan, I. K. (2000). A survey of human disease gene counterparts in the Drosophila genome. J. Cell Biol. 150, F23-F30.
20. Pickeral, O. K., Li, J. Z., Barrow, I., Boguski, M. S., Makałowski, W. and Zhang, J. (2000). Classical oncogenes and tumor suppressor genes: a comparative genomics perspective. Neoplasia 2, 280-286.
21. Thomas, M. A., Weston, B., Joseph, M., Wu, W., Nekrutenko, A. and Tonellato, P. J. (2003). Evolutionary dynamics of oncogenes and tumor suppressor genes: higher intensities of purifying selection than other genes. Mol. Biol. Evol. 20, 964-968.
22. Furney, S. J., Higgins, D. G., Ouzounis, C. A. and López-Bigas, N. (2006). Structural and functional properties of genes involved in human cancer. BMC Genomics 7, 3.
23. Sjöblom, T., Jones, S., Wood, L. D., Parsons, D. W., Lin, J., Barber, T. D., Mandelker, D., Leary, R. J., Ptak, J., Silliman, N., Szabo, S., Buckhaults, P., Farrell, C., Meeh, P., Markowitz, S. D., Willis, J., Dawson, D., Willson, J. K. V., Gazdar, A. F., Hartigan, J., Wu, L., Liu, C., Parmigiani, G., Park, B. H., Bachman, K. E., Papadopoulos, N., Vogelstein, B., Kinzler, K. W. and Velculescu, V. E. (2006). The consensus coding sequences of human breast and colorectal cancers. Science 314, 268-274.
24. Pruitt, K. D. and Maglott, D. R. (2001). RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137-140.
25. Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehväslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I. and Clamp, M. (2002). The Ensembl genome database project. Nucleic Acids Res. 30, 38-41.
26. Su, A. I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K. A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., Cooke, M. P., Walker, J. R. and Hogenesch, J. B. (2004). A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062-6067.
27. Eisenberg, E. and Levanon, E. Y. (2003). Human housekeeping genes are compact. Trends Genet. 19, 362-365.