CELL INDEX DATABASE (CELLX): A WEB TOOL FOR CANCER PRECISION MEDICINE *
KEITH A. CHING1, KAI WANG1, ZHENGYAN KAN1, JULIO FERNANDEZ1, WENYAN ZHONG1, JAREK KOSTROWICKI1, TAO XIE1, ZHOU ZHU1, JEAN-FRANCOIS MARTINI2, MARIA KOEHLER2, KIM ARNDT1, PAUL REJTO1
1Oncology Research Unit, 2Oncology Business Unit, Pfizer Global Research & Development, Pfizer Inc., 10777 Science Center Drive San Diego, CA 92121, USA Email: [email protected]
The Cell Index Database (CELLX) (http://cellx.sourceforge.net) provides a computational framework for integrating expression, copy number variation, mutation, compound activity, and meta data from cancer cells. CELLX gives the computational biologist a quick way to perform routine analyses as well as the means to rapidly integrate data for offline analysis. Data is accessible through a web interface which uses R to generate plots and perform clustering, correlations, and statistical tests for associations within and between data types for ~20,000 samples from TCGA, CCLE, Sanger, GSK, GEO, GTEx, and other public sources. We show how CELLX supports precision oncology through indications discovery, biomarker evaluation, and cell line screening analysis.
1. Introduction
To support precision medicine patient selection strategies, genomics data is used to identify oncogenic drivers or dysregulated pathways in cancer cells susceptible to therapeutic intervention. Notably, efforts by The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov), the Cancer Cell Line Encyclopedia (CCLE)[1], and Sanger Wellcome Trust Genomics of Drug Sensitivity in Cancer (GDSC)[2] have generated a plethora of data and data types that can be used for generating patient selection hypotheses. However, multiple genomics data types such as expression, copy number variation (CNV), and mutation are large and unwieldy to manage. For the computational biologist, much time and effort can be spent assembling an up-to-date table of features for computation because new data are generated frequently and incrementally. Thus, there is a need for an infrastructure that supports simple, quick, and routine analyses on multi-dimensional genomics data as well as the automated assembly of data tables for offline computation using more sophisticated algorithms.
Currently, there exist several cancer genomics databases to access expression, CNV, mutation, and integrated data as reviewed in [3]. For example, BioGPS[4] provides expression data, Tumorscape[5] contains CNV measurements, the Sanger Catalog of Somatic Mutations in Cancer (COSMIC)[6] lists mutations, and the cBio Portal[7] integrates multiple TCGA data types. Additionally, databases with compound activity data include GDSC and CCLE. Here we present a publicly available web-based informatics tool to integrate data, perform analysis, and visualize results from public as well as private internal sources to support precision medicine activities.
* This work is supported by Pfizer, Inc.
2. Architecture
The underlying MySQL database consists of 22 tables for expression, CNV, mutation, compound, sample, meta data, RNAi, RPPA, and gene annotation data. The Perl CELLX application runs on an Apache web server. R-serve (http://www.rforge.net/Rserve/) instances generate plots and perform statistical analyses. An Apache Tomcat application server runs a custom Java servlet which bridges Perl and R by funneling Perl http requests to the R-serves and sends results back to the web server. A demo site, instructions, source code, database dumps, and data parsing / loading scripts are available at http://cellx.sourceforge.net.
3. Gene Based Search
A common starting point for indications discovery is asking where the target of interest is altered. CELLX can plot the relative expression or CNV of a gene within a dataset or across multiple compatible datasets. For instance, RNA-Seq data processed by RSEM[8] can be compared across tumors profiled not only by TCGA, but CCLE as well. CDK4 expression can be seen to have high outliers in Glioblastoma Multiforme (GBM), melanoma (SKCM), breast (BRCA), Lower Grade Glioma (LGG), and sarcomas (SARC) (Figure 1). A similar plot can be generated of CNV to identify datasets with amplifications or deletions. CELLX can chart the relationship between expression and CNV across datasets using scatter plots of expression versus CNV. A hallmark of amplification, CDK4 expression levels scale with CNV level in several datasets (Figure 2a,b).
[Figure 1 plot. x-axis: CCLE-RSEM and the TCGA RSEM datasets, with adjacent normal sets suffixed _N; y-axis: RSEM Expression CDK4 (0 to 15); legend: Tumor, Normal.]
Figure 1. RNA-Seq RSEM gene expression of CDK4 (y-axis, log2) across datasets shows higher expression in tumor vs. adjacent normal tissue. Particular groups of outliers can be seen in GBM (glioblastoma multiforme), SARC (sarcoma), SKCM (skin cutaneous melanoma), LGG (brain lower grade glioma), and cell lines (CCLE).
4. Integrated Visualization
Mixed data types can be visualized in 2D scatter plots to look at the relationship between two data types on the same or different genes. For instance, expression of gene A on the x-axis can be plotted versus the CNV of gene B on the y-axis. Other plottable data types are protein levels from Reverse Phase Protein Arrays (RPPA), the mutation count per sample, the general amount of CNV per sample, IC50 values for compounds, and meta data. Multiple layers of data can be added to the plot to increase dimensionality. As a simple example, one can plot ERBB2 expression vs. ERBB2 CNV overlaid with ERBB2 mutations (Figure 3a) or breast cancer subtype meta data (Figure 3b). The underlying data used to generate each plot is linked as a tab-separated (TSV) file for downloading.
[Figure 2 plots. One scatter plot per dataset of CDK4 RSEM expression (x-axis) vs. CDK4_c CNV (y-axis, -2 to 2) for CCLE-RSEM and the TCGA ACC, BLCA, BRCA, CESC, COAD, DLBC, GBM, LUSC, MESO, OV, PAAD, PCPG, PRAD, READ, and SARC RSEM datasets.]
Figure 2. Correlation of expression and CNV. CNV (y-axis in log2 diploid genome) vs. RSEM expression levels (log2) for CDK4 show that a) SARC and b) GBM datasets have a sizable population of cells overexpressing CDK4 due to amplification of the locus. Additionally, expression levels scale with CNV levels. Clear outliers from the main distribution of CNV values can help determine appropriate CNV cutoffs for amplification status. In this example, samples colored red have > 1 log2 diploid genomes (i.e. >~4 copies).
[Figure 3 plots. Both panels: Expression TCGA-BRCA-RSEM, x-axis ESR1 (5 to 15), y-axis ERBB2 (8 to 18). One panel marks focal amplifications with 'foc' (legend: mutant, Amp >= 1, Del <= -2, Focal, ERBB2_c, ERBB2_m); the other labels samples by meta_PAM50.RNASEQ subtype (Basal, Her2, LumA, LumB).]
Figure 3. 2D scatter plots. a) Gene expression of ESR1 (x-axis, log2) vs. gene expression of ERBB2 (y-axis, log2). ERBB2 CNV over the selected threshold of 1 (log2 diploid genome) is colored pink. Focal amplifications (< 10MB) are denoted with ‘foc’. Mutations in ERBB2 are colored green. b) Meta data for PAM50 subtype classification are colored and overlaid on the ESR1 vs. ERBB2 gene expression plot.
5. Biomarker Frequency Reports
Tables of the frequency of alterations across datasets can help to prioritize indications for therapies with known biomarkers. For instance, the venn report of the frequency of CDK4 biomarker alterations within datasets shows significant frequencies of CDK4 amplification in sarcoma, gliomas, and melanoma TCGA datasets (Table 1). Cutoffs can be defined by expression level, CNV level, and/or mutation status. The co-occurrence or exclusion of 2-4 biomarkers within the same sample can also be quantified.
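The cutoff and co-occurrence logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CELLX implementation: the function names, the sample names, the second gene, and the CNV values are all hypothetical, and the cutoff of log2 >= 1 (about 4 copies on a diploid background) is taken from the figure legends.

```python
def copies_from_log2(log2_ratio, ploidy=2):
    """Convert a log2 ratio relative to a diploid genome into a copy number."""
    return ploidy * 2 ** log2_ratio


def call_amplified(cnv_by_sample, cutoff=1.0):
    """Return the set of samples whose log2 CNV is at or above the cutoff."""
    return {s for s, v in cnv_by_sample.items() if v >= cutoff}


def co_occurrence(altered_a, altered_b, all_samples):
    """Count samples altered in both, only one, or neither of two biomarkers."""
    all_samples = set(all_samples)
    a = altered_a & all_samples
    b = altered_b & all_samples
    return {
        "both": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "neither": len(all_samples - a - b),
    }


# Hypothetical log2 CNV values for two genes in five samples.
gene_a_cnv = {"S1": 2.3, "S2": 0.1, "S3": 1.4, "S4": -0.2, "S5": 0.0}
gene_b_cnv = {"S1": 1.9, "S2": 0.0, "S3": 0.2, "S4": 1.1, "S5": -0.1}

gene_a_amp = call_amplified(gene_a_cnv)
gene_b_amp = call_amplified(gene_b_cnv)
report = co_occurrence(gene_a_amp, gene_b_amp, gene_a_cnv.keys())

print(copies_from_log2(1.0))  # a log2 value of 1 corresponds to 4 copies
print(report)
```

The same pattern would extend to three or four biomarkers by intersecting more sets; expression or mutation cutoffs would simply produce the altered sets in a different way.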
6. Analysis
CELLX can identify genes whose expression correlates with a gene of interest and return a table of significant genes that can be visualized via a heat map with labelled metadata. For example, a search for genes correlated with CDK4 expression in the TCGA sarcoma dataset yields ACVRL1, which is expressed by vascular endothelium and is a potential anti-angiogenesis target (Figure 4a).
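As a rough illustration of such a correlation search, the sketch below ranks genes by the absolute Pearson correlation of their expression with a query gene. The choice of Pearson correlation, the function names, and the toy expression values are assumptions made for the example; the text above does not specify the statistic CELLX uses.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def rank_correlated_genes(expression, query):
    """Return (gene, r) pairs sorted by |r| against the query gene."""
    query_values = expression[query]
    scores = [
        (gene, pearson(query_values, values))
        for gene, values in expression.items()
        if gene != query
    ]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical log2 expression values across five samples.
expression = {
    "QUERY": [10.0, 11.0, 12.0, 13.0, 14.0],
    "GENE_UP": [5.0, 6.1, 6.9, 8.2, 9.0],
    "GENE_DOWN": [9.0, 8.0, 7.5, 6.0, 5.0],
    "GENE_FLAT": [7.0, 7.2, 6.9, 7.1, 7.0],
}

ranked = rank_correlated_genes(expression, "QUERY")
for gene, r in ranked:
    print(gene, round(r, 3))
```

Sorting by the absolute value keeps strongly anti-correlated genes near the top of the table, which may or may not be desired for a given question.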
sourcename  CDK4_c  cells_c  CDK4_m  cells_m  cell_type                       tumor_type                                                         CNV%  MUT%
TCGA-SARC       35      171       0        0  soft_tissue                     Sarcoma                                                           20.47    NA
TCGA-GBM        73      607       0      150  neuronal                        Glioblastoma multiforme                                           12.03     0
TCGA-LGG        14      471       1      612  neuronal                        Brain Lower Grade Glioma                                           2.97  0.16
TCGA-ACC         2       90       0       91  adrenal_gland                   Adrenocortical carcinoma                                           2.22     0
TCGA-SKCM        7      387       8      372  skin                            Skin Cutaneous Melanoma                                            1.81  2.15
TCGA-LUAD        5      510       3      491  lung                            Lung adenocarcinoma                                                0.98  0.61
TCGA-STAD        2      403       1      373  stomach                         Stomach adenocarcinoma                                              0.5  0.27
TCGA-BRCA        5     1074       1      777  breast                          Breast invasive carcinoma                                          0.47  0.13
TCGA-BLCA        1      255       2      242  urinary_tract                   Bladder Urothelial Carcinoma                                       0.39  0.83
TCGA-OV          2      569       0      476  ovary                           Ovarian serous cystadenocarcinoma                                  0.35     0
TCGA-LUSC        1      487       0      233  lung                            Lung squamous cell carcinoma                                       0.21     0
TCGA-COAD        0      446       2      219  large_intestine                 Colon adenocarcinoma                                                  0  0.91
TCGA-PRAD        0      381       0      300  prostate                        Prostate adenocarcinoma                                               0     0
TCGA-THCA        0      508       0      428  thyroid                         Thyroid carcinoma                                                     0     0
TCGA-PAAD        0       92       1       91  pancreas                        Pancreatic adenocarcinoma                                             0   1.1
TCGA-PCPG        0      175       0        0  adrenal_gland                   Pheochromocytoma and Paraganglioma                                    0    NA
TCGA-MESO        0       37       0        0  pleura                          Mesothelioma                                                          0    NA
TCGA-READ        0      164       0        1  rectum                          Rectum adenocarcinoma                                                 0     0
TCGA-UCEC        0      533       5      248  endometrium                     Uterine Corpus Endometrial Carcinoma                                  0  2.02
TCGA-KIRC        0      521       6      328  kidney                          Kidney renal clear cell carcinoma                                     0  1.83
TCGA-ESCA        0      126       0        0  oesophagus                      Esophageal carcinoma                                                  0    NA
TCGA-DLBC        0       28       0       79  haematopoietic_and_lymphoid_ti  Lymphoid Neoplasm Diffuse Large B-cell Lymphoma                       0     0
TCGA-KICH        0       66       0       66  kidney                          Kidney Chromophobe                                                    0     0
TCGA-UCS         0       57       0       57  uterus                          Uterine Carcinosarcoma                                                0     0
TCGA-KIRP        0      212       0      169  kidney                          Kidney renal papillary cell carcinoma                                 0     0
TCGA-LAML        0      194       0      118  haematopoietic_and_lymphoid_ti  Acute Myeloid Leukemia                                                0     0
TCGA-LIHC        0      213       5      202  liver                           Liver hepatocellular carcinoma                                        0  2.48
TCGA-HNSC        0      516       5      513  upper_aerodigestive_tract       Head and Neck squamous cell carcinoma                                 0  0.97
TCGA-CESC        0      206       0       41  cervix                          Cervical squamous cell carcinoma and endocervical adenocarcinoma      0     0
Table 1. Frequency report for CDK4 alterations in TCGA. CDK4_c is the number of samples in which the CNV exceeds the set threshold, in this case ~4 copies. CDK4_m is the number of samples with a CDK4 mutation. The cells_c/_m columns are the number of samples for which CNV or mutation data are available, respectively. Percentages are calculated as altered / total for each individual alteration type.
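The percentage columns of Table 1 follow directly from the counts, as the short Python sketch below shows for the first three rows. The function name is hypothetical, and the handling of datasets without data (reported as NA in the table) is modeled here by returning None.

```python
def percent_altered(altered, total):
    """Return altered / total as a percentage, or None when no samples have data."""
    if total == 0:
        return None  # shown as NA in Table 1
    return round(100.0 * altered / total, 2)


# (sourcename, CDK4_c, cells_c, CDK4_m, cells_m), copied from Table 1.
rows = [
    ("TCGA-SARC", 35, 171, 0, 0),
    ("TCGA-GBM", 73, 607, 0, 150),
    ("TCGA-LGG", 14, 471, 1, 612),
]

for name, n_cnv, total_cnv, n_mut, total_mut in rows:
    cnv_pct = percent_altered(n_cnv, total_cnv)
    mut_pct = percent_altered(n_mut, total_mut)
    print(name, cnv_pct, mut_pct)
```

Because CNV and mutation data are available for different numbers of samples, each percentage uses its own denominator (cells_c or cells_m).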
A scatter plot of CDK4 vs. ACVRL1 shows higher ACVRL1 in dedifferentiated liposarcomas (DDLPS) vs. leiomyosarcomas (Figure 4b). This is consistent with a study reporting immature and intermediate blood vessels in sarcomas and quantifying tumor microvessel density that is ~3X higher in DDLPS vs. leiomyosarcomas.[9] The plot also shows that CDK4 expression is high in DDLPS and often focally amplified, which is consistent with the literature.[10] CELLX can also test for significant gene expression associated with meta data features by performing a t-test of a gene’s expression grouped by a sample’s meta data. As an example, a search for meta data with significantly different CDK4 expression in the TCGA sarcoma dataset reveals that the histologic diagnosis type has large differences in CDK4 expression levels (lowest p-val = 2.54e-19) as calculated by a pairwise t-test between all groups (Figure 4c). A box plot of the groups from histologic diagnosis shows that the CDK4 values from DDLPS are higher than other sarcomas (Figure 4d).
[Figure 4 plots. a) Heatmap titled "TCGA-SARC-RSEM correlation": genes in columns, TCGA sarcoma samples in rows, with a histologic_diagnosis color column (Dedifferentiated liposarcoma, Leiomyosarcoma (LMS), Myxofibrosarcoma, Undifferentiated Pleomorphic Sarcoma (UPS), NA); CDK4 and ACVRL1 are marked. b) Scatter plot "Expression TCGA-SARC-RSEM": x-axis CDK4 (10 to 17), y-axis ACVRL1 (6 to 14); points labeled foc, Ded, Lei, Myx, Und; legend: Amp >= 1, Del <= -2, Focal, CDK4_c, meta_histologic_diagnosis. d) Boxplot "TCGA-SARC-RSEM CDK4 histologic_diagnosis", min pval = 2.54443103386852e-19: y-axis CDK4 (10 to 17); groups: Dedifferentiated liposarcoma, Leiomyosarcoma (LMS), Myxofibrosarcoma, Undifferentiated Pleomorphic Sarcoma (UPS), no_value, normal_tissue.]
c)
metavalue                                   min_pval
histologic_diagnosis                        2.54E-19
well_differentiated_liposarcoma_primary_dx  5.62E-08
residual_tumor                              2.88E-05
leiomyosarcoma_uterine_involvement          0.010166952
gender                                      0.011140224
histologic_subtype                          0.012659498
primary_tumor_lower_uterus_segment          0.030764968
prior_dx                                    0.033818919
history_of_neoadjuvant_treatment            0.043692764
Figure 4. Analysis of features associated with CDK4 expression. a) Heatmap of top 200 genes (columns) correlated with CDK4 expression levels in samples (rows) from the TCGA sarcoma dataset showing ACVRL1 expression correlates with CDK4 (arrows). Meta data labels for histologic diagnosis are colored in a column on the left side of the plot. b) Scatter plot of CDK4 expression versus ACVRL1 expression showing high ACVRL1 expression in dedifferentiated liposarcomas. Metavalues from a) are colored and abbreviated by the first 3 letters. Amplification of CDK4 is denoted by a violet circle. foc=focal. c) Meta data with significantly different CDK4 expression levels. Min p-value is the lowest pairwise t-test score. d) Boxplot of histologic diagnosis by CDK4 expression data used in c).
Additional types of analyses include the identification of differentially expressed genes using t-tests of gene expression between groups defined by a gene’s expression, a gene’s mutation status, or a meta value label. For example, one could ask what genes are differentially expressed between samples with high CDK4 vs. low CDK4, samples with mutated EGFR vs. wild type EGFR, or samples annotated as male vs. female. Conversely, one can search for mutated genes associated with differential expression of the query gene, e.g. which genes, when mutated, show higher or lower expression of EGFR than wild type.
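A two-group comparison of this kind can be sketched as follows. The example splits samples by an expression cutoff on a grouping gene and ranks the remaining genes by a pooled-variance (equal variance) Student's t statistic. It is a minimal sketch under stated assumptions: the function names, gene names, cutoff, and values are hypothetical, and the conversion of t to a p-value is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
import math


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled (equal) variance.

    Returns (difference in means, t statistic, degrees of freedom).
    """
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    df = na + nb - 2
    std_err = math.sqrt((ss_a + ss_b) / df * (1 / na + 1 / nb))
    dm = mean_a - mean_b
    t = dm / std_err if std_err > 0 else 0.0
    return dm, t, df


def rank_by_group_difference(expression, high_samples, low_samples):
    """Rank genes by |t| between two sample groups, largest first."""
    results = []
    for gene, values in expression.items():
        high = [values[s] for s in high_samples]
        low = [values[s] for s in low_samples]
        results.append((gene,) + student_t(high, low))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical log2 expression of the grouping gene, split at a cutoff.
query = {"S1": 9.0, "S2": 9.2, "S3": 9.1, "S4": 12.0, "S5": 12.3, "S6": 12.1}
cutoff = 10.5
high_samples = [s for s, v in query.items() if v >= cutoff]
low_samples = [s for s, v in query.items() if v < cutoff]

expression = {
    "GENE_1": {"S1": 5.0, "S2": 5.2, "S3": 5.1, "S4": 8.0, "S5": 8.3, "S6": 8.1},
    "GENE_2": {"S1": 7.0, "S2": 6.0, "S3": 8.0, "S4": 7.1, "S5": 6.2, "S6": 7.9},
}

ranked = rank_by_group_difference(expression, high_samples, low_samples)
for gene, dm, t, df in ranked:
    print(gene, round(dm, 3), round(t, 2), df)
```

Groups defined by mutation status or a meta data label would use the same ranking step; only the construction of the two sample lists changes.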
7. Precision Medicine
To support precision medicine, CELLX can be used to generate responder / non-responder hypotheses from cell line screening data. As a retrospective example, one can analyze the cell line sensitivity profile of Palbociclib, a CDK4/6 inhibitor under development for ER+ breast cancer. Published breast cell line IC50 values for Palbociclib[11] show a range of responses (Figure 5a). CELLX can associate IC50 values with cell line expression, CNV, and mutation data from data sources such as CCLE. Samples are divided into two groups by user-defined cutoffs, in this case < 1 µM for responder cell lines (LOW IC50) and > 1 µM for non-responder cell lines (HIGH IC50). These groups can be used to identify genes whose expression is significantly different between responder and non-responder cells by calculating t-tests on the expression of ~20,000 genes and displaying a p-value ranked table (Figure 5b). Hierarchical clustering on the top 100 most significant genes, ordering the samples from low to high IC50, and coloring the samples by intrinsic breast subtype as defined by PAM50[12] shows that luminal B and Her2 subtypes tend to be sensitive to Palbociclib whereas cells of the basal subtype tend to be resistant (Figure 5c). Luminal A cell line subtypes were not represented in the screening set. Additionally, CELLX can dynamically generate a combination CNV / mutation table for genes which meet user-defined amplification / deletion thresholds or have annotated mutations. A ranked table of p-values from Fisher’s exact test for all genes with either a CNV or mutation alteration (Table 2) highlights genes potentially associated with compound activity. While the appearance of any one gene is not necessarily significant on its own, the combined results from the expression, CNV, and mutation associations highlight RB1, CCNE1, and to a lesser extent CDKN2A. Specifically, the expression of RB1 was low in resistant cells whereas CDKN2A and CCNE1 were high in resistant cells.
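For a single gene, the association between alteration status and response can be illustrated with a two-sided Fisher's exact test on a 2x2 table (altered / not altered by LOW / HIGH). The sketch below is a plain Python illustration rather than the CELLX code: the function name and the counts are hypothetical, and the two-sided p-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows are altered / not altered, columns are LOW / HIGH.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))  # 34/70, about 0.4857
```

Repeating the test for every gene with a CNV or mutation alteration and sorting by p-value would give a ranked table of the kind described above.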
Interestingly, unlike other targeted therapies where the small molecule target is often the biomarker of sensitivity (e.g. EGFR, MET, BRAF), the significant Palbociclib biomarkers represent markers of resistance. RB1 deficiency (CNV deletion, STOP mutations, and low expression) and concomitant high CDKN2A expression[13] are characteristics of the basal or triple negative breast subtype status (Figure 5c). Thus, if most of the RB1 deficient samples belong to the triple negative subtype, the remaining luminal A/B (ER+/ERBB2+/-) and ERBB2+ segments would be enriched for possible CDK4i responders. In support of this notion, luminal B and Her2 breast subtype cell lines are mostly sensitive to CDK4i (Figure 5c).
CELLX can also confirm whether the low RB1 expression found in triple negative breast cell lines also occurs in primary tissues by using the TCGA-BRCA breast invasive carcinoma dataset. CELLX can identify the genes that are most differentially expressed between RB1 high (> 9.5) vs. RB1 low (< 9.5) expressing cells using t-tests. Several of the top 100 ranking genes by p-value are related to cell cycle (RB1, CDKN2A, CCNE1) or DNA replication/repair (RFC2, RFC4, MCM5, MCM7, CDT1, NASP, POLK, POLD1, MUTYH, FANCE). Hierarchical clustering and labeling with the intrinsic subtype via PAM50[12] shows that, similar to cell lines, tumors with low RB1 and high CCNE1/CDKN2A expression are often of the basal subtype (Figure 6).
[Figure 5 plots. a) Waterfall plot "PD0332991 data": breast cell lines on the x-axis, IC50 [nM] on the y-axis (log scale, 5 to 1000); color legend for IC50: <100, <500, <2500, 2500+. c) Heatmap "CCLE PD0332991", LOW <= 999, HIGH > 999: cell lines in columns (suffixed _LOW or _HIGH), genes in rows; color key: Row Z-Score (-2 to 2); PAM50 legend: Basal, Her2, LumB, NA.]
Figure 5. a) Waterfall plot of breast cell line responses to Palbociclib (PD0332991) colored by IC50 range. b) Example output listing the p-value of genes. dm = difference in group means, statistic = t-statistic (LOW-HIGH), p.value = uncorrected p-value of two-sided, two-class t-test with equal variances. Not shown: FDR and Hochberg adjusted p-values. c) Heatmap of gene expression of top 100 genes by t-test between sensitive (IC50 < 999nM, LOW) and resistant cell lines (IC50 > 999nM, HIGH). The positions of RB1, CDKN2A, and CCNE1 are denoted with arrows. Cell lines are ordered by IC50 and colored by intrinsic breast subtype via PAM50.
8. Summary
CELLX is an informatics infrastructure to manage multi-dimensional genomics datasets containing expression, copy number variation, mutation, and compound sensitivity information. A browser-based web page provides an accessible way to visualize, analyze, and download the data in a pre-formatted table suitable for offline computation. CELLX is presently focused on supporting oncology precision medicine through the evaluation of preconceived hypotheses as well as unbiased, data-driven hypothesis generation. Though usable by the general user, CELLX is aimed at the computational biologist who desires more control over the data or wants to integrate custom data not available in public databases.

Table 2. Association of mutations / CNV with response to Palbociclib (PD0332991). a) Ranking of genes by p-value for Fisher's Exact test. b) Breast cell line table of selected alterations. Breast cell lines are labeled LOW (sensitive) or HIGH (resistant) and marked altered or non-altered for mutation or CNV change in each gene. Cell lines are ordered by Palbociclib IC50 value. Genes with absolute CNV values > 1 or mutations from CCLE are marked as altered. CNV units are log2 ratios relative to a diploid genome (i.e., 1 = ~4 copies). CCLE mutation nomenclature: del = deletion, p.0 = whole gene deletion, ? = unknown change, fs = frameshift, * = STOP codon.

a)
GENE           pval      GENE        pval
RB1            0.0004    ATP9B       0.0611
PIK3C2G        0.0048    CAPRIN1     0.0611
C19orf12       0.0136    CTIF        0.0611
CCNE1          0.0136    DNM2        0.0611
LOC284395      0.0136    EHF         0.0611
PLEKHF1        0.0136    ELP2        0.0611
POP4           0.0136    EPG5        0.0611
URI1           0.0136    FANCI       0.0611
VSTM2B         0.0136    HDLBP       0.0611
DOCK3          0.0136    LRP6        0.0611
NCOA4          0.0136    MAPK4       0.0611
ADRA1A         0.0136    MCPH1       0.0611
CTNNA1         0.0136    NKX6.3      0.0611
TCF12          0.0136    PDCD6       0.0611
CDH1           0.0459    PEBP4       0.0611
ANKS1B         0.0459    PTK2B       0.0611
DIP2C          0.0459    RP1L1       0.0611
GSTT1          0.0595    SGK223      0.0611
GSTTP2         0.0595    SMAD4       0.0611
LOC391322      0.0595    ZFYVE26     0.0611
D2HGDH         0.0611    MTAP        0.0932
DHRS4L1        0.0611    USP32       0.0932
DHRS4L2        0.0611    BCAS1       0.0932
ELAC1          0.0611    TRIM37      0.0932
GAL3ST2        0.0611    PIK3CA      0.0952
LINC00906      0.0611    TP53        0.0952
LINC01029      0.0611    AUTS2       0.0971
LOC100420587   0.0611    LOC649352   0.0971
LOC100505835   0.0611    MIR4650.1   0.0971
LOC102724958   0.0611    MIR4650.2   0.0971
LOC439994      0.0611    SIGLEC14    0.0971
MIR6511B1      0.0611    FHIT        0.0971
NAALADL2       0.0611    PIK3C2B     0.0971
NUTM2A.AS1     0.0611    PTEN        0.1176
RBFOX1         0.0611    CDKN2A      0.1362
SALL3          0.0611    LOC284344   0.1560
UGT2B28        0.0611    LPAR6       0.1560
UQCRFS1        0.0611    NRG1        0.1560
APC            0.0611    PDE4D       0.1560
BTK            0.0611    EEF2K       0.1560
ELN            0.0611    EPHB3       0.1560
EPHB6          0.0611    ITPR1       0.1560
GCNT2          0.0611    KIAA1549    0.1560
HIPK2          0.0611    MAP3K19     0.1560
KLK15          0.0611    MELK        0.1560
NOS2           0.0611    MLKL        0.1560
OMG            0.0611    MMP8        0.1560
TBX22          0.0611    MYLK        0.1560
ZNF142         0.0611    PLCB2       0.1560
AGPAT5         0.0611    SPTA1       0.1560

b)
Gene columns and p-values: RB1 0.0004, PIK3C2G 0.0048, CCNE1 0.0136, CDKN2A 0.1362
cell_name    PD0332991  RESPONSE  Alterations (RB1, PIK3C2G, CCNE1, CDKN2A)
MDAMB175     4          LOW
ZR7530       5          LOW       p.P129del
CAMA1        8          LOW
MDAMB134VI   13         LOW       p.P129del
HCC202       21         LOW
UACC893      24         LOW
EFM19        27         LOW       p.0?/-2.16
SUM190       28         LOW
EFM192A      42         LOW
MDAMB361     44         LOW       p.P129del  p.M52I
HCC1500      45         LOW       1.27  -2.24
HCC1419      51         LOW       p.P129del
HCC38        64         LOW       p.P129del  p.0?/-2.75
MDAMB415     64         LOW       p.P129del
MCF10A       92         LOW
UACC812      96         LOW       1.26  p.P129del
HCC2218      100        LOW       p.P129del
ZR751        110        LOW       p.P129del
MDAMB453     115        LOW       p.P129del
184A1        118        LOW
T47D         127        LOW       p.P129del
MCF7         148        LOW       p.0?/-2.19
BT20         177        LOW       p.I388S  p.P129del  p.0?/-2.11
MDAMB435     201        LOW       p.?
BT474        240        LOW       -1.07
SKBR3        300        LOW
KPL1         327        LOW       -1.97
HCC1143      359        LOW
MDAMB231     432        LOW       p.P129del  p.0?/-2.53
HCC1395      472        LOW       p.0?/-2.03
SUM225CWN    503        LOW
HS578T       524        LOW       p.0?
184B5        538        LOW
UACC732      744        LOW
CAL51        905        LOW       p.P129del
MDAMB468     1000       HIGH      p.?/-1.89
MDAMB436     1000       HIGH      p.G203fs*9
HCC1954      1000       HIGH
HCC1937      1000       HIGH      p.T738_R775del38
DU4475       1000       HIGH      p.0?/-1.92
HCC1569      1000       HIGH      2.02
HCC1187      1000       HIGH
BT549        1000       HIGH      p.?/-2.22
MDAMB157     1000       HIGH      1.01
COLO824      1000       HIGH      p.?
HCC70        1000       HIGH      p.N480del
HCC1806      1000       HIGH      1.25  p.0?/-2.25
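The association test summarized in Table 2 can be illustrated with a minimal Python sketch. It assumes a gene is called altered when a mutation is recorded or the absolute log2 CNV exceeds 1, as stated in the table caption, and it computes a two-sided Fisher's exact p-value from a 2x2 table of response (HIGH/LOW) against alteration status. The function names and the example counts are hypothetical and are not taken from CELLX, which performs its statistics in R.

```python
from math import comb


def copies_from_log2(cnv_log2, ploidy=2):
    """Convert a log2 ratio relative to a diploid genome into a copy number."""
    return ploidy * 2 ** cnv_log2


def is_altered(cnv_log2=None, mutation=None, threshold=1.0):
    """Altered if a mutation is recorded or |log2 CNV| exceeds the threshold."""
    return bool(mutation) or (cnv_log2 is not None and abs(cnv_log2) > threshold)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are response groups (e.g. HIGH, LOW); columns are altered / not altered.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x altered lines in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical counts: HIGH altered, HIGH unaltered, LOW altered, LOW unaltered
    print(fisher_exact_two_sided(7, 5, 2, 33))
    print(copies_from_log2(1))  # a log2 value of 1 corresponds to 4 copies
```

Running the test once per gene and sorting by p-value would give a ranking of the kind shown in panel a.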
9. Data Processing
When available, summarized data from the source were used for TCGA, CCLE, and Tumorscape, except for CNV calls. If Affymetrix SNP files were available, they were processed relative to the hg18 assembly using the aroma.affymetrix R package according to the methods of Bengtsson et al.[14], using the average baseline of 128 female HapMap samples[15] as the reference to maintain consistency and comparability across datasets. Microarray expression data from GEO, Sanger, and CCLE were normalized by GC Robust Multi-array Average (GCRMA) using R and the gcrma[16] library. Comparable to the TCGA RNA-Seq RSEM pipeline, CCLE RNA-Seq[17] data were processed using RSEM[8] on RefSeq sequences, quartile normalized to 1000, and log2 transformed. The R library genefu[18] was used to predict PAM50 subtypes, and genefilter[19] provided fast t-tests, F-tests, and correlations. Plots were made using CELLX and edited using Preview and Pages.
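The RNA-Seq scaling step can be sketched as follows. The text states only that values were quartile normalized to 1000 and log2 transformed, so this sketch makes three assumptions that are not confirmed by the source: the quartile is the upper (75th percentile) one, it is computed over nonzero values, and a pseudocount of 1 is added before the log. The function names are hypothetical.

```python
import math
from statistics import quantiles


def upper_quartile_normalize(values, target=1000.0):
    """Scale one sample so the 75th percentile of its nonzero values equals target.

    Assumption: the upper quartile of nonzero values is the scaling reference.
    """
    nonzero = [v for v in values if v > 0]
    q75 = quantiles(nonzero, n=4, method="inclusive")[2]
    return [v * target / q75 for v in values]


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to keep zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


if __name__ == "__main__":
    sample = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]  # made-up RSEM estimates
    scaled = upper_quartile_normalize(sample)
    print(scaled)
    print(log2_transform(scaled))
```

Scaling every sample to the same quartile value puts samples of different sequencing depth on a common scale before they are compared or clustered.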
Acknowledgements
The results published here are in whole or part based upon data generated by The Cancer Genome Atlas pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov/ (dbGaP Study Accession: phs000178.v8.p7). We thank Andy Futreal and the Wellcome Trust Sanger Institute for generously providing access to cell molecular profiling data. We also thank Adam Pavlicek and Shibing Deng for help with R and Heather Estrella for discussion and feedback.
[Figure 6 heat map. Title: TCGA-BRCA-RSEM. Color key: column Z-score (-5 to 5). Column annotation (PAM50.RNASEQ): Basal, Her2, LumA, LumB, NA. Marked genes: CCNE1, CDKN2A, RB1.]
Figure 6. TCGA Breast differential gene expression between RB1 high and RB1 low expressing tumors. Hierarchical clustering of the top 100 genes in a heat map colored by breast subtype as determined by PAM50. Positions of CDKN2A, CCNE1, and RB1 are denoted by arrows.
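A differential expression ranking of the kind shown in Figure 6 could be sketched as below. The source does not state how RB1 high and RB1 low tumors were defined or which statistic ranked the genes, so the median split on RB1 and the Welch t statistic used here are assumptions, and the function names and toy data are hypothetical.

```python
import math
from statistics import mean, median, variance


def welch_t(x, y):
    """Welch's t statistic for two independent samples (0.0 if both are constant)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se if se > 0 else 0.0


def rank_genes_by_split(expr, ref_gene="RB1", top_n=100):
    """Rank genes by |t| between tumors above and at-or-below the ref_gene median.

    expr maps gene name -> list of expression values, one per tumor, same order.
    """
    cutoff = median(expr[ref_gene])  # assumption: median split defines high vs low
    high = [i for i, v in enumerate(expr[ref_gene]) if v > cutoff]
    low = [i for i, v in enumerate(expr[ref_gene]) if v <= cutoff]
    scores = {
        gene: welch_t([vals[i] for i in high], [vals[i] for i in low])
        for gene, vals in expr.items()
        if gene != ref_gene
    }
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_n]


if __name__ == "__main__":
    toy = {  # made-up values for six tumors
        "RB1": [1, 2, 3, 4, 5, 6],
        "GENE_A": [10, 11, 9, 1, 2, 0],
        "GENE_B": [5, 6, 4, 5, 4, 6],
    }
    print(rank_genes_by_split(toy))
```

The top-ranked genes would then be Z-scored and clustered to produce a heat map; that step is omitted here.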
References
1. Barretina J, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012 Mar 28;483(7391):603-7. PMID: 22460905
2. Yang W, et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013 Jan;41:D955-61. PMID: 23180760
3. Chin L, Hahn WC, Getz G, Meyerson M. Making sense of cancer genomic data. Genes Dev. 2011 Mar 15;25(6):534-55. doi: 10.1101/gad.2017311. PMID: 21406553
4. Wu C, Macleod I, Su AI. BioGPS and MyGene.info: organizing online, gene-centric information. Nucleic Acids Res. 2013 Jan;41:D561-5. PMID: 23175613
5. Beroukhim R, et al. The landscape of somatic copy-number alteration across human cancers. Nature. 2010 Feb 18;463(7283):899-905. PMID: 20164920
6. Forbes SA, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011 Jan;39:D945-50. PMID: 20952405
7. Cerami E, et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 2012 May;2(5):401-4. PMID: 22588877
8. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011 Aug 4;12:323. doi: 10.1186/1471-2105-12-323. PMID: 21816040
9. Baneth V, Raica M, Cîmpean AM. Assessment of angiogenesis in soft-tissue tumors. Rom J Morphol Embryol. 2005;46(4):323-7. PMID: 16688371
10. Binh MB, Sastre-Garau X, Guillou L, de Pinieux G, Terrier P, Lagacé R, Aurias A, Hostein I, Coindre JM. MDM2 and CDK4 immunostainings are useful adjuncts in diagnosing well-differentiated and dedifferentiated liposarcoma subtypes: a comparative analysis of 559 soft tissue neoplasms with genetic data. Am J Surg Pathol. 2005 Oct;29(10):1340-7. PMID: 16160477
11. Finn RS, Dering J, Conklin D, Kalous O, Cohen DJ, Desai AJ, Ginther C, Atefi M, Chen I, Fowst C, Los G, Slamon DJ. PD 0332991, a selective cyclin D kinase 4/6 inhibitor, preferentially inhibits proliferation of luminal estrogen receptor-positive human breast cancer cell lines in vitro. Breast Cancer Res. 2009;11(5):R77. PMID: 19874578
12. Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, Quackenbush JF, Stijleman IJ, Palazzo J, Marron JS, Nobel AB, Mardis E, Nielsen TO, Ellis MJ, Perou CM, Bernard PS. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009 Mar 10;27(8):1160-7. doi: 10.1200/JCO.2008.18.1370. Epub 2009 Feb 9. PMID: 19204204
13. Knudsen ES, Knudsen KE. Tailoring to RB: tumour suppressor status and therapeutic response. Nat Rev Cancer. 2008 Sep;8(9):714-24. doi: 10.1038/nrc2401. PMID: 19143056
14. Bengtsson H, Irizarry R, Carvalho B, Speed TP. Estimation and assessment of raw copy numbers at the single locus level. Bioinformatics. 2008;24:759-767. PMID: 18204055
15. HapMap data available from http://hapmap.ncbi.nlm.nih.gov/downloads/raw_data/hapmap3_affy6.0/ Originally obtained from Affymetrix, but no longer available from that source.
16. Wu J, Irizarry R, with contributions from MacDonald J and Gentry J. gcrma: Background Adjustment Using Sequence Information. R package version 2.36.0. http://www.bioconductor.org/packages/release/bioc/html/gcrma.html
17. CCLE RNA-Seq data obtained from The Cancer Genomics Hub (CGHub) https://cghub.ucsc.edu/
18. Haibe-Kains B, Schroeder M, Bontempi G, Sotiriou C, Quackenbush J (2014). genefu: Relevant Functions for Gene Expression Analysis, Especially in Breast Cancer. R package version 1.14.0. http://compbio.dfci.harvard.edu
19. Gentleman R, Carey V, Huber W, Hahne F. genefilter: methods for filtering genes from microarray experiments. R package version 1.46.1. http://www.bioconductor.org/packages/release/bioc/html/genefilter.html