Date post: | 06-Dec-2015 |
Data Driven Cancer (DDC) or Cancer Driven Data (CDD)?
An omics puzzle to be solved for better prognosis in the disease
Alokkumar Jha, PhD Student
Insight Centre for Data Analytics
NUI Galway
Cancer is like the Mafia
• Treatments have variable effect
• Resistance can evolve
• Treatment doesn't work for all people
• Treatment doesn't hit the progenitor
• Looks ordinary (almost)
• Doesn't play by the rules
• Has a competitive advantage
• Eludes detection
Current Scenario in cancer research and data science
Flood of Data
Expression
DNA Methylation
Structural Variation
Exome Sequences
Copy Number Alterations
NextGen Biology Mantra: More data is good.
Data Generation Mechanisms
• There are approximately 500 petabytes of healthcare data in existence today, and that number is expected to skyrocket to more than 25,000 petabytes within the next seven years. (Groves, P., Kayyali, B., Knott, D. & Kuilen, S. V. (2013). The 'Big-Data' Revolution in Healthcare. McKinsey & Company Report.)
Data Driven Cancer (DDC)
Biomarkers are known for certain cancer types, but it is difficult to understand gene behaviour and disease alteration starting from the gene (Gene -> Data). Examples: biomarker research, targeted therapy.
Cancer Driven Data (CDD)
Molecular-level information for the cancer is not well known, so existing open-source data are used to discover cancer behaviour (Data -> Cancer). Examples: 1000 Genomes, GWAS.
Analysis is like investigating a plane crash
Patient Sample 1, Patient Sample 2, Patient Sample 3, …, Patient Sample N
Data Driven Cancer
Indication of novel cancer types based on their signature and targets (Genes/ Proteins) or alternate indications
MLN7243
MET, ITGA2, CAV1, ASPH, LGALS3, F2RL1, SERPINE2, EGFR, CAV2,
SDC4, LMNA, TPM1, DAB2, GNG12, FN1, PTPRM, MYLK, KRT18,
LAMB1, ADAM9, TIMP1, ITGA3, CD44, MIR21, ITGA5, IGFBP3, NRP1,
S100A6, ACTN1, ANXA2, TGFB2, THBS1, FOSL1, YAP1, TJP1, EREG, PTPRF,
TIMP2, EPHA2, KRT8, SNAI2, CTTN, SERPINE1, LAMC2, IGFBP6,
F2RL2, MMP2, TGFBR2, LAMA4, TIMP3, DKK1, JAG1, AXL, AREG, PTN,
KRT7, LAMB3, CDH1, COL4A2, SDC1, PKP2, CLDN1, TGFA, CXCL2,
ITGB4, APP, KRT19, TGFB1I1, PTGS2, LAMA3, COL4A1, EDN1, PLAU,
LOXL2, PPL, CALD1, KLF5, ITGB6, MMP1, PLAT, LOX, CCND1, CTGF,
TGIF1, TFPI2, TUBB6, COL1A1, CLDN7, TACSTD2, CDH2, GJA1,
NID1, DSP, SPARC, CDH3, GNG11, EFNA5, IL1A, RHOB, EPCAM, F11R
Signature Genes
Data Driven Cancer
PPI (Protein-Protein Interaction)
PPIs
Graph Statistics
• Number of genes from your seed list: 100
• Number of intermediate components: 90
• Number of interactions in subnetwork created from seed list: 351
• Total components in the background network: 2086
• Total interactions in the background network: 11429
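The graph statistics above can be reproduced in outline with a short sketch. This is not the tool used for the slide; it assumes that an "intermediate component" is a non-seed node that interacts with at least two seed genes, and the toy edges and the `LINKER_*` node names are invented for illustration.

```python
from collections import defaultdict


def seed_subnetwork(edges, seeds):
    """Return (seeds_found, intermediates, sub_edges) for a seed gene list.

    Assumption: an intermediate is a non-seed node that interacts with at
    least two distinct seed genes in the background network.
    """
    seeds = set(seeds)
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Seeds that actually occur in the background network.
    seeds_found = {gene for gene in seeds if gene in neighbours}
    # Non-seed nodes that bridge two or more seeds.
    intermediates = {
        node
        for node, nbrs in neighbours.items()
        if node not in seeds and len(nbrs & seeds) >= 2
    }
    keep = seeds_found | intermediates
    # Undirected edges whose two endpoints are both kept.
    sub_edges = {
        tuple(sorted((a, b)))
        for a, b in edges
        if a in keep and b in keep and a != b
    }
    return seeds_found, intermediates, sub_edges


# Toy background network; these interactions are hypothetical.
background = [
    ("EGFR", "MET"), ("EGFR", "LINKER_A"), ("CD44", "LINKER_A"),
    ("CD44", "LINKER_B"), ("LINKER_B", "LINKER_C"), ("MET", "LINKER_C"),
]
found, intermediates, sub_edges = seed_subnetwork(background, ["EGFR", "MET", "CD44"])
print(len(found), sorted(intermediates), len(sub_edges))
```

On a real background network the three printed numbers correspond to the "genes from your seed list", "intermediate components" and "interactions in subnetwork" lines of the statistics.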
PPI based Disease Enrichment
TOP Gene, COBAS2.0, BioMyndb, DAVID
PPI databases: HPRD, BIND, IntAct, Vidal, MINT, PID, BioGRID
DDC
TOPgene, Cobas2.0, Biomyndb, DAVID, Disent, GSEA
Background gene from linkedcanDB
Algorithm defined background genes
• GTP cyclohydrolase I deficiency
• Dystonia, dopa-responsive; DRD
• Epidermolysis bullosa letalis
• Cirrhosis, familial
• Epidermolysis bullosa, generalized atrophic benign; GABEB
Background gene based disease enrichment
Top Ten Diseases from this list based on p-value >0.05
• Idiopathic intracranial hypertension with papilledema
• Galactorrhea-Hyperprolactinemia
• Chromosome 13q trisomy
• Intrahepatic cholangiocarcinoma
• Isotretinoin embryopathy like syndrome
• Familial primary gastric lymphoma
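How a disease enrichment p-value depends on the choice of background genes can be illustrated with a hypergeometric test. The slides do not say which statistic the enrichment tools use, so the test below is an assumption, and the gene and disease names are placeholders.

```python
from math import comb


def enrichment_pvalue(overlap, list_size, annotated, background):
    """Hypergeometric P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes annotated to the disease
    list_size:  genes in the query list (all assumed to be in the background)
    overlap:    query genes annotated to the disease
    """
    upper = min(list_size, annotated)
    total = comb(background, list_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def rank_diseases(gene_list, disease_sets, background_size):
    """Return (p_value, disease, overlap) tuples, smallest p-value first."""
    genes = set(gene_list)
    scored = []
    for disease, members in disease_sets.items():
        members = set(members)
        overlap = len(genes & members)
        p = enrichment_pvalue(overlap, len(genes), len(members), background_size)
        scored.append((p, disease, overlap))
    return sorted(scored)


# Placeholder data: 10 background genes, 4 query genes, 2 disease gene sets.
query = ["g1", "g2", "g3", "g4"]
diseases = {"disease_A": ["g1", "g2", "g9"], "disease_B": ["g7", "g8", "g9"]}
ranked = rank_diseases(query, diseases, background_size=10)
for p, disease, overlap in ranked:
    print(disease, overlap, round(p, 4))
```

Changing `background_size` (for example, a curated background versus an algorithm-defined one) changes every p-value, which is one reason the two enrichment runs can return different disease lists.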
Linkedcandb: OMIM, TTD, CTD, ClinVar, COSMIC, KEGG, WikiPathways, Reactome etc. (32 databases)
LinkedMDBWOR (22 databases)
Summary: DDC
• Requirements: integrated dataset for downstream analysis
• Inferred activities reflect neighbourhood of influence around a gene
• Can boost signal for survival analysis and assessment of mutation impact
Cancer Driven Data
Proteasome Subunit: PSMD9
NGS Approach (ChIP + RNA-seq): LinkedSeq (ENCODE, TCGA, SR, GWAS, GRO-seq, 1000 Genomes etc.)
Microarray Approach: LinkedArray (U133Plus2, U133), GEO, EBI Express
Cell line data
Tissue data
Cancer Driven Data
                    U133plus2         U133A
Tissue            Cancer  Normal  Cancer  Normal   Total
Abdomen               13       0       0       0      13
Adipose                1      59       0      12      72
Adrenal gland         14       5       0       0      19
Bladder               39      14      87      15     155
Blood               4693     639    3130    1099    9561
Brain                785     568     592    1627    3572
Breast              1954     251    2635      91    4931
Cervix                74      12      64      34     184
Colon               1294     206     256      27    1783
Endometrium           72      61       0       9     142
Esophagus             48       9      24      28     109
GIST                  64       0       0       0      64
Head and neck        202      14      21       2     239
Heart                  0       0       0      41      41
Kidney               573     105     366      66    1110
Liver                182      25     156      52     415
Lung                 441     225     582     364    1612
Muscle                 0     177       0     331     508
Myometrium             0       0       0      24      24
Ovary                859      21     341       9    1230
Pancreas             132      55      13       8     208
Prostate             308      45     244      83     680
Sarcoma              493       0       0       0     493
Skin                 290      28     499      59     876
Small intestine       13       6       0      22      41
Stomach              268      57      46      18     389
Testis                 4       6     184      13     207
Thyroid               62      25      44      25     156
Tongue                 0      11       0       4      15
Uterus               155      12       0      24     191
Vagina                 3       5       0       0       8
Vulva                 21      14       0       0      35
Total              13057    2655    9284    4087   29083
LinkedTheraputics: A linked data approach towards connected omics healthcare
Probes
U133Plus2: 54,613
U133A: 22,215
Normal Tissue Network (U133plus2-N + U133A-N)
Cancer Network (U133plus2-C + U133A-C)
Protein Synonym problem (PSMD9 = RPN4 = P27)
LinkedMDBWOR (22 databases)
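The synonym problem can be made concrete with a small sketch: before interactions from several databases are merged, every alias is mapped to one canonical symbol so that PSMD9, RPN4 and P27 do not appear as three separate nodes. The alias table holds only the example from the slide, and the interaction partners are hypothetical placeholders.

```python
# Alias table: only the example given on the slide, not an exhaustive resource.
ALIASES = {"PSMD9": {"RPN4", "P27"}}


def build_lookup(aliases):
    """Map every known name (upper-cased) to its canonical symbol."""
    lookup = {}
    for canonical, names in aliases.items():
        for name in {canonical, *names}:
            lookup[name.upper()] = canonical
    return lookup


def canonical_symbol(name, lookup):
    """Return the canonical symbol, or the cleaned input if it is unknown."""
    cleaned = name.strip()
    return lookup.get(cleaned.upper(), cleaned)


def merge_interactions(edge_lists, lookup):
    """Union of undirected edges after synonym normalisation."""
    merged = set()
    for edges in edge_lists:
        for a, b in edges:
            a, b = canonical_symbol(a, lookup), canonical_symbol(b, lookup)
            if a != b:
                merged.add(tuple(sorted((a, b))))
    return merged


lookup = build_lookup(ALIASES)
# Hypothetical records from two databases that name the same protein differently.
database_one = [("RPN4", "PARTNER_1"), ("PSMD9", "PARTNER_2")]
database_two = [("p27", "PARTNER_1"), ("PARTNER_2", "PSMD9")]
merged = merge_interactions([database_one, database_two], lookup)
print(sorted(merged))
```

Without the normalisation step the four records would produce three distinct nodes for one protein and split its degree across them.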
Centrality Measures
• Closeness
• Betweenness
• Eccentricity
• Degree
• Eigenvector
• Radiality
• Shortest path length
• Longest path length
Weighted Network with PCC (Pearson correlation coefficient)
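A PCC-weighted network and two of the listed centrality measures can be sketched as follows. The correlation threshold, the toy expression values and the gene labels are assumptions for illustration; closeness is computed on the unweighted thresholded graph, within each node's connected component.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def pcc_network(expr, threshold):
    """Return {(gene1, gene2): r} for gene pairs with |r| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


def degree(edges, nodes):
    """Number of edges touching each node."""
    counts = {node: 0 for node in nodes}
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def closeness(edges, nodes):
    """(reachable nodes) / (sum of shortest-path lengths), by BFS."""
    adj = {node: set() for node in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    result = {}
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Toy expression profiles (four samples per gene); values are made up.
expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1], "G4": [1, 3, 2, 4]}
edges = pcc_network(expr, threshold=0.9)
deg = degree(edges, expr)
clo = closeness(edges, expr)
print(sorted(edges), deg, clo)
```

The same functions can be run separately on the normal and the cancer network so that each gene's centrality can be compared between the two.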
Clustering of Both Networks (Community Clustering)
Topological Stability based on Triangle Counts (Normal vs Cancer)
Measure of LOSS/GAIN
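The triangle-count comparison can be sketched as a per-node difference between the two networks. How the platform defines loss and gain is not stated on the slide; here a negative value means the node takes part in fewer triangles in the cancer network than in the normal one, and the toy edges are invented.

```python
from itertools import combinations


def triangles_per_node(edges):
    """Count, for each node, the triangles it participates in."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    counts = {}
    for node, nbrs in adj.items():
        # A triangle exists for every pair of neighbours that are also linked.
        counts[node] = sum(
            1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u]
        )
    return counts


def triangle_change(normal_edges, cancer_edges):
    """Per-node triangle count in cancer minus normal (gain > 0, loss < 0)."""
    normal = triangles_per_node(normal_edges)
    cancer = triangles_per_node(cancer_edges)
    nodes = sorted(set(normal) | set(cancer))
    return {node: cancer.get(node, 0) - normal.get(node, 0) for node in nodes}


# Toy networks; node names and edges are hypothetical.
normal_net = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
cancer_net = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
change = triangle_change(normal_net, cancer_net)
print(change)
```

In this toy case node A loses its only triangle and node D gains one, while B and C keep the same count through a different triangle, which shows why a per-node view is more informative than the network-wide total.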
Linked Pathways
LinkedPathway: KEGG, REACTOME
Leading Disease by each cluster/Indirect Indications
Linked Visualization & Reporting
LinkedVIZ
LinkedTheraputics: Results
Survival profile of PSMD9 with LinkedTheraputics Platform (Reactome + HPRD + IntAct + NCI)
GEO dataset (cancer type)
• Molecular subclasses of high-grade glioma: prognosis, disease progression, and neurogenesis (high-grade glioma)
• Experimentally derived metastasis gene expression profile predicts recurrence and death in colon cancer patients (colon cancer)
• Discovery cohort for genomic predictor of response and survival following neoadjuvant taxane-anthracycline chemotherapy in breast cancer (breast cancer)
• MAQC-II project: multiple myeloma (MM) data set (multiple myeloma)
• Metastasis gene expression profile predicts recurrence and death in colon cancer patients (Moffitt samples) (colon cancer)
• Expression profile-defined classification of lung adenocarcinoma (lung cancer)
• Validation cohort for genomic predictor of response and survival following neoadjuvant taxane-anthracycline chemotherapy in breast cancer (breast cancer)
• Expression data for early stage NSCLC (lung cancer)
• Predictive value of prognosis-related gene expression study in primary bladder cancer (bladder cancer)
• Gene expression data for pathological stage I-II lung adenocarcinomas (lung cancer)
Survival PSMD9 based on Reactome
• Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis (lung cancer)
• 183 breast tumors from the Helsinki University Central Hospital with survival information (breast cancer)
• Analysis of early primary breast cancer to identify prognostic markers and associated pathways: mRNA and miRNA profiling (breast cancer)
• Gene expression data for pathological stage I-II lung adenocarcinomas (lung cancer)
• The humoral immune system has a key prognostic impact in node-negative breast cancer (breast cancer)
• Gene expression of breast cancer tissue in a large population-based cohort of Swedish patients (breast cancer)
• Human lung adenocarcinoma (lung cancer)
• Expression profile-defined classification of lung adenocarcinoma (lung cancer)
• Prediction of survival in diffuse large B cell lymphoma treated with chemotherapy plus Rituximab (diffuse large B cell lymphoma)
• Survival related profile, pathways and transcription factors in ovarian cancer (ovarian cancer)
Survival PSMD9 based on NCI
• 183 breast tumors from the Helsinki University Central Hospital with survival information (breast cancer)
• Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis (lung cancer)
• Analysis of early primary breast cancer to identify prognostic markers and associated pathways: mRNA and miRNA profiling (breast cancer)
• Gene expression data for pathological stage I-II lung adenocarcinomas (lung cancer)
• The humoral immune system has a key prognostic impact in node-negative breast cancer (breast cancer)
• Gene expression of breast cancer tissue in a large population-based cohort of Swedish patients (breast cancer)
• Human lung adenocarcinoma (lung cancer)
• Expression profile-defined classification of lung adenocarcinoma (lung cancer)
• Survival related profile, pathways and transcription factors in ovarian cancer (ovarian cancer)
• Prediction of survival in diffuse large B cell lymphoma treated with chemotherapy plus Rituximab (diffuse large B cell lymphoma)
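A survival profile for a single gene is commonly built by splitting patients into high and low expression groups and estimating a survival curve for each. The slides do not state the estimator used by the platform; the sketch below assumes a median split and the Kaplan-Meier estimator, and all patient values are invented.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    events[i] is 1 if the event (death or relapse) was observed at times[i]
    and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_median(expression):
    """True for patients whose expression is above the cohort median."""
    cut = median(expression)
    return [value > cut for value in expression]


# Hypothetical cohort: expression of one gene, follow-up in months, event flag.
expression = [2.0, 7.5, 3.1, 8.2, 6.9, 1.4]
months = [30, 8, 24, 5, 12, 36]
event = [0, 1, 1, 1, 1, 0]

high = split_by_median(expression)
high_curve = kaplan_meier(
    [t for t, h in zip(months, high) if h], [e for e, h in zip(event, high) if h]
)
low_curve = kaplan_meier(
    [t for t, h in zip(months, high) if not h], [e for e, h in zip(event, high) if not h]
)
print("high:", high_curve)
print("low:", low_curve)
```

A significance test between the two curves (for example a log-rank test) would normally follow; it is omitted here to keep the sketch short.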
Multiple data types
• Clinical diagnosis
• Treatment history
• Histologic diagnosis
• Pathologic report/images
• Tissue anatomic site
• Surgical history
• Gene expression/RNA sequence
• Chromosomal copy number
• Loss of heterozygosity
• Methylation patterns
• miRNA expression
• DNA sequence
• RPPA (protein)
• Subset for Mass Spec
CDD
25* forms of cancer
glioblastoma multiforme (brain)
squamous carcinoma (lung)
serous cystadenocarcinoma (ovarian)
Etc. Etc. Etc.
Biospecimen Core Resource with more than 150 Tissue Source Sites
6 Cancer Genomic Characterization Centers
3 Genome Sequencing Centers
7 Genome Data Analysis Centers
Data Coordinating Center
Future Medicine Practice in cancer research
Chin et al. 2014, Cell
Motivation
TCGA has many high quality primary tumor samples,
but metastasis kills
Which primaries will metastasize?
Image courtesy of Wikimedia Commons
Overview of pathway-guided approach
• Integrate many data sources to gain accurate view of how genes are functioning in pathways
• Predict the functional consequences of mutations by quantifying the effect on the surrounding pathway
• Use pathway signatures to implicate mutations in novel genes to (re-)focus targeting
• Identify critical "Achilles Heels" in the pathways that distinguish a particular sub-type
Schema
[Diagram: linked-data schema for a gene record; labels and example values as extracted from the slide.]
Data fields: Assembly; Chr location (start-end); Cytogenetic band; Disease ID (PharmGKB); Ensembl ID; SNP ID; GO (gene ontology); COSMIC mutation; Mutation type; FASTA seq; Cell lines; KEGG pathway; Molecular mass; Interaction; Proteomes; Modified residue; Protein abundance across organisms
Example values: SGPX; CXORF66; GRCh38.p2; X:139,955,72-139,965,520; Xq27.1; ENSG00000117592; rs761610936; GO:0016021 (integral membrane component); PA164718516; COSM1249516; c.17G>T; MCF7, HeLa; hsa:347487; 39944 Da; UP000005640; Glycosylation; Q5JRM2; 9606.ENSP00000359571; Peroxiredoxin 6
FASTA seq: MNLVICVLLLSIWKNNCMTTNQTNGSSTTGDKPVESMQTKLNYLRRNLLILVGIIIMVFV FICFCYLHYNCLSDDASKAGMVKKKGIAAKSSKTSFSEAKTASQCSPETQPMLSTADKSS DSSSPERASAQS
Link types: equivalent to (chromosome X open reading frame 66); see also; same as; equivalent class
Sources: UniProt, HPA, KEGG, GeneCards, EMBL-EBI, SPD, PaxDb, NCBI, dbSNP, HGNC, PharmGKB, COSMIC, Swiss-Prot, Ensembl
Acknowledgements
Dr. Ratnesh Sahay, Group Leader, eHealth and Life Sciences, Insight Centre for Data Analytics @ NUI Galway
Dr. Prasanna Venkatraman, Principal Investigator, Advanced Centre for Treatment, Research and Education in Cancer, Mumbai, India
Dr. Rangapriya Sundarajan, Sr. Research Associate, Advanced Centre for Treatment, Research and Education in Cancer, Mumbai, India