Date post: | 05-Dec-2014 |
Upload: | christianperez |
IntOGen, Integrative OncoGenomics for personal cancer genomes
Christian Pérez-Llamas
Biomedical Genomics Lab
Pompeu Fabra University
Biomedical Research Park at Barcelona
Oncogenomics data
● Transcriptomic alterations
● Copy number alterations
● Mutations
● ...
Clinical annotations
● International Classification of Diseases for Oncology
Biological modules
● Functional
● Regulatory
● Cancer related
● ...
Integrative methodologies
● Cancer related genes identification
● Cancer related modules identification
● Combinations of experiments by ICD-O
● Generation of cancer specific modules
Web discovery tool: www.intogen.org
Gitools: www.gitools.org
Biomart services: biomart.intogen.org
DATA · STATISTICS · EXPLORATION
Data management
Overview
Copy Number Analysis from Sanger Institute
Copy number alterations, transcriptomic alterations, mutations
Selection of experiments
● Public data
● Experiment design: cancer vs normal
● At least 20 samples
Annotation of tumour type
● International Classification of Diseases for Oncology (ICD-O)
● Manual curation from publication or description
● Progenetix already annotated with ICD-O

● More than 800 experiments
● More than 25000 samples
● Almost 150 ICD-O tumor types
Data
Statistics: Cancer related genes identification
STEP 1: identification of driver alterations
[Figure: samples x genes matrix for experiment 1 (altered / not altered), with one corrected p-value per gene; corrected p-value scale: 0.05 10]
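The slide does not name the test behind the corrected p-values, so the following is only a minimal sketch of one common way to score genes in a single experiment: a one-sided binomial test per gene against the overall alteration rate, followed by Benjamini-Hochberg correction. The function names, the choice of test, and the background-rate definition are all assumptions made for illustration, not the IntOGen implementation.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def score_genes(alterations):
    """Score one experiment.

    alterations: dict mapping gene -> list of 0/1 flags, one per sample
    (1 = altered). Hypothetical input layout, assumed for this sketch.
    """
    total = sum(len(flags) for flags in alterations.values())
    # Assumed background: fraction of altered cells in the whole matrix.
    background = sum(sum(flags) for flags in alterations.values()) / total
    genes = list(alterations)
    raw = [
        binomial_upper_tail(sum(alterations[g]), len(alterations[g]), background)
        for g in genes
    ]
    return dict(zip(genes, benjamini_hochberg(raw)))


if __name__ == "__main__":
    experiment = {"G1": [1] * 10, "G2": [0] * 10, "G3": [0] * 9 + [1]}
    for gene, q in score_genes(experiment).items():
        print(gene, q)
```

Genes altered in more samples than the background rate predicts receive small corrected p-values; a threshold such as 0.05 would then select candidate genes.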
Statistics: Cancer related genes identification
STEP 1: identification of driver alterations
STEP 2: combination of experiments
[Figure: exp. 1 + exp. 2 + exp. 3 + ... + exp. n are combined into cancer type A; samples x genes matrix for experiment 1 (altered / not altered); genes coloured by corrected p-value, scale: 0.05 10]
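The slide shows several experiments of the same cancer type being combined but does not say how. One standard option for combining per-experiment p-values of a gene is the weighted Z-method (Stouffer); the sketch below assumes that method and assumes weights such as the number of samples per experiment. The function name and the clamping constant are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def combine_pvalues(pvalues, weights=None, eps=1e-12):
    """Combine one-sided p-values with the weighted Z-method.

    pvalues: one p-value per experiment for the same gene.
    weights: optional per-experiment weights (for example sample counts);
             equal weights are used when omitted.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    normal = NormalDist()
    # Clamp to the open interval (0, 1) because inv_cdf rejects 0 and 1.
    clamped = [min(max(p, eps), 1 - eps) for p in pvalues]
    # Small p-values map to large positive z-scores.
    z_scores = [-normal.inv_cdf(p) for p in clamped]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return normal.cdf(-combined_z)


if __name__ == "__main__":
    # Three experiments of the same tumour type, weighted by sample count.
    print(combine_pvalues([0.04, 0.20, 0.01], weights=[25, 40, 60]))
```

Weighting lets larger experiments contribute more to the combined evidence, and moderate but consistent signals across experiments can reach significance even when no single experiment does.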
Statistics: Cancer related modules identification
Exploration
● Web discovery tool: www.intogen.org
● Gitools: www.gitools.org
● Biomart services: biomart.intogen.org
Cancer gene prioritization with personal genomes
Tumour sample → reads → mutations, indels, differential expression → long list of altered genes
Biomart services
● MartView: biomart.intogen.org
● RESTful web service: biomart.intogen.org/martservice
● Clients: biomaRt, perl, python, curl
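As a rough illustration of the python route to the RESTful service, the sketch below builds a standard BioMart XML query and the corresponding martservice URL without sending it. The `http://` scheme, the dataset name `example_dataset`, and the attribute and filter names are placeholders, not real IntOGen identifiers; the actual names would have to be looked up in MartView, and the service may no longer be online.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Host and path come from the slide; the scheme is an assumption.
MARTSERVICE = "http://biomart.intogen.org/martservice"


def build_query_xml(dataset, attributes, filters=None):
    """Build a BioMart XML query for one dataset.

    dataset, attributes and filters are caller-supplied names; the ones
    used in the demo below are hypothetical placeholders.
    """
    query = ET.Element(
        "Query",
        {
            "virtualSchemaName": "default",
            "formatter": "TSV",
            "header": "1",
            "uniqueRows": "1",
            "datasetConfigVersion": "0.6",
        },
    )
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def build_query_url(xml_text, base=MARTSERVICE):
    """Return the GET URL that carries the XML in the 'query' parameter."""
    return base + "?" + urlencode({"query": xml_text})


if __name__ == "__main__":
    xml_text = build_query_xml(
        "example_dataset",                  # hypothetical dataset name
        ["gene_id", "corrected_pvalue"],    # hypothetical attributes
        {"tumour_type": "example"},         # hypothetical filter
    )
    print(build_query_url(xml_text))
```

The resulting URL can be fetched with curl or any HTTP client; the same XML is what biomaRt and the perl API generate behind the scenes.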
IntOGen: Integration and data-mining of multidimensional oncogenomic data
Gundem G, Perez-Llamas C, Jene-Sanz A, Kedzierska A, Islam A, Deu-Pons J, Furney S and Lopez-Bigas N.
Nature Methods, 7, 92-93 (2010)
More details...
www.gitools.org
biomart.intogen.org
www.intogen.org
International Cancer Genome Consortium
50 cancer types
500 samples per cancer type
About 25000 genomes in total
Data Storage, Analysis & Management
Cancer genomes in the context of IntOGen
ICGC-CLL genome project
Samples: 7 CLL, 7 normal (Roderic Guigo lab)
Technology: RNA-seq
Alteration: Dif. Expression (upregulated, downregulated)
[Figure: samples x genes matrix, altered / not altered]
[Figure: the ICGC-CLL samples x genes matrix (altered / not altered) next to the IntOGen genes x tumours / experiments matrix of corrected p-values; corrected p-value scale: 0.05 10]
Enrichment analysis
[Figure: enrichment analysis maps the genes matrices to pathways matrices: pathways x samples for the ICGC-CLL data and pathways x tumours for IntOGen, each coloured by corrected p-value; scale: 0.05 10]
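The slides do not state which enrichment test is used to go from genes to pathways. As a hedged illustration, the sketch below assumes a one-sided hypergeometric test: given the altered genes of one sample or tumour, it asks whether a pathway contains more of them than expected by chance. Function names and the input layout are assumptions made for the example.

```python
from math import comb


def hypergeom_upper_tail(k, total, in_pathway, drawn):
    """P(X >= k) when 'drawn' genes are sampled without replacement from
    'total' genes, of which 'in_pathway' belong to the pathway."""
    upper = min(in_pathway, drawn)
    favourable = sum(
        comb(in_pathway, i) * comb(total - in_pathway, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, drawn)


def pathway_enrichment(altered, pathways, universe):
    """Return an uncorrected p-value per pathway.

    altered:  iterable of altered gene names for one sample or tumour
    pathways: dict mapping pathway name -> iterable of member genes
    universe: iterable of all genes that were measured
    """
    universe = set(universe)
    altered = set(altered) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(altered & members)
        results[name] = hypergeom_upper_tail(
            overlap, len(universe), len(members), len(altered)
        )
    return results


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(20)]
    toy_pathways = {"A": genes[0:4], "B": genes[10:14]}  # toy data
    print(pathway_enrichment(genes[0:5], toy_pathways, genes))
```

The returned p-values are uncorrected; a multiple-testing correction across pathways would be needed before comparing them against a threshold such as 0.05.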
Considerations for the next version
Ethical
Technological
Ethical considerations
● Open access: data that cannot be used to identify individuals (age, normalized gene expression, ...)
● Controlled access: germline genomic data and detailed clinical information associated with a unique individual
Technical considerations
User interfaces
● Web services
● Browser
● Gitools
● Biomart
● Management
IntOGen core
● Data importers
● Analysis management
● Data management
● Experiments management
● Analysis workflows
● Data models
Infrastructure
● Hadoop Map-Reduce
● Hadoop DFS
● Cascading
● PIG
● Grid Engine
● Plain files
● MySQL
● MongoDB
● Bioinformatics software
● Amazon / Eucalyptus
Added components: Genome view, NGS workflows, Web management
Flexibility
● Different ways to access the data
● Methods constantly evolving
● Methods implemented in different languages, with different infrastructure requirements
Scalability
● Quantity of data increases
● And also the number and complexity of calculations
Summary
IntOGen is a novel framework for oncogenomics data integration and analysis
It integrates many tumor types and different types of alterations in a common framework
It explores the data at different levels, from individual experiments to combinations of experiments, and from individual genes to biological modules
It incorporates an intuitive web system designed to be a discovery tool for cancer researchers
I have presented some examples of how to use IntOGen and Gitools to prioritize and compare personal genome data.
We are adapting IntOGen to store, analyze and visualize next-generation sequencing data, which will allow us to incorporate data from the ICGC, starting with the Chronic Lymphocytic Leukemia data.
Ethical and technological considerations have to be addressed.
Acknowledgements
Nuria López-Bigas
Gunes Gundem
Jordi Deu-Pons
Khademul Islam
Alba Jené-Sanz
Michael Schroeder
Xavier Rafael
Sophia Derdak
Abel Gonzalez-Pérez
Armand Gutierrez
Biomedical Genomics