Post on 28-Dec-2015
cceHUB: Sharing, Exploring and Analyzing Data
An Environment for Collaborative Cancer Research
clinical data, observational & scientific data
decision support, computation & visualization
Ann Christine Catlin
HUBbub, 04-06-2011
Master plan for the Cancer Care Engineering Colorectal Cancer Study
1. Blood Sample Acquisition; Sample Processing, Annotation, Distribution; Clinical Patient Data Collection
   IU Simon Cancer Center
2. OMIC Laboratory Analysis: Data & Knowledge Acquisition
   Xu Lab, Lipidomics, IU School of Medicine
   Raftery Lab, Metabolomics, Purdue
   Regnier Lab, Glycoproteomics, Purdue
   Bindley Lab, Global Proteomics, Purdue
   Teegarden Lab, Vitamin D, Purdue
   Klaunig Lab, Oxidative Stress, IU School of Medicine
3. Predictive Modeling: Data Synthesis & Analysis, Knowledge Acquisition
   Zhang Group, Integrative Models, Purdue
   Sherer, Population-based Models, VA Hospital
   Chen, Biological Network Models, IUPUI
4. Visual Analytics: Data Exploration & Analysis, Knowledge Acquisition
   Ebert Group, PURVAC, Purdue
5. Iterative Feedback & Validation CCE Research Community
molecular signatures for colorectal cancer that predict susceptibility, treatment response and ultimate treatment outcome
Sharing data, tools, analysis & knowledge
sample collection
clinical data collection
lab data collection
statistical & modeling tools
lab analysis pipelines
A single portal: sharing data, tools, analysis & knowledge
sample collection
clinical data collection
lab data collection
statistical & modeling tools
lab analysis pipelines
cancer research groups worldwide
A single portal: sharing data, tools, analysis & knowledge
a web environment that supports
data flow, data sharing, data analysis
for the collaborating cancer research groups of CCE
Support for clinical data
[Diagram labels: clinical data, laboratory analysis, predictive modeling, tools]
Clinical Research Team and Physicians
Data contribution from clinical team
Patients Diagnosis, Treatments, Surgeries, Lifestyle, Diet, Demographics, …
Samples Collection, Processing, Protocols, Distribution, Tracking, …
Automatic Metadata Processing
Sample Data, Patient Data
Clinical Metadata
cceHUB Database
Data Workflow
• nightly pull from hospital e-records
• patient data collection
• sample tracking
• data annotation
• clinical data archive
• blood sample bio-repository
• patient and sample linkage
• data viewing
• data search, filter & explore
Clinical Data Flow
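The patient and sample linkage step in the clinical data flow can be pictured with a small sketch. This is an illustration only: the record types (`Patient`, `Sample`), their fields, and the `link_samples` helper are hypothetical, since the slides do not describe the actual cceHUB schema.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str   # hypothetical identifier format
    diagnosis: str


@dataclass
class Sample:
    barcode: str      # hypothetical barcode format
    patient_id: str   # link back to the enrolling patient


def link_samples(patients, samples):
    """Group sample barcodes under their patient.

    Returns (linked, orphans): linked maps patient_id -> list of barcodes,
    orphans lists barcodes whose patient_id is not a known patient.
    """
    linked = {p.patient_id: [] for p in patients}
    orphans = []
    for s in samples:
        if s.patient_id in linked:
            linked[s.patient_id].append(s.barcode)
        else:
            orphans.append(s.barcode)
    return linked, orphans
```

Keeping unlinked samples in a separate list, rather than dropping them, is one way to surface annotation gaps for the clinical team to resolve.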
Clinical data: some stats
Database | Total Patients | Diagnosis % Data | Lifestyle % Data | Cancer/Polyp Patients | Treatment % Data
patients | 240 | 100% | 70% | 41 / 92 | 100%
First patient CCE001 enrolled on 04/02/2009 (the day cceHUB went live)
Most recent patient CC285 enrolled on 02/15/2011
Most recent data: neoadjuvant chemoradiation treatment for patient CCE156 on 04/02/2011
Maximum patients enrolled on a single day: 9 (09/23/2009)
# web-forms to track patient and sample data flow: 12
# accesses to clinical data viewer, 04/02/2009 – 05/25/2010: > 15,000
Database | Total Samples | Total Aliquots | Sample Tracking Web-forms | # instances cceHUB used to find missing aliquot
samples | 267 | 5073 | sample processing, sample transfer, sample storage, sample distribution | 52 (we track sample barcodes, location, entry person, entry date)
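To make the tracked fields concrete (barcode, location, entry person, entry date), here is a minimal sketch of how a tracking log could answer "where was this aliquot last seen?" and "which aliquots were never logged?". The `TrackingEntry` type and both helper functions are assumptions for illustration, not the cceHUB web-form implementation.

```python
from datetime import date
from typing import NamedTuple


class TrackingEntry(NamedTuple):
    barcode: str
    location: str
    entry_person: str
    entry_date: date


def last_known_location(entries):
    """Return {barcode: most recent TrackingEntry}.

    On equal dates the entry that appears later in the log wins.
    """
    latest = {}
    for e in entries:
        current = latest.get(e.barcode)
        if current is None or e.entry_date >= current.entry_date:
            latest[e.barcode] = e
    return latest


def untracked(expected_barcodes, entries):
    """Barcodes that were expected but never appear in the tracking log."""
    seen = {e.barcode for e in entries}
    return sorted(set(expected_barcodes) - seen)
```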
Support for laboratory data
Research Labs
Metabolomics, Lipidomics, Global Proteomics, Glycoproteomics, Vitamin D, Oxidative Stress, Genomics
Clinical Data, Lab Knowledge Base, Repository Metadata
cceHUB Database
cceHUB Lab Instrument Data Repository
Lab Workflow Knowledge
Data Upload
Sample-Dataset tracking
Massive instrument-generated datasets
• "knowledge base" resources (protocols, sample preparation, instruments, standards, file formats, analysis)
• annotation for lab data files
• lab data files tracked to samples/patients
• data files upload with provenance
• metadata processing
• lab data collections
• data view & explore
• data access for analysis tools
Laboratory Data Flow
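As a rough illustration of "data files upload with provenance", the sketch below builds a metadata record for one uploaded file and ties it to the samples it was generated from. The function name, field names, and the choice of a SHA-256 checksum are all assumptions made for this example; the slides do not say what cceHUB actually records.

```python
import hashlib
from datetime import datetime, timezone


def provenance_record(filename, content, sample_ids, lab, uploaded_by):
    """Build a metadata record for an uploaded lab data file.

    content is the raw file as bytes; sample_ids links the file back to
    the samples (and, through them, the patients) it was measured from.
    """
    return {
        "filename": filename,
        "sha256": hashlib.sha256(content).hexdigest(),  # detects later changes
        "size_bytes": len(content),
        "sample_ids": sorted(sample_ids),
        "lab": lab,
        "uploaded_by": uploaded_by,
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing a checksum alongside the sample links means an analysis tool could later confirm it is reading the same bytes that were uploaded.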
Laboratory data: some stats
Lab | # Samples Analyzed | % Total | Files/Samples Uploaded | Average File Size | Analysis Tools at cceHUB | Using cceHUB tools?
Bindley Biosciences, Global Proteomics | 193 | 73% | 193 files, 193 samples | 80 MB | Discovery Pipeline; Results Visualize/Compare | 500 runs through discovery pipeline
Teegarden, Vitamin D | 225 | 85% | 4 files, 225 samples | < 1 MB | Vitamin D-Blood Draw-Clinical Data merge for SAS | Yes, DataView
Raftery, Metabolomics GCxGC-MS | 230 | 87% | 230 files, 230 samples | 1 GB | Peak classification and alignment; GCxGC-MS Visual Analytics |
Raftery, Metabolomics NMR | 110 | 41% | 1 file, 110 samples | < 1 MB | |
Xu, Lipidomics | 143 | 54% | 1 file, 143 samples | < 1 MB | Lipidomics-Blood Draw-Clinical Data merge for SAS | Yes, DataView
Klaunig, TEAC analysis | 259 | 98% | 1 file, 259 samples | < 1 MB | TEAC-Blood Draw-Clinical Data merge for SAS | Yes, DataView
Klaunig, Comet Assay | 101 | 38% | 1 file, 101 samples | < 1 MB | Comet Assay-Blood Draw-Clinical Data merge for SAS | Yes, DataView
Klaunig, Genotyping Assay | -- | | | | POCRE, MaCH genotype imputation (used by stat group on their own data) |
Regnier, Glycoproteomics | -- | | | | |
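Several labs above use a "Blood Draw-Clinical Data merge for SAS". The slides do not show how that merge is built, so the following is only a sketch of the general idea under assumed data shapes: lab results keyed by a hypothetical `sample_id`, blood draws that map each sample to a `patient_id`, and clinical records keyed by patient. The flat rows are written as CSV, a format SAS can import.

```python
import csv
import io


def merge_for_export(lab_rows, blood_draws, clinical):
    """Join lab results -> blood draw -> patient clinical data into flat rows.

    lab_rows:    list of dicts with 'sample_id' plus measurement columns
    blood_draws: {sample_id: {'patient_id': ..., other draw columns}}
    clinical:    {patient_id: {clinical columns}}
    Rows whose sample or patient cannot be resolved are skipped.
    """
    merged = []
    for row in lab_rows:
        draw = blood_draws.get(row["sample_id"])
        if draw is None:
            continue
        patient = clinical.get(draw["patient_id"])
        if patient is None:
            continue
        merged.append({**row, **draw, **patient})
    return merged


def to_csv(rows):
    """Serialize merged rows as CSV text; columns follow the first row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(rows[0].keys()),
        restval="",              # blank for columns a row lacks
        extrasaction="ignore",   # drop columns the first row did not have
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Skipping unresolved rows keeps the export consistent, though a real pipeline would probably also report them so missing linkages can be fixed.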
Support for modeling and analysis
Modeling Groups, Visual Analytics
Data Synthesis & Analysis, Knowledge Acquisition
Clinical Data, Lab Knowledge Base, Repository Metadata
cceHUB Database
cceHUB Lab Instrument Data Repository
Data to Tools
Data from Tools
GCxGC MS Classification and Alignment
LC-MS Discovery Pipeline: Spectrum Deconvolution