Improving gene expression similarity measurement using
pathway-based analytic dimension
Changwon Keum
BMDRC
Accumulated gene expression data in public repository
• GEO, NCBI
[Figure: bar chart, "Number of experiments submitted to the GEO database". x-axis: Years (2001 to 2008); y-axis: Counts (0 to 3500). Bar labels, in order: 14, 113, 665, 972, 1489, 2318, 2926, 2621.]
Search the database
* Search by annotation
* Search by contents
Dataset vs. individual sample (profile)
[Diagram: microarray database search at two levels. Data set level: similarity measure between data sets (case vs. control). Profile level: similarity measure between individual profiles.]
Raw gene expression profile based similarity search
• Used by Cellmontage
  – Spearman correlation coefficient
• Limitation
  – Cross-platform comparison
  – Cross-experiment comparison
Ranks (1 = highest expression) and rank difference d:

      S1   S2   d
G1     5    1   4
G2     3    2   1
G3     1    3   2
G4     2    4   2
G5     4    5   1

Raw expression values:

      S1   S2
G1     6   30
G2    16   25
G3    22   15
G4    20   10
G5    10    5
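The rank table and the raw-value table on this slide fit together: ranking each sample's raw values from highest to lowest gives the S1 and S2 ranks, and d is the absolute rank difference. A minimal sketch of the Spearman calculation on these five genes follows. The function names are illustrative, not taken from Cellmontage, and the shortcut formula used here assumes there are no tied values.

```python
def ranks_descending(values):
    """Rank values so the largest gets rank 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman correlation via rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rx, ry = ranks_descending(x), ranks_descending(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Raw expression values for G1..G5 from the slide.
s1 = [6, 16, 22, 20, 10]
s2 = [30, 25, 15, 10, 5]

rank_s1 = ranks_descending(s1)   # [5, 3, 1, 2, 4]
rank_s2 = ranks_descending(s2)   # [1, 2, 3, 4, 5]
d = [abs(a - b) for a, b in zip(rank_s1, rank_s2)]  # [4, 1, 2, 2, 1]
rho = spearman_rho(s1, s2)       # about -0.3 (sum of d^2 is 26)
```

Because only ranks enter the formula, the two samples can be on very different intensity scales, as S1 and S2 are here.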
Pathway expression profile based similarity measure
[Diagram: gene-level profiles (G1, G2, G3, ...) are converted to pathway-level profiles (Pathway1, Pathway2, Pathway3).]
Step 1. Converting to pathway expression profile
Step 2. Spearman correlation test
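The two steps can be sketched as follows. The slides do not say how gene values are summarized per pathway, so this sketch assumes the mean of the measured member genes. The pathway membership in `PATHWAYS` is hypothetical, and all function names are illustrative.

```python
from statistics import mean

# Hypothetical pathway membership, for illustration only.
PATHWAYS = {
    "Pathway1": ["G1", "G2"],
    "Pathway2": ["G2", "G3", "G4"],
    "Pathway3": ["G4", "G5"],
}


def pathway_profile(gene_expr, pathways):
    """Step 1: collapse a gene-level profile into pathway-level scores.

    Assumption: a pathway's score is the mean of its measured member genes.
    Pathways with no measured genes are skipped.
    """
    profile = {}
    for name, genes in pathways.items():
        measured = [gene_expr[g] for g in genes if g in gene_expr]
        if measured:
            profile[name] = mean(measured)
    return profile


def average_ranks(values):
    """Ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Step 2: Spearman correlation as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def pathway_similarity(expr_a, expr_b, pathways):
    """Compare two samples on the pathways that both profiles cover."""
    pa = pathway_profile(expr_a, pathways)
    pb = pathway_profile(expr_b, pathways)
    shared = sorted(set(pa) & set(pb))
    return spearman([pa[p] for p in shared], [pb[p] for p in shared])


# Gene-level values for S1 and S2 from the earlier slide.
s1 = {"G1": 6, "G2": 16, "G3": 22, "G4": 20, "G5": 10}
s2 = {"G1": 30, "G2": 25, "G3": 15, "G4": 10, "G5": 5}
similarity = pathway_similarity(s1, s2, PATHWAYS)
```

Because the comparison is restricted to pathways that both profiles cover, two samples measured on different gene sets can still be compared, provided each has at least one measured gene in the shared pathways.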
Cell type classification
Sample  Experiment  Platform  Cell type
Sam1    Exp1        Plat1     Breast
Sam2    Exp1        Plat1     Breast
Sam3    Exp1        Plat2     Breast
Sam4    Exp2        Plat3     Breast
Sam5    Exp2        Plat3     Breast
• Samples with cell type
  – Annotated by Cellmontage group
  – For 42 cell types with multiple samples
[Diagram labels: Query, Cross-platform, Cross-experiment]
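One way to read the cross-platform and cross-experiment settings is as a restriction on which reference samples a query may be matched against. The slides do not spell out the exact definitions, so the sketch below assumes that "cross-platform" keeps references from a different platform than the query, and "cross-experiment" keeps references from a different experiment. `Sample` and `candidate_references` are illustrative names.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    name: str
    experiment: str
    platform: str
    cell_type: str


# The annotation table from the slide.
SAMPLES = [
    Sample("Sam1", "Exp1", "Plat1", "Breast"),
    Sample("Sam2", "Exp1", "Plat1", "Breast"),
    Sample("Sam3", "Exp1", "Plat2", "Breast"),
    Sample("Sam4", "Exp2", "Plat3", "Breast"),
    Sample("Sam5", "Exp2", "Plat3", "Breast"),
]


def candidate_references(query, samples, mode):
    """Select the reference samples a query may be compared with.

    Assumed definitions (not confirmed by the slides):
      cross-platform   -> references from a different platform
      cross-experiment -> references from a different experiment
    """
    if mode == "cross-platform":
        return [s for s in samples if s.platform != query.platform]
    if mode == "cross-experiment":
        return [s for s in samples if s.experiment != query.experiment]
    raise ValueError(f"unknown mode: {mode}")


query = SAMPLES[0]  # Sam1
cross_platform = [s.name for s in candidate_references(query, SAMPLES, "cross-platform")]
cross_experiment = [s.name for s in candidate_references(query, SAMPLES, "cross-experiment")]
```

Under these assumed definitions, Sam1 as the query is matched against Sam3, Sam4, and Sam5 in the cross-platform setting and against Sam4 and Sam5 in the cross-experiment setting. A classifier would then assign the cell type of the most similar remaining reference.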
Classification accuracy
CGSEP vs. PEPC
[Figure panels: Thalamus (all); Liver (cross-platform)]
Similarity score for TP?
Details of cross-experiment classification
GEFERENCE
• Reference database of gene expression
  – Search similar gene expression profile
  – Meta analysis
Marker validation
[Diagram: patient, extract sample, gene expression profiling, search GEFERENCE, matched reference individual with clinical information]
Acknowledgement
• Jung Hoon Woo
• Members at BMDRC
• KFDA for funding
Thanks for your attention!!