Post on 27-Dec-2015
Copyright © 2004 by Limsoon Wong
Plan
• Knowledge discovery in brief
• Eg 1: Optimizing treatment of childhood ALL
• Eg 2: Predicting survival of patients with DLBC lymphoma
• Concluding remarks
Jonathan’s rules: Blue or Circle
Jessica’s rules: All the rest
Whose block is this?
Jonathan’s blocks
Jessica’s blocks
What is Knowledge Discovery?
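The block-sorting example above can be read as a tiny rule-based classifier. The sketch below is only an illustration: the field names `colour` and `shape` and the three sample blocks are assumptions, not part of the original slides.

```python
# Illustrative sketch: the "colour" and "shape" field names are assumptions.
def owner(block):
    """Apply the rule from the slide: blue or circle -> Jonathan, all the rest -> Jessica."""
    if block["colour"] == "blue" or block["shape"] == "circle":
        return "Jonathan"
    return "Jessica"

# Hypothetical blocks, made up for illustration.
blocks = [
    {"colour": "blue", "shape": "square"},
    {"colour": "red", "shape": "circle"},
    {"colour": "red", "shape": "triangle"},
]
owners = [owner(b) for b in blocks]
print(owners)
```

Knowledge discovery runs in the opposite direction: given blocks that are already labelled, find a rule such as this one.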
Some classifiers/learning methods
Steps of Knowledge Discovery
• Training data gathering
• Feature generation
– k-grams, colour, texture, domain know-how, ...
• Feature selection
– Entropy, χ², CFS, t-test, domain know-how, ...
• Feature integration
– SVM, ANN, PCL, CART, C4.5, kNN, ...
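As a rough illustration of the selection and integration steps, the sketch below ranks features by a t-statistic and then classifies with kNN, two of the methods named in the list. The function names and the toy data are assumptions made for this example. It is not the pipeline used in the studies cited later.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's t-statistic between two groups (each group needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def select_features(X, y, k):
    """Feature selection: keep the indices of the k features with the largest |t|."""
    first, second = sorted(set(y))  # assumes exactly two classes
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == first]
        b = [row[j] for row, label in zip(X, y) if label == second]
        scores.append((abs(t_statistic(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def knn_predict(X, y, query, features, k=3):
    """Feature integration: majority vote of the k nearest training samples."""
    def dist(row):
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in features))
    nearest = sorted(zip(X, y), key=lambda pair: dist(pair[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Toy training data (made up): 6 samples, 3 features, 2 classes.
X = [[5.1, 0.2, 1.0], [4.9, 0.8, 1.1], [5.3, 0.5, 0.9],
     [1.0, 0.4, 1.0], [1.2, 0.7, 1.1], [0.8, 0.3, 0.9]]
y = ["A", "A", "A", "B", "B", "B"]

chosen = select_features(X, y, k=1)
prediction = knn_predict(X, y, [5.0, 0.5, 1.0], chosen)
print(chosen, prediction)
```

Only feature 0 separates the two toy classes, so it is the one that survives selection.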
Knowledge Discovery for Optimizing Treatment of Childhood ALL
Image credit: Yeoh et al, 2002
Childhood ALL
• Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, Hyperdiploid>50
• Diff subtypes respond differently to same Tx
• Over-intensive Tx
– Development of secondary cancers
– Reduction of IQ
• Under-intensive Tx
– Relapse
• The subtypes look similar
• Conventional diagnosis
– Immunophenotyping
– Cytogenetics
– Molecular diagnostics
• Unavailable in most ASEAN countries
Copyright © 2004 by Jinyan Li and Limsoon Wong
Single-Test Platform of Microarray & Knowledge Discovery
Figure: training data collection, feature generation, feature selection, feature integration
Image credit: Affymetrix
Conventional Tx:
• intermediate intensity to all
• 10% suffer relapse
• 50% suffer side effects
• costs US$150m/yr
Our optimized Tx:
• high intensity to 10%
• intermediate intensity to 40%
• low intensity to 50%
• costs US$100m/yr
Impact
• High cure rate of 80%
• Less relapse
• Less side effects
• Save US$51.6m/yr
Knowledge Discovery for Predicting Survival of Patients with DLBC Lymphoma
Image credit: Rosenwald et al, 2002
Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the most common type of lymphoma in adults
• Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients
DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy
• Intl Prognostic Index (IPI)
– age, “Eastern Cooperative Oncology Group” performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...
• Not good for stratifying DLBC lymphoma patients for therapeutic trials
Use gene-expression profiles to predict outcome of chemotherapy?
Knowledge Discovery from Gene Expression of “Extreme” Samples
Figure labels: “extreme” sample selection; knowledge discovery from gene expression; 240 samples; 80 samples; 26 long-term survivors; 47 short-term survivors; 7399 genes; 84 genes
• T is long-term if S(T) < 0.3
• T is short-term if S(T) > 0.7
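The two threshold rules above can be written as a small function. This is a minimal sketch that assumes S(T) is a risk score between 0 and 1 for sample T. The "undetermined" label for scores between the two thresholds and the example scores are assumptions added for illustration.

```python
def survival_class(score, low=0.3, high=0.7):
    """Label a sample from its risk score S(T), using the slide's thresholds."""
    if score < low:
        return "long-term"
    if score > high:
        return "short-term"
    # Assumption: the slide gives no label for scores between the thresholds.
    return "undetermined"

# Hypothetical risk scores, made up for illustration.
scores = [0.12, 0.45, 0.91]
labels = [survival_class(s) for s in scores]
print(labels)
```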
Kaplan-Meier Plot for 80 Test Cases
• p-value of log-rank test: < 0.0001
• Risk score thresholds: 0.7, 0.5, 0.3

Merit of “Extreme” Samples
• (A) W/o sample selection (p = 0.38)
• (B) With sample selection (p = 0.009)
• No clear difference in the overall survival of the 80 samples in the validation group of the DLBCL study if no training sample selection is conducted
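For readers unfamiliar with the statistics quoted above, here is a minimal pure-Python sketch of a Kaplan-Meier estimate and a two-group log-rank test. The function names and the survival times are made up for illustration. This is not the code or the data behind the plots in the slides.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, survival probability)] at each death time.
    events[i] is 1 if the patient died at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
        i += leaving
    return curve

def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    all_pairs = zip(times1 + times2, events1 + events2)
    death_times = sorted({t for t, e in all_pairs if e})
    observed = expected = var = 0.0
    for t in death_times:
        n1 = sum(1 for x in times1 if x >= t)  # at risk in group 1
        n2 = sum(1 for x in times2 if x >= t)  # at risk in group 2
        d1 = sum(1 for x, e in zip(times1, events1) if x == t and e)
        d2 = sum(1 for x, e in zip(times2, events2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        observed += d1
        expected += d * n1 / n
        if n > 1:
            var += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    chi2 = (observed - expected) ** 2 / var if var > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Made-up survival times in years; every patient in both groups has died (event = 1).
high_risk = [1, 2, 3, 4, 5]
low_risk = [10, 11, 12, 13, 14]
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p = log_rank(high_risk, [1] * 5, low_risk, [1] * 5)
print(curve, round(chi2, 2), p < 0.05)
```

A small p-value means the survival curves of the two groups differ by more than chance would explain, which is how the p-values in panels (A) and (B) are read.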
Predict Epitopes, Find Vaccine Targets
• Vaccines are often the only solution for viral diseases
• Finding & developing effective vaccine targets (epitopes) is a slow and expensive process
• Develop systems to recognize protein peptides that bind MHC molecules
• Develop systems to recognize hot spots in viral antigens
Recognize Functional Sites, Help Scientists
• Effective recognition of initiation, control, & termination of biological processes is crucial to speeding up & focusing scientific expts
• Data mining of bio seqs to find rules to recognize & understand functional sites
• Dragon’s 10x reduction of TSS recognition false positives
Understand Proteins, Fight Diseases
• Understanding function & role of protein needs organised info on interaction pathways
• Such info is often reported in scientific papers but is seldom found in structured databases
• Knowledge extraction system to process free text
– extract protein names
– extract interactions
Benefits of Bioinformatics
• To the patient:
– Better drug, better treatment
• To the pharma:
– Save time, save cost, make more $
• To the scientist:
– Better science
References
• A. Yeoh et al, “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1:133-143, 2002
• A. Rosenwald et al, “The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma”, NEJM, 346:1937-1947, 2002
• H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, pages 382-392