Post on 13-Jan-2016
Hierarchical multilabel classification trees for gene function prediction
Leander Schietgat, Hendrik Blockeel, Jan Struyf: Katholieke Universiteit Leuven (Belgium)
Amanda Clare: University of Aberystwyth (Wales)
Sašo Džeroski: Jožef Stefan Institute Ljubljana (Slovenia)
Probabilistic Modeling and Machine Learning in Structural and Systems Biology
Tuusula, Finland, 17-18 June 2006
Overview
- The application: gene function prediction
- The machine learning context: hierarchical multilabel classification (HMC)
- Decision trees for HMC: the Clus-HMC algorithm
- Experimental results
- Conclusions
PMSB
2006
Gene Function Prediction
Task
- Given a data set with descriptions of genes and the functions they have
- Learn a model that can predict for a new gene what functions it performs
Genes can have multiple functions
These functions are hierarchically organised
[Figure: example class hierarchy with top-level classes c1, c2 and c3, and subclasses c21 and c22 below c2]
Machine Learning
Classifier
- predicts for unseen instances the class to which they belong
- learned with already classified training examples
Different techniques
- decision trees
- support vector machines
- Bayesian networks
- …
Hierarchical Multilabel Classification
Normal classification setting
- only predicts a single class
HMC
- predict multiple classes at once
- classes are organized in a hierarchy
Hierarchy constraint
- instances of a class must be instances of its superclasses
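The hierarchy constraint can be made concrete with a small sketch. The toy hierarchy below reuses the class names c1, c2, c3, c21 and c22 from the earlier figure; the parent links and all function names are assumptions made for illustration, not part of the talk.

```python
# Hypothetical toy hierarchy: each class maps to its direct superclass
# (None marks a top-level class). The parent links are an assumption.
PARENT = {"c1": None, "c2": None, "c3": None, "c21": "c2", "c22": "c2"}


def ancestors(cls, parent=PARENT):
    """Return all superclasses of cls, nearest first."""
    result = []
    current = parent[cls]
    while current is not None:
        result.append(current)
        current = parent[current]
    return result


def satisfies_hierarchy_constraint(labels, parent=PARENT):
    """True if every class in labels comes with all of its superclasses."""
    return all(a in labels for c in labels for a in ancestors(c, parent))


def close_under_hierarchy(labels, parent=PARENT):
    """Add missing superclasses so that the label set obeys the constraint."""
    closed = set(labels)
    for c in labels:
        closed.update(ancestors(c, parent))
    return closed
```

Under these assumptions, the set {c2, c21} satisfies the constraint, while {c21} alone does not until c2 is added.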
PMSB
2006
Two HMC approaches
1. Learn a model for each class and combine the predictions
Advantage
- a lot of machine learning algorithms available
Disadvantages
- efficiency
- skewed class distributions
- hierarchical relationships
[Figure: separate models m1, m2, …, mn, each answering one question: c1?, c2?, …, cn?]
Two HMC approaches (cont'd)
2. Learn a single model that predicts all the classes together
Advantages
- faster to learn
- easier to interpret
- hierarchy constraint automatically imposed
- selection of features relevant for all classes
Disadvantage
- may have worse predictive performance
[Figure: a single model M predicting the whole class vector [c1, c2, …, cn]]
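The difference between the two approaches shows up in how the training targets are laid out. The following is a minimal sketch with a hypothetical class list and hypothetical function names: approach 1 builds one binary target per class, approach 2 builds one 0/1 vector per instance.

```python
# Hypothetical class list; any fixed ordering of the classes would do.
CLASSES = ["c1", "c2", "c21", "c22", "c3"]


def per_class_targets(label_sets, classes=CLASSES):
    """Approach 1: one binary target list per class, one model per list."""
    return {c: [c in labels for labels in label_sets] for c in classes}


def joint_targets(label_sets, classes=CLASSES):
    """Approach 2: one 0/1 vector per instance, for a single model."""
    return [[int(c in labels) for c in classes] for labels in label_sets]


# Two hypothetical genes with their sets of functions.
genes = [{"c1"}, {"c2", "c21"}]
separate = per_class_targets(genes)  # five target lists, one per class
joint = joint_targets(genes)         # two vectors of length five
```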
Related work on HMC
Barutcuoglu et al. (2006)
- learn classes separately with SVMs and combine the predictions with Naïve Bayes
Clare (2003)
- extension of the C4.5 decision tree method that learns all classes together
A lot of work in the area of text classification
- Rousu et al. (2005) give an overview of SVM methods that learn a single model for all classes
             Gene function prediction    Text classification
Approach 1   Barutcuoglu et al.          …
Approach 2   Clare                       …
Why decision trees?
- fast to build
- fast to use
- accurate predictions
- easy to interpret

Gene   ND   HS   …   MF?
G1     25   29   …
G2     32   40   …   +
G3     19   0    …
G4     44   45   …   +
…      …    …    …   …
[Figure: the training examples and a decision tree built from them, with the tests "Nitrogen depletion <= -2.74?" and "Heat shock > 1.28?" (yes/no branches) and leaves labelled Positive, Positive and Negative]
Decision trees for HMC
The Clus system
- created by Jan Struyf
- propositional DT learner, implemented in Java
- uses ideas of:
  - C4.5 [Quinlan93] and CART [Breiman84]
  - Predictive Clustering Trees [Blockeel98]
Heuristic for HMC
- look for the test that minimizes the intra-cluster variance (= generalisation of CART)
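The variance heuristic can be sketched as follows. The slides do not specify the distance used inside the variance, so plain Euclidean distance between 0/1 class vectors is assumed here; the function names, the exhaustive threshold search and the toy data are likewise illustrative and not taken from Clus.

```python
def variance(vectors):
    """Mean squared Euclidean distance of the vectors to their mean vector."""
    if not vectors:
        return 0.0
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return sum(sum((v[i] - mean[i]) ** 2 for i in range(dim)) for v in vectors) / n


def split_score(rows, labels, attribute, threshold):
    """Size-weighted variance of the two clusters induced by the test
    `row[attribute] > threshold` (lower is better)."""
    yes = [y for x, y in zip(rows, labels) if x[attribute] > threshold]
    no = [y for x, y in zip(rows, labels) if x[attribute] <= threshold]
    return (len(yes) * variance(yes) + len(no) * variance(no)) / len(labels)


def best_test(rows, labels):
    """Try every attribute with every observed value as threshold."""
    candidates = [(a, x[a]) for x in rows for a in x]
    return min(candidates, key=lambda t: split_score(rows, labels, *t))


# Attribute values follow the ND/HS example table; the class vectors
# (over three hypothetical classes) are made up for illustration.
rows = [{"ND": 25, "HS": 29}, {"ND": 32, "HS": 40},
        {"ND": 19, "HS": 0}, {"ND": 44, "HS": 45}]
labels = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 1, 1]]
attribute, threshold = best_test(rows, labels)
```

With a single 0/1 target, this score reduces to a variance criterion for binary classification, which is the sense in which the heuristic generalises CART.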
Clus can be used for HMC (Clus-HMC) as well as for binary classification (Clus-SC ~ CART)
Decision trees for HMC (cont'd)
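[Figure: n separate single-class trees (answering c1?, c2?, …, cn?) next to one HMC tree whose leaves predict sets of classes: {c1}, {c1, c21, c22}, {c2, c21, c22}, {c1}, {c1, c2, c21}, {c1, c3}]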
Experiments in yeast functional genomics
Saccharomyces cerevisiae, or baker's/brewer's yeast
MIPS FunCat hierarchy
- 250 functions of yeast genes
12 datasets [Clare03]
- Sequence structure (seq)
- Phenotype growth (pheno)
- Secondary structure (struc)
- Homology search (hom)
- Microarray data: cellcycle, church, derisi, eisen, gasch1, gasch2, spo, expr (all)

[Figure: excerpt of the FunCat hierarchy]
1 METABOLISM
  1/1 amino acid metabolism
  1/2 nitrogen and sulfur metabolisms
  …
2 ENERGY
  2/1 glycolysis and gluconeogenesis
  …
Example run
- each leaf contains multiple classes
- which classes to predict?
- problem: different class frequencies
- use of threshold
- precision-recall curves: independent of a specific threshold
[Figure: example run. A table of genes G1, G2, … with description attributes A1, A2, …, An and an "x" for each function (columns 1, …, 5, 5/1, …, 40, 40/3, 40/16, …) is sorted down a tree with the tests "nitrogen_depletion > 5" and "37C_to_25C_shock > 1.28". Function sets shown for the genes include {1,5,5/1,3,3/5}, {5,5/1,40,40/3}, {1,5}, {40,40/3,40/16}, {5,5/1,40} and {40,40/16}.]

Predictions of the three leaves at different thresholds p:

Threshold   Leaf 1           Leaf 2           Leaf 3
p=0%        40,40/3,40/16    5,5/1,40,40/3    1,5,5/1,3,3/5
p=50%       40,40/3,40/16    5,5/1,40         1,5
p=100%      40,40/16         5,5/1,40         1,5
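The use of a threshold can be sketched as follows. This is a minimal illustration that assumes a class is predicted when its proportion among the training genes in the leaf is at least the threshold; the slides do not state whether the comparison is strict. The leaf contents and function names are hypothetical.

```python
from collections import Counter


def class_proportions(leaf_label_sets):
    """Fraction of the leaf's training genes annotated with each class."""
    counts = Counter(c for labels in leaf_label_sets for c in labels)
    n = len(leaf_label_sets)
    return {c: k / n for c, k in counts.items()}


def predict(leaf_label_sets, threshold):
    """Predict every class whose proportion in the leaf is >= threshold
    (the non-strict comparison is an assumption)."""
    props = class_proportions(leaf_label_sets)
    return {c for c, p in props.items() if p >= threshold}


# Hypothetical leaf holding four training genes (FunCat-style class names).
leaf = [{"5", "5/1", "40"}, {"5", "5/1"}, {"5", "40", "40/3"}, {"5"}]
low = predict(leaf, 0.0)    # every class that occurs in the leaf
mid = predict(leaf, 0.5)    # classes held by at least half of the genes
high = predict(leaf, 1.0)   # classes held by all genes
```

If the training annotations obey the hierarchy constraint, a superclass is always at least as frequent in a leaf as its subclasses, so the thresholded prediction obeys the constraint as well.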
Comparison of Clus-HMC with [Clare03]
Average precision-recall curves
PRECISION = proportion of (instance, class) predictions that is correct
RECALL = proportion of true (instance, class) cases that are predicted
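Both definitions count (instance, class) pairs, which can be sketched directly with sets. The function name and the example genes are hypothetical, and returning 1.0 when there are no pairs to count is an assumed convention.

```python
def precision_recall(predicted, actual):
    """Precision and recall counted over (instance, class) pairs.

    predicted, actual: dicts mapping an instance id to a set of classes.
    """
    pred_pairs = {(i, c) for i, cs in predicted.items() for c in cs}
    true_pairs = {(i, c) for i, cs in actual.items() for c in cs}
    correct = len(pred_pairs & true_pairs)
    # Returning 1.0 for an empty denominator is an assumed convention.
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Hypothetical predictions and true annotations for two genes.
predicted = {"G1": {"1", "5"}, "G2": {"40", "40/3"}}
actual = {"G1": {"1", "5", "5/1"}, "G2": {"40"}}
p, r = precision_recall(predicted, actual)  # 3 of 4 pairs on both sides
```

Repeating this computation for a range of prediction thresholds gives one (precision, recall) point per threshold, which is how a precision-recall curve is traced.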
Extracting rules
e.g. predictions for class 40/3 in “gasch1” dataset
IF Nitrogen_Depletion_8_h <= -2.74 AND
Nitrogen_Depletion_2_h > -1.94 AND
1point5_mM_diamide_5_min > -0.03 AND
1M_sorbitol___45_min_ > -0.36 AND
37C_to_25C_shock___60_min > 1.28
THEN 40,40/3
Precision: 0.97
Recall: 0.15
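An extracted rule is a conjunction of tests along one path of the tree, so it can be written as a plain predicate. The sketch below copies the attribute names and thresholds from the rule above; the function name and the dictionary representation of a gene are assumptions, and the measurement values are made up.

```python
def rule_40_3(gene):
    """Return the predicted classes if all five conditions hold, else None.

    gene: dict mapping attribute names to expression values.
    """
    if (gene["Nitrogen_Depletion_8_h"] <= -2.74
            and gene["Nitrogen_Depletion_2_h"] > -1.94
            and gene["1point5_mM_diamide_5_min"] > -0.03
            and gene["1M_sorbitol___45_min_"] > -0.36
            and gene["37C_to_25C_shock___60_min"] > 1.28):
        return {"40", "40/3"}
    return None


# Hypothetical gene that satisfies every condition of the rule.
example_gene = {
    "Nitrogen_Depletion_8_h": -3.0,
    "Nitrogen_Depletion_2_h": -1.0,
    "1point5_mM_diamide_5_min": 0.2,
    "1M_sorbitol___45_min_": 0.0,
    "37C_to_25C_shock___60_min": 1.5,
}
prediction = rule_40_3(example_gene)
```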
HMC vs. single classification
Tree sizes
- on average, HMC tree: 24 nodes
- SC tree: 33 nodes (250 such trees)
Time to grow trees
- a single SC tree is grown faster than a single HMC tree
- but 250 single trees have to be built
- HMC is on average 37 times faster
Predictive performance
- next slide
HMC vs. single classification
Average precision-recall curves
Explanation of the results
The classes are not independent
- different trees for different classes actually share structure
  - explains some complexity reduction achieved by Clus-HMC
- one class carries information on other classes
  - this increases the signal-to-noise ratio
  - provides better guidance when learning the tree (explaining good predictive performance)
  - avoids overfitting (explaining further reduction of tree size)
  - this was confirmed empirically
Conclusions
HMC decision trees are a useful tool for gene function prediction
- fast to learn
- high interpretability
Compared to regular tree learning, HMC tree learning:
- is even faster
- yields trees that:
  - are smaller
  - are easier to interpret
  - have equal or better predictive performance
Further work
Comparison to other HMC learning algorithms
- kernel methods studied by Rousu et al. and Barutcuoglu et al.
- other suggestions are welcome!
Use a more advanced hierarchy such as Gene Ontology
- thousands of classes, spread over 19 levels
- how to handle the part_of relationship?
  - if a function A is part_of a function B, does a gene with function A also have function B?
  - gene “has” function B vs. gene “is involved” in function B
Questions?