Post on 27-Jan-2015
Enzyme Annotation using Conditional Ranking Algorithms
Michiel Stock
Faculty of Bioscience Engineering, Ghent University
6th of June 2014
KERMIT
Michiel Stock (KERMIT) Conditional Ranking of Enzymes 6th of June 2014 1 / 14
Outline
1 From Structure to Function
2 Ranking Enzymes
3 Learning to Rank
4 Results
5 Conclusion
From Structure to Function
What bioinformatics is (often) about
Bioinformatics for proteins
Using biological knowledge and statistical models to map information from a low level (e.g. protein structure) to a higher level (e.g. molecular function).
Sequence → Structure → Function
From Structure to Function
The data set
Data:
two data sets of ca. 1600 enzymes with 21 different functions
five different similarity measures of the active site
active site of an enzyme:
From Structure to Function
The enzyme commission number
Ranking Enzymes
Quantifying enzyme function similarity
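[Figure: a query enzyme with unknown function (EC ?.?.?.?) shown alongside the annotated enzymes EC 2.7.7.12, EC 4.2.3.90, EC 2.7.7.34, EC 4.6.1.11 and EC 2.7.1.12, with pairwise functional similarity values between 0 and 3.]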
Ranking Enzymes
Conditional ranking of enzymes
Ranking enzymes
For an unannotated enzyme, rank the annotated enzymes so that the top has a similar function w.r.t. the query.
Minimize ranking error: number of switches needed for a perfect ranking
Example: suppose one has an enzyme with unknown function: EC ?.?.?.?
1 EC 2.7.7.12
2 EC 2.7.7.12
3 EC 2.7.7.34
4 EC 2.7.1.12
5 EC 2.7.7.34
6 EC 4.2.3.90
7 EC 1.14.11
8 EC 4.6.1.11
⇒ EC 2.7.7.12
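The ranking error can be illustrated with a minimal sketch. It assumes each ranked enzyme carries a relevance value with respect to the query, and counts the pairs that appear in the wrong order, which equals the number of adjacent switches needed to sort the list. The function name `ranking_error` and the relevance values are hypothetical: they assume, purely for illustration, that the query's true function is EC 2.7.7.12 and that relevance is the number of leading EC digits shared with it.

```python
from itertools import combinations


def ranking_error(relevance):
    """Count pairs where a less relevant item is ranked above a more relevant one.

    `relevance` lists the relevance of each item, from rank 1 downwards.
    Ties are not counted as errors.
    """
    return sum(1 for upper, lower in combinations(relevance, 2) if upper < lower)


# Hypothetical relevance values for the eight ranked enzymes in the example,
# assuming the true function of the query is EC 2.7.7.12.
relevance = [4, 4, 3, 2, 3, 0, 0, 0]
print(ranking_error(relevance))  # one switch (ranks 4 and 5) gives a perfect ranking
```

A perfect ranking has error 0; a fully reversed ranking of n distinct relevance values has error n(n-1)/2.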
Learning to Rank
Learning the catalytic similarity
pair of enzymes: $e = (v, v')$
label $y_e \in \{0, 1, 2, 3, 4\}$: the catalytic similarity
five different structural similarities: $K^{\phi}(v, v')$
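Catalytic similarity between enzymes (rows and columns are enzymes; the entries for F and G are empty in the original):

      A  B  C  D  E  F  G
  A   4  4  0  0  0
  B   4  4  0  0  0
  C   0  0  4  2  1
  D   0  0  2  4  3
  E   0  0  1  3  4
  F
  G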
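A label matrix of this kind could be derived from EC numbers as sketched below. The slides only state that the label lies in {0, 1, 2, 3, 4}; taking it to be the number of leading EC digits that two enzymes share is an assumption made here, and the names `catalytic_similarity` and `similarity_matrix` are hypothetical.

```python
def catalytic_similarity(ec_a, ec_b):
    """Number of leading EC digits shared by two EC numbers (0 to 4).

    Assumption: similarity is the length of the common EC prefix.
    """
    shared = 0
    for digit_a, digit_b in zip(ec_a.split("."), ec_b.split(".")):
        if digit_a != digit_b:
            break
        shared += 1
    return shared


def similarity_matrix(ec_numbers):
    """Pairwise catalytic similarities as a list of rows."""
    return [[catalytic_similarity(a, b) for b in ec_numbers] for a in ec_numbers]


# EC numbers taken from the ranking example in the slides.
ecs = ["2.7.7.12", "2.7.7.34", "2.7.1.12", "4.2.3.90"]
for row in similarity_matrix(ecs):
    print(row)
```

Under this assumption the matrix is symmetric with 4 on the diagonal, as in the table.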
Learning to Rank
Pairwise features with the Kronecker product
[Figure: pipeline from an object kernel on enzymes, to a pairwise kernel on enzyme pairs, to a learning algorithm (SVM, RLS, …).]
The Kronecker kernel is defined as:
$$K^{\Phi}\big((v, v'), (\bar{v}, \bar{v}')\big) = K^{\Phi}(e, \bar{e}) = K^{\phi}(v, \bar{v})\, K^{\phi}(v', \bar{v}')$$
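As a minimal numerical sketch of this definition: the kernel value between two enzyme pairs is the product of the two object-kernel values, and the full pairwise kernel matrix is the Kronecker product of the object kernel matrix with itself. The random features and the linear object kernel below are stand-ins for a real structural similarity, and the function name is hypothetical.

```python
import numpy as np


def kronecker_pair_kernel(K, i, j, k, l):
    """Kernel between the pairs (v_i, v_j) and (v_k, v_l), given object kernel K."""
    return K[i, k] * K[j, l]


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # hypothetical feature vectors for 4 enzymes
K = X @ X.T                  # linear object kernel (stand-in for a structural similarity)
n = K.shape[0]

# Full pairwise kernel over all n*n ordered pairs; pair (i, j) has index i*n + j.
K_pairs = np.kron(K, K)

i, j, k, l = 0, 1, 2, 3
print(np.isclose(K_pairs[i * n + j, k * n + l], kronecker_pair_kernel(K, i, j, k, l)))
```

The explicit matrix has n² × n² entries, which is why it is normally not formed for realistic numbers of enzymes.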
Learning to Rank
Basic pairwise models
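Use training data $T = \{(e, y_e)\}$ to fit a model:
$$h(e) = \sum_{\bar{e} \in T} a_{\bar{e}}\, K^{\Phi}(e, \bar{e}).$$
The function $h \in \mathcal{H}$ can be fitted using the following optimisation problem:
$$\mathcal{A}(T) = \arg\min_{h \in \mathcal{H}} L(h, T) + \lambda \|h\|_{\mathcal{H}}^2.$$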
For conditional ranking we choose an approximation of the rank loss.
This problem has time complexity $O(n^3)$, with $n$ the number of enzymes.
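The sketch below shows one way a cubic cost can arise for a Kronecker pairwise kernel. It swaps in a plain squared-error loss (regularized least squares on the labels) instead of the rank-loss approximation mentioned above, and it assumes labels are available for all n × n pairs. With the eigendecomposition of the object kernel, the n² × n² linear system never has to be formed. The function names are hypothetical.

```python
import numpy as np


def fit_kronecker_rls(K, Y, lam):
    """Solve (K ⊗ K + lam * I) vec(A) = vec(Y) using one eigendecomposition of K.

    K   : (n, n) symmetric positive semi-definite object kernel
    Y   : (n, n) label matrix, Y[i, j] is the label of the pair (v_i, v_j)
    lam : regularization parameter, lam > 0
    Returns the (n, n) matrix A of dual coefficients.
    """
    eigvals, V = np.linalg.eigh(K)                          # O(n^3)
    Y_tilde = V.T @ Y @ V                                   # O(n^3)
    A_tilde = Y_tilde / (np.outer(eigvals, eigvals) + lam)  # elementwise, O(n^2)
    return V @ A_tilde @ V.T                                # O(n^3)


def predict_all_pairs(K, A):
    """Model output h(e) for every training pair e = (v_i, v_j)."""
    return K @ A @ K


# Tiny illustration with random stand-in data.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 5))
K = X @ X.T
Y = rng.integers(0, 5, size=(5, 5)).astype(float)
A = fit_kronecker_rls(K, Y, lam=0.1)
print(predict_all_pairs(K, A).shape)
```

The solution satisfies K A K + λA = Y, which is the same linear system as the explicit Kronecker formulation written in matrix form.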
Results
Qualitative improvement in the enzyme similarities
Example for CavBase structural similarity:
[Figure: three similarity matrices, labelled ground truth, supervised and unsupervised.]
Lighter color = higher similarity
Results
Improvement of the ROC curves
ROC curves for the five different structural similarity measures: unsupervised and supervised
[Figure: ROC curves for the different enzyme similarity measures of data set I. x-axis: false positive rate; y-axis: average true positive rate; both from 0.0 to 1.0. Curves: CB, FP, LPCS, MCS and SW, each supervised (sup.) and unsupervised (unsup.). An annotation reads "Improvement".]
Increase of AUC from ca. 0.7 to more than 0.8!
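For reference, the area under the ROC curve can be computed directly as the fraction of (positive, negative) pairs that are scored in the correct order, counting ties as one half. The sketch below is generic: how the slides split enzymes into positives and negatives for a given query is not stated, so the binary labels and scores here are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve from scores and binary labels (1 = positive)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores of four annotated enzymes for one query.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to a random ordering and 1.0 to a perfect one.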
Conclusion
General conclusions
1 enzyme function prediction can nicely be cast in a conditional ranking framework
2 supervised ranking is a clear improvement upon the baseline
3 efficient enough for many bioinformatics applications
4 can be generalised to many other settings
Conclusion
Acknowledgements
Ghent University
Bernard De Baets, Willem Waegeman
University of Turku
Tapio Pahikkala, Antti Airola
University of Marburg
Thomas Fober, Eyke Hüllermeier
Want to know more?
[1] T. Pahikkala, A. Airola, M. Stock, B. De Baets, and W. Waegeman. Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning, 93(2-3):321–356, 2013.
[2] M. Stock, T. Fober, E. Hüllermeier, S. Glinca, G. Klebe, T. Pahikkala, A. Airola, B. De Baets, and W. Waegeman. Identification of functionally related enzymes by learning-to-rank methods. IEEE Transactions on Computational Biology and Bioinformatics, accepted for publication, 2014.