Integrating Ontological Prior Knowledge into Relational Learning for Protein Function Prediction
Stefan Reckow, Max Planck Institute of Psychiatry
Volker Tresp, Siemens, Corporate Technology
Page 2
Proteins and Protein Ontologies
Page 3
Protein and Protein Functions
motivation
• proteins – molecular machines in any organism
• understanding protein function is essential for all areas of bio-sciences
• diverse sources of knowledge about proteins
challenges
• experimental determination of functions difficult and expensive
• homologies can be misleading
• most proteins have several functions
Page 4
Protein function prediction
catalytic activity (catalyzes a reaction)
isomerase activity
intramolecular oxidoreductase activity
intramolecular oxidoreductase activity, interconverting aldoses and ketoses
triose-phosphate isomerase activity (catalyzes a very specific reaction)
(specificity increases from top to bottom)
What function does this protein have?
Page 5
“Function” Ontologies
function
energy transcription cell fate
glycolysis fermentation respiration cell growth cell death
aerobic anaerobic
ontologies are a way of bringing order to the functions of proteins
an ontology is a description of the concepts of a domain and their relationships
hierarchical representation (subclass relationship)
• tree
• directed acyclic graph
Page 6
Complex
Cytoskeleton Proteasome Intracellular transport
Actin filaments Microtubules 10 nm filaments Clathrin
Intermediate filaments Septin filaments
Golgi transport
complex: a structure formed by a group of two or more proteins to perform certain functions concertedly
“Complex” Ontology
Page 7
Ontologies as a Great Source of Prior Knowledge in Machine Learning
A considerable amount of community effort is invested in designing ontologies
Typically this prior knowledge is deterministic (logical constraints)
Machine learning should be able to exploit this knowledge
• Protein interactions are important information for predicting function: statistical relational learning
Page 8
Statistical Relational Learning with the IHRM
Page 9
Statistical Relational Learning (SRL)
SRL generalizes standard machine learning to domains where relations between entities (and not just entity attributes) play a significant role
Examples: PRM, DAPER, MLN, RMN, RDN
The IHRM is an easily applicable general model that performs a cluster analysis of relational domains and requires no structural learning
Z. Xu, V. Tresp, K. Yu, and H.-P. Kriegel. Infinite hidden relational models. In Proc. 22nd UAI, 2006.
C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In AAAI, 2006.
Page 10
Standard Latent Model for Protein Mixture Models
[Figure: graphical model with latent variables Z1, Z2 and attributes A1, A2 for Protein1 and Protein2]
In a Bayesian approach, we can permit an infinite number of states in the latent variables and obtain a Dirichlet Process Mixture Model (DPM)
Advantage: the model only uses a finite number of those states; thus no time-consuming structural optimization is required
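The point that only a finite number of states is ever used can be illustrated with the Chinese restaurant process, a standard way to draw cluster assignments from a Dirichlet process prior. This is a minimal sketch of the prior only, not the inference procedure used in this work; the function name `crp_assignments` and the concentration value `alpha=2.0` are assumptions chosen for illustration.

```python
import random
from collections import Counter


def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


# Hypothetical settings: 1000 items, concentration alpha = 2.0.
labels = crp_assignments(n_items=1000, alpha=2.0)
occupied = Counter(labels)
print(len(occupied), "clusters used for", len(labels), "items")
```

Although the number of possible clusters is unbounded, the number actually occupied is bounded by the number of items and is typically far smaller, so the cluster count does not have to be fixed in advance.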
Page 11
Infinite Hidden Relational Model (IHRM)
[Figure: latent variables Z1, Z2, Z3 with attributes A1, A2, A3 for Protein1, Protein2, Protein3; pairwise "interact" relations R between the proteins]
• Permits us to include protein-protein interactions in the model
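How interactions enter such a model can be sketched generatively: each protein has a latent cluster, and whether two proteins interact depends only on their two clusters. This is a simplified illustration with a fixed number of clusters, not the full nonparametric IHRM; the function `sample_interactions`, the assignment list `z`, and the probability table `phi` are hypothetical.

```python
import random


def sample_interactions(z, phi, seed=0):
    """Sample an undirected interaction set given latent cluster labels.

    z[i] is the cluster of protein i; phi[a][b] is the probability that a
    protein in cluster a interacts with a protein in cluster b.
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            # The relation depends only on the clusters of both endpoints.
            if rng.random() < phi[z[i]][z[j]]:
                edges.add((i, j))
    return edges


# Hypothetical example: six proteins in two clusters that mostly
# interact within their own cluster.
z = [0, 0, 0, 1, 1, 1]
phi = [[0.9, 0.05],
       [0.05, 0.9]]
edges = sample_interactions(z, phi)
print(sorted(edges))
```

Read in the other direction, observed interactions carry information about the latent clusters, which is how interaction data can influence the prediction of attributes such as function.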
Page 12
Ground Network
[Figure: ground network in which each latent variable Z1, Z2, Z3 has the attributes motif, complex, and function, and the proteins are linked by "interact" relations]
Page 13
Experimental Results
KDD Cup 2001
Yeast genome data
1243 genes/proteins: 862 (training) / 381 (test)
Attributes
• Chromosome
• Motif (351) [1-6]: a gene might contain one or more characteristic motifs (information about the amino acid sequence of the protein)
• Essential
• Structural class (24) [1-2]: the protein coded by the gene might belong to one or more structural categories
• Phenotype (11) [1-6]: observed phenotypes in the organism
• Interaction
• Complex (56) [1-3]: the expression of the gene can complex with others to form a larger protein
• Function (14) [1-4]: cell growth, cell organization, transport, …
genes were anonymous
Page 14
Results
[Figure: ROC curve]
Comparison with Supervised Models
Model            Accuracy
IHRM             93.16
Krogel et al.    93.63
SVM              93.48
Page 15
IHRM Result
Node: gene. Link: interaction. Color: cluster.
Page 16
Integrating Ontological Prior Knowledge into the IHRM
Page 17
Integration of ontologies
Deductive closure
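Deductive closure over a subclass hierarchy can be sketched as follows: every annotation with a concept also implies all of that concept's ancestors. This is a minimal sketch assuming the ontology is given as a child-to-parents map; the concept names are taken from the "Complex" ontology slide, while the protein identifiers `P1` and `P2` and both function names are hypothetical.

```python
def ancestors(concept, parents):
    """Return all concepts reachable from `concept` via subclass links."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def deductive_closure(annotations, parents):
    """Add every implied ancestor concept to each protein's annotations."""
    closed = {}
    for protein, concepts in annotations.items():
        full = set(concepts)
        for concept in concepts:
            full |= ancestors(concept, parents)
        closed[protein] = full
    return closed


# Child -> parents map; a list of parents also covers DAG-shaped ontologies.
parents = {
    "actin filaments": ["cytoskeleton"],
    "microtubules": ["cytoskeleton"],
    "cytoskeleton": ["complex"],
    "proteasome": ["complex"],
}
annotations = {"P1": {"actin filaments"}, "P2": {"proteasome"}}
closed = deductive_closure(annotations, parents)
print(closed)
```

Running the closure before learning turns the ontology's logical constraints into additional observed attributes, so the learner itself needs no special handling of the hierarchy.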
Page 18
Integration of ontologies
[Figure: latent variable Zi with the attributes motif and function and with complex concepts. Independent concepts: signal peptidase, actin filaments, microtubules. Dependent concepts: cytoskeleton, translocon, complex.]
Page 19
Experiments: Including “Complex” Ontology
Data collected from CYGD of MIPS
1000 genes/proteins: 800 (training) / 200 (test)
Attributes
• chromosome, motif, essential, structural class, phenotype, interaction, complex, function
interactions from DIP
usage of ontological knowledge on complex
• five levels of hierarchy
• in our model 258 nodes (concepts) using 66 top-level categories
• every protein has at least one complex annotation
• after including ontological constraints: about three annotations per protein on average
Page 20
Results
AUC
Training / test    w/o ontology    with ontology
800 / 200          0.895           0.928
200 / 200          0.832           0.894
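For readers unfamiliar with the metric, AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. This is a generic sketch of the metric, not the evaluation code behind the numbers above; the function name `auc` and the toy labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: one of the four positive/negative pairs is ordered incorrectly.
example = auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1])
print(example)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the differences between the runs with and without the ontology.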
Page 21
Results
explicit modeling of dependencies
Page 22
Results
• proteins acting in cell division
• control proteins
• "Septins": septins have several roles throughout the cell cycle and carry out essential functions in cytokinesis
• The three highlighted proteins fit into this cluster ("cell fate" and "cell type differentiation")
• proteins concerned with secretion and transportation
• The "Golgi apparatus" works together with the "endoplasmic reticulum (ER)" as the transport and delivery system of the cell
• "SNARE" proteins help to direct material to the correct destination
• Test proteins are also associated with "cellular transport"
• Grey: in test set
Page 23
Results
sampling convergence
Page 24
Results
Distribution of proteins in the clusters
Page 25
Results
• Tasks occurring during DNA replication
• The former singleton "DNA polymerase", as a main actor in replication, is assigned to the correct cluster here
• Cellular transport cluster
• The former singleton "Clathrin light chain", as a major constituent of coated vesicles (a component for transport), fits into this cluster quite well
• Grey: former singletons
Page 26
Conclusion
application of the IHRM to function prediction
• competitive with supervised learning methods
• insights into the solution
advantages of integrating ontological knowledge
• improvement of the clustering structure
• robustness: stable results with varying parameterization
• deductive closure prior to learning is a general, powerful principle
future challenges
• usage of several or more complex ontologies
• further analysis of dependent vs. independent concepts
Acknowledgements: Karsten Borgwardt (MPIs Tübingen); Hans-Peter Kriegel (LMU)