Prof. Michael Schroeder
Biotec / Dept. of Computing, TU [email protected]
Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics
By Michael Schroeder, Biotec, 2003
Contents
Motivation
Beyond the web: Rules, Reasoning, Semantics, Ontologies
Semantics of Deduction Rules
  Argumentation Semantics
  Fuzzy Reasoning
Reaction rules
  Vivid Agents
  Prova
Applications in Bioinformatics
The Web
A great success story, but… it’s the web for humans, not machines
Many areas, such as biology, have fully embraced the web
  Human genome project is only the tip of the iceberg
  More than 500 tools and databases online
Example: Pubmed
More than 12,000,000 literature abstracts
Great resource if one knows what one is looking for
  “Kox1” has 17 hits
  But “diabetes” will produce more than 200,000
Often need to automatically process abstracts
Results of PubMed
Lorenz P. Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44.
Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.
...
[Figure callouts mark the parts of each entry: Author, Title, Journal, Year]
However, to a machine things look different!
Results of PubMed
....
Solution: tag data (XML)
Results of PubMed
<author> </author><title> . </title><journal> </journal><year> </year>
<author> </author><title> . </title><journal> </journal><year> </year>
...
However, to a machine things look different!
Results of PubMed
...
Solution: use ontologies(Semantic Web)
GeneOntology
Biologists have recognised the problem of semantic inter-operability between disparate information sources
GeneOntology (GO) is an effort to provide a common vocabulary for molecular biology
GO has more than 10,000 terms in three branches: “function”, “process”, “localisation”
GeneOntology
Has 13 levels
Width broadens to level 6 (3885 terms wide), then shrinks
Number of leaves per level broadens to level 6 (1223 leaves), then shrinks
Average term has 4 words
Maximal term has 29 words: “Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors”
[Figure: Breadth of GO, plotted over levels 1 to 14 on a scale from 0 to 4500]
Motivation Summary
Web in the old days
  HTML (for humans)
Web these days
  HTML
  XML, Ontologies (for machines)
Web of the future
  HTML
  XML, Ontologies
  rules, reasoning, semantics
  access to computational resources (à la grid computing)
Open Problems
Part I: Theory of rules and reasoning on the web:
  Knowledge representation: Which level of expressiveness?
  Semantics: How to guarantee inter-operability?
  Reasoning: Fuzzy reasoning and unification
  Reactivity: Vivid agents
Part II: Applications of rules and reasoning on the web:
  Integration and querying of information sources
    Integration: transmembrane prediction tools
    Integration: protein structure DB and structure classification
  Consistency checking
    Ontology: If A is B and B is C, then the ontology should not explicitly mention A is C, as it is already implicit
    Annotation: Do different tools agree or disagree?
The wider Picture: www.RuleML.org
Goal: develop a Web language for rules using XML markup, formal semantics, and efficient implementations.
Rules: derivation rules, transformation rules, and reaction rules.
RuleML can thus specify queries and inferences in Web ontologies, mappings between Web ontologies, and dynamic Web behaviors of workflows, services, and agents.
Currently, some 30 international members and close collaboration with W3C
The wider Picture: REWERSE
Reasoning on the Web with Rules and Semantics
FP6 Network of Excellence with nearly 30 partners
Working groups on Infrastructure and Applications
  Infrastructure: Composition, Typing, Policies, Querying, Reactivity and evolution
  Applications: Personalised Web sites, Calendar systems, Bioinformatics
Part I: Theory
Motivation: Expressive Knowledge Representation
Part I.a: Argumentation as LP semantics
  Notions of attack and justified arguments
  Hierarchy of semantics
  Proof procedure
Part I.b: Fuzzy unification and argumentation
  Fuzzy negation
  Fuzzy argumentation
  Fuzzy unification
Part I.c: Vivid Agents
Part I.a: A Hierarchy of Semantics
RuleML caters for different degrees of knowledge representation
A hierarchy of semantics is required to guarantee inter-operation.
Analogy: In HTML, <b>Michael</b> will be rendered differently in Netscape (in bold type) and in the text-based browser Lynx (with whatever emphasis the terminal supports).
Problem: How can we guarantee inter-operability between different interpretations of rules?
Knowledge representation
Pete earns $500,000 p.a.
  earns(pete,500000).
Cross the street if there are no cars
  $cross \leftarrow not\ car$
  $cross \leftarrow \neg car$
The fridge is quite cheap
  cheap(fridge):70%
Does Mike live in Londn?
  address(mike,london) = address(mike,londn): 95%
Knowledge System Cube
[Figure: cube with axes deductive, negation and fuzzy; corners rFB, fFB, dFB, fdFB, rDB, fDB, dDB, fdDB]
r: relational, f: fuzzy, d: deductive
DB: database, FB: factbase
Part I.a: Argumentation as semantics for Extended Logic Programs
[Figure: Knowledge System Cube with axes deductive, negation and fuzzy; corners rFB, fFB, dFB, fdFB, rDB, fDB, dDB, fdDB]
Extended Logic Programming
Logic Programming with 2 negations
Default negation:
  $not\ p$: true if all attempts to prove $p$ fail.
Explicit negation:
  $\neg p$: falsehood of a literal may be stated explicitly.
Coherence principle:
  $\neg p \rightarrow not\ p$
Argumentation
Interaction between agents in order to
  gain knowledge
  revise existing knowledge
  convince the opponent
  solve conflicts
Elegant way to define semantics for (extended) logic programming
  Dung
  Kowalski, Toni, Sadri
  Prakken & Sartor
  etc.
Arguments
An argument is a partial proof, with implicitly negated literals as assumptions.
Argument = sequence of rules
Attacking arguments
Two fundamental kinds of attack:
A undercuts B = A invalidates premise of B
  P: Let’s go to the lake as it is not snowing anymore
  O: Hang on, it is snowing
A rebuts B = A contradicts B
  P: Let’s go to the lake as it is not snowing
  O: Let’s not, as I’ve got to prepare my talk
Derived notions of attack used in the literature (u = undercuts, r = rebuts, a = attacks):
A attacks B = A u B or A r B
A defeats B = A u B or (A r B and not B u A)
A strongly attacks B = A a B and not B u A
A strongly undercuts B = A u B and not B u A
Proposition: Hierarchy of attacks
Undercuts $= u$
Strongly undercuts $= su = u \setminus u^{-1}$
Strongly attacks $= sa = (u \cup r) \setminus u^{-1}$
Defeats $= d = u \cup (r \setminus u^{-1})$
Attacks $= a = u \cup r$
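The derived notions can be read as set operations on the undercut and rebut relations. A minimal Python sketch, assuming arguments are plain labels and each relation is a set of (attacker, attacked) pairs; the function names and the sample relations are illustrative and not taken from the talk:

```python
def inverse(relation):
    """Swap attacker and attacked in every pair."""
    return {(b, a) for (a, b) in relation}


def derived_attacks(u, r):
    """Build the derived notions of attack from undercut (u) and rebut (r)."""
    u, r = set(u), set(r)
    u_inv = inverse(u)
    return {
        "u": u,
        "su": u - u_inv,          # strongly undercuts
        "sa": (u | r) - u_inv,    # strongly attacks
        "d": u | (r - u_inv),     # defeats
        "a": u | r,               # attacks
    }


# Hypothetical example: A and B undercut each other, C undercuts D,
# and rebuts are symmetric between A/C and C/D.
undercuts = {("A", "B"), ("B", "A"), ("C", "D")}
rebuts = {("A", "C"), ("C", "A"), ("C", "D"), ("D", "C")}
attacks = derived_attacks(undercuts, rebuts)
```

In this sketch the inclusions su ⊆ u ⊆ d ⊆ a and su ⊆ sa ⊆ d ⊆ a follow directly from the set definitions, which is one way to read the word "hierarchy" in the proposition.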
Fixpoint Semantics
Argumentation:
  game between proponent and opponent
  argument A is acceptable if opponent’s x-attack is countered by proponent’s y-attack, which proponent already accepted earlier.
Acceptable
  Let x, y be notions of attack. An argument A is x,y-acceptable w.r.t. a set of arguments S iff for every argument B such that $(B,A) \in x$, there is a $C \in S$ such that $(C,B) \in y$
Fixpoint semantics
  $F_{x/y}(S) = \{ A \mid A \text{ is } x,y\text{-acceptable w.r.t. } S \}$
  x/y-justified arguments = least fixpoint of $F_{x/y}$
  x/y-overruled arguments = x-attacked by a justified argument
  x/y-defensible iff neither justified nor overruled
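For a finite set of arguments, the least fixpoint can be reached by iterating the acceptability operator from the empty set. The following Python sketch assumes arguments are labels and the attack notions x and y are given as sets of (attacker, attacked) pairs; all names and the three-argument example are hypothetical:

```python
def is_acceptable(arg, accepted, x, y, arguments):
    """True if every x-attacker of arg is y-attacked by an accepted argument."""
    return all(
        any((c, b) in y for c in accepted)
        for b in arguments
        if (b, arg) in x
    )


def justified(arguments, x, y):
    """Least fixpoint of the acceptability operator, iterated from the empty set."""
    accepted = set()
    while True:
        nxt = {a for a in arguments if is_acceptable(a, accepted, x, y, arguments)}
        if nxt == accepted:
            return accepted
        accepted = nxt


def overruled(arguments, x, justified_args):
    """Arguments that are x-attacked by some justified argument."""
    return {b for b in arguments if any((a, b) in x for a in justified_args)}


# Hypothetical chain: C attacks B, B attacks A; x and y are the same relation.
arguments = {"A", "B", "C"}
attack = {("B", "A"), ("C", "B")}
just = justified(arguments, attack, attack)
over = overruled(arguments, attack, just)
defensible = arguments - just - over
```

Here C is unattacked and therefore justified, C defends A against B, and B is overruled. With two arguments that only attack each other, the iteration stops at the empty set and both remain defensible.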
Theorem: Relationship of semantics
Weakening opponent or strengthening proponent increases justified arguments
Different notions of acceptability give rise to different argumentation semantics
[Figure: hierarchy of x/y semantics with the following nodes]
  sa/u = sa/d = sa/a
  sa/su = sa/sa
  d/su = d/u = d/a = d/d = d/sa (Prakken and Sartor’s semantics w/o priorities)
  u/su = u/u
  su/su
  su/u
  su/a = su/d
  su/sa
  u/a = u/d = u/sa (WFSX)
  a/su = a/u = a/a = a/d = a/sa (Dung’s grounded argumentation semantics)
If opponent is allowed to attack, type of defense does not matter
If opponent is allowed to defeat, type of defense does not matter
If opponent is allowed to undercut, defense with (a, d, sa) or without (su, u) rebut makes a difference
Proof procedure
Dialogues:
  An x/y-dialogue is a sequence of moves such that
    Proponent and Opponent alternate
    Players cannot repeat arguments
    Opponent x-attacks Proponent’s last argument
    Proponent y-attacks Opponent’s last argument
  A player wins a dialogue if the other player cannot move
  Argument A is provably justified if the proponent wins all branches of the dialogue tree with root A
Concrete implementation SLXA:
  Since u/a=u/d=u/sa=WFSX, compute justified arguments with top-down proof procedure SLXA for WFSX [Alferes, Damasio, Pereira]
  SLXA can be adapted for other notions
Part I.b: Fuzzy unification and argumentation
[Figure: Knowledge System Cube with axes deductive, negation and fuzzy; corners rFB, fFB, dFB, fdFB, rDB, fDB, dDB, fdDB]
r: relational, f: fuzzy, d: deductive
DB: database, FB: factbase
Classical Fuzzy Logic
Solution: Truth values in $[0,1]$ instead of $\{0,1\}$.
Assertions: $p:V$ ($p$ a formula, $V$ a truth value).
Conjunction: $p:V,\ q:W \;\Rightarrow\; p \wedge q : \min(V,W)$
Disjunction: $p:V,\ q:W \;\Rightarrow\; p \vee q : \max(V,W)$
Inference: $p \leftarrow q_1, \dots, q_n;\ q_1:V_1, \dots, q_n:V_n \;\Rightarrow\; p : \min(V_1, \dots, V_n)$
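The min/max rules can be tried out with a small forward-chaining sketch in Python. It assumes rules are (head, body) pairs over atom names and that, when several rules derive the same head, the largest value is kept (in line with the max rule for disjunction); that combination choice, the function names and the fridge example values are assumptions made for illustration:

```python
def conjunction(v, w):
    return min(v, w)


def disjunction(v, w):
    return max(v, w)


def infer(rules, facts):
    """Propagate truth values: a head gets the minimum of its body values."""
    values = dict(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(atom in values for atom in body):
                value = min((values[atom] for atom in body), default=1.0)
                # Assumption: alternative derivations are combined with max.
                if value > values.get(head, 0.0):
                    values[head] = value
                    changed = True
    return values


# Hypothetical knowledge base: buy <- cheap, reliable
facts = {"cheap": 0.7, "reliable": 0.9}
rules = [("buy", ["cheap", "reliable"])]
values = infer(rules, facts)
```

The loop terminates because values only ever increase and are always drawn from the finite set of fact values.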
Fuzzy Negation
Classical fuzzy negation: $L:V \Rightarrow \neg L : 1-V$ (Zadeh)
Our setting (fuzzy adaptation of WFSX):
  $L:V$ and $\neg L:V'$ with $V' \neq 1-V$ possible
  $L$ and $\neg L$ not directly related.
Fuzzy Coherence Principle
If $\neg L:V$ and $V > 0$, and $not\ L:V'$, then $V' \geq V$.
  “If there is some explicit evidence that L is false, then there is at least the same evidence that L is false by default.”
If $\neg L:V$ and $V > 0$, then $not\ L: 1$.
Law of excluded...
...contradiction
  $p \wedge \neg p : V$, $V > 0$ possible. Contradictory programs!
  $not\ p \wedge p : V$, $V > 0$ possible. By coherence principle!
  Contradiction removal
...middle
  $not\ p \vee p : V$, $V > 0$
  $\neg p \vee p : V$, $V = 0$ possible. $p$ is unknown
Strength of an argument
Strength of an argument:
  Fact: value is given
  Rule: minimum of body literals
  Argument: value of its conclusion, i.e. the least fuzzy value of the facts contributing to the argument.
Theorems
Theorem (Soundness and Completeness)
There is a justified argument of strength V for L iff there is a successful T-tree of truth value V for L.
Theorem (Conservative Extension)
Argumentation semantics is a conservative extension of WFSX.
Application: Fuzzy unification
Open systems:
  knowledge and ontologies may not match
  interaction with humans: “Does Mike live in Londn?”
Approach:
  address(mike,london) = address(mike,londn): 95%
  adapt unification algorithm (normalised edit distance over trees, net)
  embed into argumentation framework
Finding Mismatches: Edit distance
Edit distance between strings A and B:
  minimal number of delete, add, replace operations to convert A into B
  efficient implementation with dynamic programming
Example: e(address,adresse)=2, e(007,aa7)=2
Normalise: $ne(A,B) = e(A,B) / \max\{|A|, |B|\}$
Trees: net = sum of all mismatches divided by sum of all max lengths
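The definitions above can be sketched in Python as follows. The string distance is the standard dynamic-programming edit distance; the tree variant is only one possible reading of "sum of all mismatches divided by sum of all max lengths". It assumes terms are nested tuples (functor, arg1, ...) or plain strings, and that a missing argument is compared against the empty string; both conventions and all function names are assumptions of this sketch:

```python
def edit_distance(a, b):
    """Minimal number of delete, add, replace operations turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalised_edit_distance(a, b):
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def _label_and_args(term):
    if isinstance(term, tuple):
        return term[0], list(term[1:])
    return term, []


def _accumulate(s, t):
    """Return (sum of mismatches, sum of max lengths) over aligned nodes."""
    label_s, args_s = _label_and_args(s)
    label_t, args_t = _label_and_args(t)
    mismatches = edit_distance(label_s, label_t)
    lengths = max(len(label_s), len(label_t))
    for i in range(max(len(args_s), len(args_t))):
        sub_s = args_s[i] if i < len(args_s) else ""  # missing argument
        sub_t = args_t[i] if i < len(args_t) else ""
        m, l = _accumulate(sub_s, sub_t)
        mismatches += m
        lengths += l
    return mismatches, lengths


def tree_distance(s, t):
    """Normalised edit distance over trees (net) under the conventions above."""
    mismatches, lengths = _accumulate(s, t)
    return mismatches / lengths if lengths else 0.0


net = tree_distance(("address", "mike", "london"), ("address", "mike", "londn"))
```

Under these conventions the two address terms differ by one mismatch over a total length of 17, so the distance is 1/17. The exact percentage quoted on the slides may rest on a slightly different normalisation.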
Fuzzy unification and arguments
net is conservative extension of MGU (most general unifier)
net(t,t’) ne(t,t’)
Adapt definition of argument for fuzzy unification
  V-argument: for all $L$ in a body, there is $L'$ in a head such that $net(L,L') \leq 1-V$
  A V-undercuts B if A contains $not\ L$ and B’s head is $L'$ and $net(L,L') \leq 1-V$
  A V-rebuts B if A’s head is $L$ and B’s head is $\neg L'$ and $net(L,L') \leq 1-V$
Adapt previous definitions accordingly
Comparison: Argumentation
Our framework allows us to relate existing and new argumentation semantics:
  Dung = a/su=a/u=a/a=a/d=a/sa
  Prakken&Sartor = d/su=d/u=d/a=d/d=d/sa
  WFSX = u/a = u/d = u/sa
  Dung $\subseteq$ Prakken&Sartor $\subseteq$ WFSX
Proof Theory and Top-down Proof Procedure adapted from Alferes, Damasio, Pereira’s SLXA
Comparison: Fuzzy Argumentation
Wagner:
  Scale: -1 to +1
  Unlike WFSX, he relates $F$ and $\neg F$: $\neg F: -V$ iff $F:V$
  We adopted his interpretation for not: $not\ F: 1$ if $\neg F:V$, $V>0$
  Relates his work to stable models, but there is no top-down proof procedure for stable models [Alferes&Pereira]
Our approach conservatively extends WFSX, hence we can adapt proof procedure SLXA
Comparison: Fuzzy unification
Arcelli, Formato, Gerla define an abstract fuzzy unification/resolution framework
  cannot deal with missing parameters (common problem [Fung et al.])
  no conservative extension of classical unification
We use a concrete distance: edit distance
Evaluated idea on bioinfo DB
Conclusion
“A database needs two kinds of negation” (Wagner)
Argumentation is an elegant way of defining semantics
Our framework allows classification of various new and existing semantics
Efficient top-down proof procedure for justified arguments
Argumentation as basis for belief revision (REVISE)
We cover the whole knowledge system cube including fuzzy argumentation
Defined fuzzy unification, which is useful in open systems
Part I.c: Vivid Agent
A vivid agent is a software-controlled system
  whose state is represented by a knowledge base and
  whose behaviour is represented by action and reaction rules
Actions are planned and executed to achieve a goal
Reactions are triggered by events
  Epistemic RR: Effect <- Event, Cond
  Physical RR: Action, Effect <- Event, Cond
  Interaction RR: Msg, Effect <- Event, Cond
Vivid Agent
[Figure: vivid agent architecture. A perception-reaction cycle (reaction rules, KB, beliefs/updates) receives events through the interface; a planner (action rules, KB, goals) works with beliefs, goals and intentions]
Agent State and Transition Semantics
Agent State: Event queue, Plan queue, Goal queue, Knowledge base
Transition semantics
  Perception: Add event to agent’s event queue
  Reaction: Pop event from event queue, execute reactions including update of knowledge base
  Plan execution: Execute action of plan in plan queue
  Replanning: If action fails, replan
  Planning: Pop goal from goal queue and generate plan
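The transitions can be pictured as one step function over the four state components. The Python sketch below is only an illustration: the priority order (reaction before plan execution before planning), the reduction of reaction rules to a mapping from events to epistemic effects, and the door example are assumptions, not details from the talk or from Prova:

```python
from collections import deque


class VividAgent:
    def __init__(self, reaction_rules, plan_for, execute):
        self.kb = set()                       # knowledge base
        self.events = deque()                 # event queue
        self.plans = deque()                  # plan queue: (goal, remaining actions)
        self.goals = deque()                  # goal queue
        self.reaction_rules = reaction_rules  # event -> set of effects
        self.plan_for = plan_for              # (goal, kb) -> list of actions
        self.execute = execute                # (action, kb) -> True on success

    def perceive(self, event):
        """Perception: add event to the event queue."""
        self.events.append(event)

    def step(self):
        if self.events:  # Reaction: pop event, update knowledge base
            event = self.events.popleft()
            self.kb |= self.reaction_rules.get(event, set())
            return "reaction"
        if self.plans:   # Plan execution, or replanning if the action fails
            goal, actions = self.plans.popleft()
            if self.execute(actions[0], self.kb):
                if actions[1:]:
                    self.plans.appendleft((goal, actions[1:]))
                return "plan execution"
            new_plan = self.plan_for(goal, self.kb)
            if new_plan:
                self.plans.appendleft((goal, new_plan))
            return "replanning"
        if self.goals:   # Planning: pop goal and generate plan
            goal = self.goals.popleft()
            plan = self.plan_for(goal, self.kb)
            if plan:
                self.plans.append((goal, plan))
            return "planning"
        return "idle"


def plan_for(goal, kb):
    return ["walk_through"] if "door_open" in kb else ["open_door", "walk_through"]


def execute(action, kb):
    if action == "open_door":
        kb.add("door_open")
        return True
    return "door_open" in kb


agent = VividAgent({"door_opened": {"door_open"}}, plan_for, execute)
agent.goals.append("enter")
agent.perceive("door_opened")
trace = [agent.step() for _ in range(4)]
```

In this run the perceived event updates the knowledge base first, so the planner produces the shorter plan.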
Implementation in Prova
Original implementation in PVM-Prolog
  Coarse-grain parallelism (PVM) for each agent and Prolog threads for an agent’s components
Currently: Prova is a Java-based rule engine
  easy integration of all kinds of data sources, e.g. databases, web services, etc.
Part II: Application to Bioinformatics
NSF and EU’s strategic research workshop found that bioinformatics could play the role for the semantic web which physics played for the web.
Why?
  Masses of information
  Masses of publicly accessible online information (e.g. 8000 abstracts per month and over 500 tools)
  Data (more and more often) published in XML
  Data standards are accepted and actively developed
  Much valuable information scattered (as production cheap and hence not centralised)
  Systems integration and interoperation prime concern (e.g. GeneOntology)
Example: Information Agents for…
… Protein interactions: PDB, SCOP
… Protein annotation: TOPPred, HMMTOP,…
Information source, Wrapper, Mediator, Facilitator
[Figure: architecture with Sources, Wrappers, a Mediator and a Facilitator]
Example 1: Protein Interaction
PDB: Protein structures
SCOP: Structure classification
Example 1: PSIMAP: Structural Interactions
Example 1: Protein Interaction: How it is currently done
PDB: 15 Gigabyte in flat files
SCOP: 3 flat files
How?
  Download PDB, SCOP files
  Think up DB schema and populate MySQL DB
  Run some Perl scripts on various machines that grind through the data and analyse it
  Run some Java to visualise results
Problem: “Business logic” not separated
How our Prova system can execute it
Declarative and executable specifications
  Interaction(Superfamily1, Superfamily2) if
    PDB(Protein),
    Domain(Protein, Domain1),
    Domain(Protein, Domain2),
    SCOP Superfamily(Domain1, Superfamily1),
    SCOP Superfamily(Domain2, Superfamily2),
    InteractionDD(Domain1, Domain2, 5 Ang, 5 Residues)
Separation of information integration workflow
  Easier to maintain
Platform independence, because of Java
Flexible, optimized execution
  Query optimization and load-balancing of computations
  Local or remote computation.
  Data might be held locally in a file, remotely in a DB, through a web service, on the grid, etc.
Actual Prova Code
% ACTUAL PROVA CODE
% Given the open database connection DB
% and a unique protein identifier in Protein
% Data Bank PDB_ID, test whether the provided
% domains with IDs PXA and PXB interact
% (have at least 5 atoms within 5 angstroms)
scop_dom2dom(DB,PDB_ID,PXA,PXB) :-
access_data(pdb,PDB_ID,Protein),
scop_dom_atoms(DB,Protein,PXA,DomainA),
scop_dom_atoms(DB,Protein,PXB,DomainB),
DomainA.interacts(DomainB).
Caching
% Two alternative rules for either retrieving data
% from the cache or accessing the data from its
% original location and caching it.
access_data(Type,ID,Data,CacheData) :-
    % Attempt to retrieve the data
    Data=CacheData.get(ID),
    % Success, Data (whatever object it is) is returned
    !.
access_data(Type,ID,Data,CacheData) :-
    % Retrieve the data from its location and update the cache
    retrieve_data_general(Type,ID,Data),
    update_cache(Type,ID,Data,CacheData).
Example 2: GoPubmed
Consistency of GO
Simple example:
  Parsimony: If A is-a C is explicitly stated in the ontology, it should not be possible to derive it implicitly
  I.e. don’t state A is-a C if you already have A is-a B and B is-a C
Done with Prova
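The talk states that this check was done with Prova; the Python sketch below only illustrates the idea and is not that implementation. It assumes the ontology is given as a set of explicit (child, parent) is-a pairs and reports every explicit pair that can also be derived through other is-a links; the function name and the example pairs are hypothetical:

```python
def redundant_is_a(edges):
    """Return explicit is-a pairs that are also derivable via other pairs."""
    edges = set(edges)
    parents_of = {}
    for child, parent in edges:
        parents_of.setdefault(child, set()).add(parent)

    def ancestors(start, skipped_edge):
        """All terms reachable from start without using skipped_edge."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for parent in parents_of.get(node, ()):
                if (node, parent) == skipped_edge or parent in seen:
                    continue
                seen.add(parent)
                stack.append(parent)
        return seen

    return {(a, c) for (a, c) in edges if c in ancestors(a, (a, c))}


is_a = {("A", "B"), ("B", "C"), ("A", "C"), ("D", "C")}
redundant = redundant_is_a(is_a)
```

Here A is-a C is flagged because it already follows from A is-a B and B is-a C, while D is-a C is kept because nothing else implies it.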
Towards functional annotation through GoPubmed
Protein Name / Enzyme activity: hydrolase, kinase, transferase, lyase, isomerase, one other
Pyruvate kinase M1 isozyme: X X X X X, other: oxidoreductase
cAMP-dependent protein kinase type II regulatory chain: X X X X, other: cyclase
Galactokinase: X X X X X
Tropomyosin beta chain: X X X X
HnRNP D0: X X X X, other: helicase
Example 3: Consistent Integration of Protein Annotation
Conflicts
Example: Edit2TrEMBL
EditToTrEMBL (Steffen Möller, EBI): automates annotation of DNA sequences by combining results of various tools and databases, which are online
[Figure: a dispatcher and several analysers distributed over hosts, exchanging info objects]
Challenge
Uncertain, incomplete, vague, contradictory information
Wrappers’ domains overlap: How can mediator resolve conflicts?
How can mediator integrate information consistently?
How can mediator improve info quality using overlapping info and inconsistencies?
Mediator contains conflict resolution component
Semantic conflict resolution requires domain knowledge to identify conflicts
We use extended logic programming
Common problem: Overlapping information can lead to inconsistencies
Solution: Semantic consistency checking
[Figure: architecture with Sources, Wrappers, a Mediator and a Facilitator]
Modelling domain knowledge
Facts, Rules, Assumptions, Integrity Constraints
For example:
The length of transmembrane regions is limited:
  false if ft(AccNo,transmembrane,From,To), To-From > 25
  false if ft(AccNo,transmembrane,From,To), To-From < 15
Maximal difference in membrane borders:
  false if ft(Agent1,Acc,transmembrane,From1,To1),
           ft(Agent2,Acc,transmembrane,From2,To2),
           (From1>From2,From1<To2;To1>From2,To1<To2),
           (abs(From2-From1)>4;abs(To2-To1)>4).
Assessment of predictions:
  probability(ft(tmhmm,p12345,transmem,6,26), 0.5)
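To make the integrity constraints concrete, the following Python sketch checks the same two conditions over a list of predicted features. It is a plain re-reading of the rules, not the logic-programming machinery used in the talk; the Feature record, the function names, the second accession number and the sample coordinates are hypothetical:

```python
from typing import NamedTuple


class Feature(NamedTuple):
    agent: str   # prediction tool that produced the feature
    acc: str     # accession number of the protein
    kind: str
    start: int
    end: int


def length_violations(features, min_len=15, max_len=25):
    """Transmembrane regions whose length (end - start) is out of range."""
    return [
        f for f in features
        if f.kind == "transmembrane" and not (min_len <= f.end - f.start <= max_len)
    ]


def border_conflicts(features, max_shift=4):
    """Ordered pairs of overlapping regions whose borders differ too much."""
    conflicts = []
    for f1 in features:
        for f2 in features:
            if f1.acc != f2.acc:
                continue
            if f1.kind != "transmembrane" or f2.kind != "transmembrane":
                continue
            overlaps = (f2.start < f1.start < f2.end) or (f2.start < f1.end < f2.end)
            shifted = (abs(f2.start - f1.start) > max_shift
                       or abs(f2.end - f1.end) > max_shift)
            if overlaps and shifted:
                conflicts.append((f1, f2))
    return conflicts


features = [
    Feature("tmhmm", "p12345", "transmembrane", 6, 26),
    Feature("toppred", "p12345", "transmembrane", 12, 33),
    Feature("hmmtop", "p67890", "transmembrane", 40, 70),
]
too_long_or_short = length_violations(features)
conflicting_pairs = border_conflicts(features)
```

Like the rule it mirrors, the border check works on ordered pairs, so a conflict between two tools can show up once in each direction.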
REVISE
REVISE detects conflicting arguments and computes a minimal set of assumptions whose removal resolves the conflict
Dropping these assumptions yields a minimal consistent annotation of all predictions
Minimality is based on probabilities given as part of predictions
  alternative: cardinality, set-inclusion
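The idea of dropping a cheapest set of assumptions can be illustrated with a brute-force search in Python. This is not REVISE’s algorithm: the sketch assumes each assumption is a named prediction with a probability, conflicts are given as pairs of names, and "minimal" means the smallest total probability of the dropped predictions; the names and numbers are made up:

```python
from itertools import combinations


def minimal_revision(probabilities, conflicts):
    """Cheapest set of predictions to drop so that no conflict pair survives."""
    names = sorted(probabilities)
    best, best_cost = None, float("inf")
    for size in range(len(names) + 1):
        for dropped in combinations(names, size):
            dropped_set = set(dropped)
            # A conflict is resolved once at least one of its members is dropped.
            if all(a in dropped_set or b in dropped_set for a, b in conflicts):
                cost = sum(probabilities[name] for name in dropped_set)
                if cost < best_cost:
                    best, best_cost = dropped_set, cost
    return best


probabilities = {"tmhmm": 0.5, "toppred": 0.3, "hmmtop": 0.4}
conflicts = [("tmhmm", "toppred"), ("toppred", "hmmtop")]
to_drop = minimal_revision(probabilities, conflicts)
```

The search is exponential in the number of predictions, so it only serves to show what is being minimised. Swapping the cost for the size of the dropped set would give the cardinality alternative mentioned above.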
Vision: A semantic Grid for Bioinformatics
Expression Space: Space Explorer
Pathway Space: BioNetExplorer
Interaction Space: PSIMAP
Literature Space: Classification Server
Conclusion
Advanced applications on the web will require rules and reasoning
Part I:
  Argumentation is an elegant way of defining semantics
  Classification of various new and existing semantics
  Fuzzy reasoning and unification
  Reactivity with vivid agents and Prova
Part II:
  Bioinformatics requires a semantic web and the semantic web requires bioinformatics
Acknowledgment
Ralf Schweimeier (Argumentation semantics)
Panos Dafas, Dan Bolser (PSIMAP)
Steffen Moeller (Edit2Trembl)
David Gilbert (Fuzzy Unification)
Ralph Delfs, Alexander Kozlenkov (Go, Prova)
Carlos Damasio (REVISE)
More information at comas.soi.city.ac.uk
Email: [email protected]