BioSigNet: Reasoning and Hypothesizing about Signaling Networks
Nam Tran
Main points
● Biomedical databases: structured data and queries.
– http://cbio.mskcc.org/prl/
● Next step: knowledge bases and reasoning.
– Kinds of reasoning, incomplete knowledge
● How can existing knowledge be revised or expanded?
– Hypothesis formation
– Experimental verification
Knowledge based reasoning
● Various kinds of reasoning
– Prediction: side effects
– Planning: designing therapies
– Explanation: reasoning about unobserved aspects
– Consistency checking: correctness of ontologies
● Additional facets/nuances
– Reasoning with incomplete knowledge
– Reasoning with defaults
– Ease of updating knowledge (elaboration tolerance)
Hypothesis formation
● If:
– our observations cannot be explained by our existing knowledge, or
– the explanations given by our existing knowledge are invalidated by experiments,
● Then: our knowledge needs to be augmented or revised. How?
● Can we use a reasoning system to propose hypotheses that can be verified through experimentation?
Hypothesis space
[Figure: diagram with labels Knowledge base (UV leads_to cancer), High UV, p53, Cancer, No cancer; entailment shown as (K,I) |= O]
Motivation -- summary
Goal: To emulate the abstract reasoning done by biologists, medical researchers, and pharmacology researchers.
Types of reasoning: prediction, explanation and planning.
Current systems biology approaches: mostly prediction.
Incomplete knowledge constantly needs to be updated, which calls for hypothesis formation.
Overview of our approach
● Represent a signal network as a knowledge base that describes
– actions/events (biological interactions, processes),
– the effects of these actions/events,
– the triggering conditions of these actions/events.
● Query the knowledge base: prediction, explanation, planning.
● Hypothesize to discover new knowledge.
● BioSigNet-RRH: Biological Signal Network – Representation, Reasoning and Hypothesizing
Foundation behind our approach
● Research on representing and reasoning about dynamic systems (space shuttles, mobile robots, software agents)
– causal relations between properties of the world
– effects of actions (and when they can be executed)
– goal specification
– action plans
● Research on knowledge representation, reasoning and declarative problem solving: the AnsProlog language.
Representing signal networks as a Knowledge Base
● Alphabet
– Actions/Events: bind(ligand,receptor)
– Fluents: high(ligand), high(receptor)
● Statements
– Effect axioms:
bind(ligand,receptor) causes bound(ligand,receptor) if cond.
high(other_ligand) inhibits bind(ligand,receptor) if cond.
– Trigger conditions:
high(ligand), high(receptor) triggers bind(ligand,receptor)
Initial observations, Queries, Entailment
● Entailment: (K,I) |= Q
● Given
– K: the knowledge base of binding
– I: initially high(ligand), high(receptor)
● Conclude Q = eventually bound(ligand,receptor)
● Given
– K: the knowledge base of binding
– I’: initially high(ligand), high(receptor), high(other_ligand)
● Conclude: (K,I’) does not entail Q, because high(other_ligand) inhibits bind(ligand,receptor)
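A minimal sketch of how the two entailment checks above could be computed, assuming a simple forward-simulation reading of "triggers", "inhibits" and "causes". The tuple encoding, the function name `eventually` and the step bound are illustrative assumptions, and the "if cond." qualifiers are treated as always true. It is not the BioSigNet-RRH implementation. Under this reading the inhibitor blocks binding, so Q is not derived from I’.

```python
# Hypothetical encoding of the slide's statements (names are illustrative).
CAUSES = [("bind(ligand,receptor)", "bound(ligand,receptor)")]
TRIGGERS = [({"high(ligand)", "high(receptor)"}, "bind(ligand,receptor)")]
INHIBITS = [({"high(other_ligand)"}, "bind(ligand,receptor)")]


def eventually(initial, query, max_steps=10):
    """Return True if `query` holds in some state reached from `initial`."""
    state = set(initial)
    for _ in range(max_steps):
        if query in state:
            return True
        # An action fires when all its trigger fluents hold and no
        # inhibitor of that action is satisfied in the current state.
        fired = {
            action
            for fluents, action in TRIGGERS
            if fluents <= state
            and not any(inh <= state and act == action for inh, act in INHIBITS)
        }
        new_state = state | {effect for act, effect in CAUSES if act in fired}
        if new_state == state:  # fixpoint: nothing more can happen
            return query in state
        state = new_state
    return query in state


Q = "bound(ligand,receptor)"
I = {"high(ligand)", "high(receptor)"}
I_prime = I | {"high(other_ligand)"}

print(eventually(I, Q))        # True
print(eventually(I_prime, Q))  # False
```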
Importance of a formal semantics
● Besides defining prediction, explanation and planning, it is also useful in identifying:
– under what restrictions the answer given by an algorithm will be correct (soundness);
– under what restrictions an algorithm will find a correct answer if one exists (completeness).
● bind(TNF-α,TNFR1) causes trimerized(TNFR1)
● trimerized(TNFR1) triggers bind(TNFR1,TRADD)
Prediction
Given some initial conditions and observations, to predict how the world would evolve or predict the outcome of (hypothetical) interventions.
● Initial Condition
– bind(TNF-α,TNF-R1) occurs at 0
● Query
– predict eventually apoptosis
● Answer: Unknown!
– Incomplete knowledge about TRADD’s bindings.
– Depends on whether bind(TRADD, RIP) occurred or not!
● Initial Condition
– bind(TNF-α,TNF-R1) occurs at t0
● Observation
– TRADD’s binding with TRAF2, FADD, RIP
● Query
– predict eventually apoptosis
● Answer: Yes!
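The "Unknown" versus "Yes" answers above can be illustrated by checking a query in every completion of the unknown facts. The sketch below rests on an assumed toy rule that is not stated in the slides: apoptosis follows once TRADD is bound to TRAF2, FADD and RIP. The function name and the fluent spellings are illustrative as well.

```python
from itertools import combinations

# ASSUMED toy rule (not from the slides): apoptosis needs all three bindings.
REQUIRED = frozenset({
    "bound(TRADD,TRAF2)",
    "bound(TRADD,FADD)",
    "bound(TRADD,RIP)",
})


def predict_apoptosis(known_true, known_false=frozenset()):
    """Answer 'yes', 'no' or 'unknown' by checking every completion."""
    known_true = set(known_true)
    unknown = sorted(REQUIRED - known_true - set(known_false))
    outcomes = set()
    # Enumerate every way the unknown bindings could turn out.
    for size in range(len(unknown) + 1):
        for extra in combinations(unknown, size):
            world = known_true | set(extra)
            outcomes.add(REQUIRED <= world)
    if outcomes == {True}:
        return "yes"
    if outcomes == {False}:
        return "no"
    return "unknown"


print(predict_apoptosis(set()))     # unknown: TRADD's bindings are not known
print(predict_apoptosis(REQUIRED))  # yes: all three bindings were observed
```

The answer is "yes" only when the query holds in every world consistent with the observations, which is why adding the observation about TRADD's bindings changes the result.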
Explanation
Given initial conditions and observations, to explain why the final outcome does not match expectations.
Relation to diagnosis.
● Initial condition:
– bound(TNF-α,TNFR1) at t0
● Observation:
– bound(TRADD, TRAF2) at t1
● Query: Explain apoptosis
● One explanation:
– Binding of TRADD with RIP
– Binding of TRADD with FADD
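Explanation can be read as abduction: find the smallest sets of unobserved facts that, added to the observations, account for the outcome. The sketch below uses the same kind of assumed toy rule as a stand-in for the pathway (apoptosis needs TRADD bound to TRAF2, FADD and RIP). That rule and all names are assumptions, not taken from the slides.

```python
from itertools import combinations

# ASSUMED toy rule (not from the slides): apoptosis needs all three bindings.
NEEDED = frozenset({
    "bound(TRADD,TRAF2)",
    "bound(TRADD,FADD)",
    "bound(TRADD,RIP)",
})


def explain_apoptosis(observed):
    """Return the minimal sets of unobserved bindings that explain apoptosis."""
    observed = set(observed)
    candidates = sorted(NEEDED - observed)
    explanations = []
    # Try smaller candidate sets first so that only minimal ones are kept.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if any(set(e) <= set(subset) for e in explanations):
                continue  # a smaller explanation already covers this one
            if NEEDED <= observed | set(subset):
                explanations.append(subset)
    return explanations


print(explain_apoptosis({"bound(TRADD,TRAF2)"}))
# [('bound(TRADD,FADD)', 'bound(TRADD,RIP)')]
```

With bound(TRADD, TRAF2) observed, the only minimal explanation is the pair of bindings listed on the slide.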
Planning
Given initial conditions, to plan interventions to achieve a goal.
Application in drug and therapy design.
Planning requirements
● In addition to the knowledge about the pathway, we need information about possible interventions, such as:
– what proteins can be introduced;
– what mutations can be forced.
Planning example
● Defining possible interventions:
– intervention intro(DN-TRAF2)
– intro(DN-TRAF2) causes present(DN-TRAF2)
– present(DN-TRAF2) inhibits bind(TRAF2,TRADD)
– present(DN-TRAF2) inhibits interact(TRAF2,NIK)
● Initial condition:
– bound(NFκB,IκB) at 0
– bind(TNF-α,TNF-R1) at 0
● Goal: to keep NFκB inactive.
● Query:
– plan always bound(NFκB,IκB) from 0
Future Works!
● Further development of the language to better approximate cellular systems
– Delay triggers
– Granularities of representation
– Continuous processes, hybrid systems
– Concurrency, durative actions
● Scaled-up implementation
– Kohn’s map
– Networks in Reactome and other repositories
● Ontologies
– Integration with BioPAX
Hypothesis space
[Figure repeated: diagram with labels Knowledge base (UV leads_to cancer), High UV, p53, Cancer, No cancer; entailment shown as (K,I) |= O]
Issues in this tiny example
● Hypothesis formation:
– Theory: UV leads to cancer.
– Observation: wild-type p53 resists the UV effect.
– Hypothesis: p53 is a tumor suppressor.
● Elaboration tolerance:
– How do we update/revise “UV leads to cancer”?
● Defaults and non-monotonic reasoning:
– Normally UV leads to cancer.
– UV does not lead to cancer if p53 is present.
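The default and its exception can be sketched as a rule whose conclusion is withdrawn when a new fact arrives, which is what makes the reasoning non-monotonic. The fact names "high(UV)" and "present(p53)" and the function name are illustrative assumptions.

```python
def uv_leads_to_cancer(facts):
    """Default rule with one exception, evaluated over a set of known facts."""
    # Normally UV leads to cancer ...
    exposed = "high(UV)" in facts
    # ... unless p53 is known to be present (the exception).
    blocked = "present(p53)" in facts
    return exposed and not blocked


before = uv_leads_to_cancer({"high(UV)"})
after = uv_leads_to_cancer({"high(UV)", "present(p53)"})
print(before, after)  # True False
```

Adding a fact removed a conclusion without any change to the original rule, which is the elaboration tolerance the slide asks for.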
Construction of hypothesis space
● Present: manual construction, using research literature
● Future: integration of multiple data sources
– Protein interactions
– Pathway databases
– Biological ontologies
– …
● Provide cues, hunches such as
– A may interact with B: action interact(A,B)
– A-B interaction may have effect C: interact(A,B) causes C
Generation of hypotheses
● Enumeration of hypotheses
● Search: computing with Smodels (an implementation of AnsProlog)
● Heuristics
– A trigger statement is selected only if it is the only cause of some action occurrence that is needed to explain the novel observations.
– An inhibition statement is selected only if it is the only blocker of some triggered action at some time.
– Maximizing preferences of selected statements
Generation … (cont’): heuristics
● Knowledge base K
– a causes g
– b causes g
● Initial condition I = { initially f }
● Observation O = { eventually g }
● (K,I) does not entail O
● Hypothesis space: to expand K with rules among
– f triggers a
– f triggers b
● Hypotheses: { f triggers a }, or { f triggers b }
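The small example above can be reproduced by brute-force enumeration: try every subset of the hypothesis space, smallest first, and keep the minimal ones that make the observation derivable. This is a sketch under a simple forward-simulation reading of "triggers" and "causes". It replaces the Smodels search with plain enumeration, and the function names are illustrative assumptions.

```python
from itertools import combinations

CAUSES = [("a", "g"), ("b", "g")]      # K: a causes g, b causes g
CANDIDATES = [("f", "a"), ("f", "b")]  # f triggers a, f triggers b


def entails_eventually(triggers, initial, goal, max_steps=10):
    """Check whether `goal` becomes true when simulating K plus `triggers`."""
    state = set(initial)
    for _ in range(max_steps):
        fired = {action for fluent, action in triggers if fluent in state}
        new_state = state | {effect for act, effect in CAUSES if act in fired}
        if new_state == state:  # fixpoint reached
            break
        state = new_state
    return goal in state


def minimal_hypotheses(initial, goal):
    """Enumerate subsets of CANDIDATES, keeping only minimal explanations."""
    found = []
    for size in range(len(CANDIDATES) + 1):
        for subset in combinations(CANDIDATES, size):
            if any(set(h) <= set(subset) for h in found):
                continue  # a smaller hypothesis already explains the goal
            if entails_eventually(list(subset), initial, goal):
                found.append(subset)
    return found


print(entails_eventually([], {"f"}, "g"))  # False: (K,I) does not entail O
print(minimal_hypotheses({"f"}, "g"))
# [(('f', 'a'),), (('f', 'b'),)]
```

The two results correspond to { f triggers a } and { f triggers b }. The combined set is skipped because each single statement already suffices, which mirrors the "only cause" heuristic.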
Case study: p53 network
Tumor suppression by p53
● p53 has 3 main functional domains
– N terminal transactivator domain
– Central DNA-binding domain
– C terminal domain that recognizes DNA damage
● Appropriate binding of the N terminal activates pathways that protect the cell from cancer.
● Inappropriate binding (say to Mdm2) inhibits p53-induced tumor suppression.
p53 knowledge base
● Stress
– high(UV) triggers upregulate(mRNA(p53))
● Upregulation of p53
– upregulate(mRNA(p53)) causes high(mRNA(p53))
– high(mRNA(p53)) triggers translate(p53)
– translate(p53) causes high(p53)
p53 knowledge base (cont.)
● Tumor suppression by p53
– high(p53) inhibits growth(tumor)
p53 knowledge base (cont’)
● Interaction between Mdm2 and p53
– high(p53), high(mdm2) triggers bind(p53,mdm2)
– bind(p53,mdm2) causes bound(dom(p53,N))
– bind(p53,mdm2) causes high([p53 : mdm2])
– bind(p53,mdm2) causes ¬high(p53), ¬high(mdm2)
Hypothesis formation
● Experimental observation:
– I = { initially high(UV), high(mdm2), high(ARF) }
– O = { eventually ¬tumorous }
● (K,I) does not entail O
● Need to hypothesize the role of ARF.
Constructing hypothesis space
● Levels of ARF and p53 correlate
– high(ARF) triggers upregulate(mRNA(p53))
– high(p53) triggers upregulate(mRNA(ARF))
● Interactions of ARF with the known proteins
– bind(p53,ARF) causes bound(dom(p53,N))
Constructing …(cont’)
● Influence of X (=ARF) on other interactions
– high(ARF) triggers upregulate(mRNA(p53))
– high(ARF) triggers translate(p53)
– high(ARF) triggers bind(p53,mdm2)
Constructing …(cont’)
Hypothesis
● high(UV) triggers upregulate(mRNA(ARF))
● high(ARF), high(mdm2) triggers bind(ARF,mdm2)
Future Works
● Automatic construction of hypothesis space
– Extraction of facts like protein interactions …
– Integration of knowledge from different sources
– Consistency-based integration (HyBrow)
– Ontologies
● Heuristics for hypothesis search
– Ranking of hypotheses
– Make use of numerical data, such as microarray data?