Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes
by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser and R.D. King
Presenter: Mark H. Rich
2/7/2003
University of Wisconsin - Madison
CS 838 Learning and Modeling Biological Networks
Discovering Gene Function
Yeast (S. cerevisiae) has 6,000 protein-encoding genes
Only 60% can be assigned function with confidence
The cell is a bio-chemical machine
Logic can help us discover these metabolic functions and networks
ASE-Progol
Robot Scientist
[Figure: the Robot Scientist cycle, with boxes labelled Background Knowledge, Analysis, Learning Engine, Results, Experiment Selection, and New Knowledge.]
Outline
Introduction
Abduction and Active Learning
Functional Genomics
Metabolism in Logic
Experiments
Results
Logic in AI
Deduction: given facts and a sound and complete proof theory, show that other facts can be proven
Induction: given positive and negative examples of facts and background knowledge, find a hypothesis that explains the difference between positives and negatives
Abduction and TCIE
Given a theory and partial facts, discover what facts are missing to form one consistent hypothesis
Lateral Thinking Puzzles
Presented with a confusing situation
There is an Oracle that knows what happened
You can only ask yes or no questions
The Mysterious Package
One day a man received a parcel in the post. Carefully packed was a human arm. He examined it, repacked it and then sent it on to another man. The second man also carefully examined the arm before taking it to the woods and burying it. Why did they do this?
The Mysterious Package
Was the arm cut off intentionally?
Is the arm's owner still alive?
Is he a doctor?
Did the three men know each other?
Are the other men also missing an arm?
Were they ever stuck on a desert island with no food, where they made a pact to each cut off an arm to eat and survive, but were rescued before the doctor could cut off his own arm, so that the doctor later fulfilled his commitment? YES!
Lateral Thinking Lessons
Certain questions are valuable and lead to large leaps of information . . .
How do we form hypotheses?
How can we pick good questions?
Probability that the question leads to consistent hypotheses
Cost of asking the question
We want to find the quickest, cheapest path to consistent hypotheses
Hypothesis Generation
Use contra-positives for inverse entailment
Background Knowledge
hasbeak(X) :- bird(X).
bird(X) :- vulture(X).
Example
hasbeak(tweety).
Hypotheses
bird(tweety).
bird(X).
vulture(tweety).
vulture(X).
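The tweety example can be reproduced with a small Python sketch. This is not Progol's inverse entailment procedure; it is a simplified stand-in that assumes every rule has one unary body literal and simply follows rules backwards from the observed fact. The names `RULES`, `abducible_predicates` and `candidate_hypotheses` are hypothetical.

```python
# Background knowledge as (head, body) pairs of unary predicates:
#   hasbeak(X) :- bird(X).    bird(X) :- vulture(X).
RULES = [("hasbeak", "bird"), ("bird", "vulture")]


def abducible_predicates(goal, rules):
    """Return predicates that would entail `goal` if assumed, nearest first."""
    found = []
    frontier = [goal]
    while frontier:
        head = frontier.pop(0)
        for rule_head, rule_body in rules:
            if rule_head == head and rule_body not in found:
                found.append(rule_body)
                frontier.append(rule_body)
    return found


def candidate_hypotheses(goal, constant, rules):
    """List ground and generalised unit clauses that explain goal(constant)."""
    hypotheses = []
    for pred in abducible_predicates(goal, rules):
        hypotheses.append(f"{pred}({constant}).")  # ground fact
        hypotheses.append(f"{pred}(X).")           # generalised to a variable
    return hypotheses


print(candidate_hypotheses("hasbeak", "tweety", RULES))
```

Each candidate, added to the background knowledge, makes hasbeak(tweety) provable, which is the property the slide's four hypotheses share.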
Trial Selection Theory
      e1  e2  e3  e4
H1     0   1   1   1
H2     1   1   0   1
H3     1   0   1   1
[Figure: one possible trial path. Test e1: if false, H1 remains; if true, test e2: if true, H2 remains; if false, H3 remains.]
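A short Python sketch of how trials split the hypothesis set, using the three hypotheses and four trials tabulated in this section (1 read as "predicts true", 0 as "predicts false", which is an assumption about the table's encoding). The function names are hypothetical.

```python
# Predicted outcome of each trial under each hypothesis (1 = true, 0 = false).
PREDICTIONS = {
    "H1": {"e1": 0, "e2": 1, "e3": 1, "e4": 1},
    "H2": {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    "H3": {"e1": 1, "e2": 0, "e3": 1, "e4": 1},
}


def partition(hypotheses, trial):
    """Split hypotheses into those predicting true and those predicting false."""
    true_side = [h for h in hypotheses if PREDICTIONS[h][trial] == 1]
    false_side = [h for h in hypotheses if PREDICTIONS[h][trial] == 0]
    return true_side, false_side


def informative_trials(hypotheses, trials):
    """Keep only trials on which the hypotheses disagree."""
    return [t for t in trials if all(partition(hypotheses, t))]


def follow_path(hypotheses, path):
    """Apply (trial, outcome) pairs in order, keeping consistent hypotheses."""
    remaining = list(hypotheses)
    for trial, outcome in path:
        remaining = [h for h in remaining if PREDICTIONS[h][trial] == int(outcome)]
    return remaining


all_h = ["H1", "H2", "H3"]
print(partition(all_h, "e1"))
print(informative_trials(all_h, ["e1", "e2", "e3", "e4"]))
print(follow_path(all_h, [("e1", True), ("e2", False)]))
```

Trial e4 is never worth running here because all three hypotheses predict the same outcome for it.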
Hypothesis Probability
Each trial t partitions H into \(\{H_{[t]}, \overline{H_{[t]}}\}\)
Assuming optimal encoding scheme…
Prior probability of each hypothesis:
\[ p(h_i \mid E) = \frac{2^{\mathrm{Compression}(h_i \mid E)}}{\sum_{h \in H} 2^{\mathrm{Compression}(h \mid E)}} \]
Compression is rounded f measure:
\[ f = E^{+} - \frac{E^{+}}{p}\,(l_1 + l_2 + n) \]
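As a minimal sketch, the normalisation over 2 raised to the compression can be computed as follows. The compression values are invented for illustration and do not come from the paper; `posterior` is a hypothetical helper name.

```python
def posterior(compression):
    """Map {hypothesis: compression} to {hypothesis: p(h | E)}."""
    weights = {h: 2.0 ** c for h, c in compression.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical compression scores for three hypotheses.
example = posterior({"H1": 3, "H2": 2, "H3": 2})
print(example)  # {'H1': 0.5, 'H2': 0.25, 'H3': 0.25}
```

One extra unit of compression doubles a hypothesis's weight relative to its rivals, so small differences in compression translate into large differences in probability.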
Experiment Cost
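\(C_t\) is the cost of a trial t
\[ EC(H,T) \approx \min_{t \in T} \Big[ C_t + p(t)\,\big(\mathrm{mean}_{t' \in (T - t)} C_{t'}\big)\, J_{H_{[t]}} + (1 - p(t))\,\big(\mathrm{mean}_{t' \in (T - t)} C_{t'}\big)\, J_{\overline{H_{[t]}}} \Big] \]
\[ p(t) = \sum_{h \in H_{[t]}} p(h) \]
\[ J_H = \Big\lfloor -\sum_{h \in H} p(h) \log_2 p(h) \Big\rfloor \]
Here \(\overline{H_{[t]}}\) is the complement of \(H_{[t]}\) in H.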
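The expected-cost approximation can be evaluated directly for a small case. The sketch below makes two assumptions that the slides do not state: probabilities are renormalised within each partition before the entropy J is taken, and J is the floor of that entropy. The trial costs are invented; the hypothesis/trial predictions are the ones from the Trial Selection Theory slide.

```python
import math

PREDICTIONS = {
    "H1": {"e1": 0, "e2": 1, "e3": 1, "e4": 1},
    "H2": {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    "H3": {"e1": 1, "e2": 0, "e3": 1, "e4": 1},
}


def entropy_floor(probs):
    """J: floor of the Shannon entropy (bits) of the renormalised probabilities."""
    total = sum(probs)
    if total == 0:
        return 0
    entropy = -sum((p / total) * math.log2(p / total) for p in probs if p > 0)
    return math.floor(entropy + 1e-12)  # small guard against rounding error


def expected_cost(prior, predictions, costs):
    """Return (best_trial, EC) under the one-step approximation."""
    best = None
    for t, c_t in costs.items():
        others = [c for u, c in costs.items() if u != t]
        mean_other = sum(others) / len(others) if others else 0.0
        h_true = [prior[h] for h in prior if predictions[h][t] == 1]
        h_false = [prior[h] for h in prior if predictions[h][t] == 0]
        p_t = sum(h_true)
        ec = (c_t
              + p_t * mean_other * entropy_floor(h_true)
              + (1 - p_t) * mean_other * entropy_floor(h_false))
        if best is None or ec < best[1]:
            best = (t, ec)
    return best


prior = {"H1": 1 / 3, "H2": 1 / 3, "H3": 1 / 3}
costs = {"e1": 3.0, "e2": 1.0, "e3": 2.0, "e4": 1.0}  # hypothetical costs
print(expected_cost(prior, PREDICTIONS, costs))
```

With these numbers e2 is selected: it is as cheap as e4 but, unlike e4, it actually splits the hypotheses, so less follow-up work is expected.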
Functional Genomics
Want to learn gene-enzyme mapping
Genes encode for
Enzymes that catalyze reactions between
Metabolites to eventually create
Amino Acid Products
Perform auxotrophic growth experiments to determine phenotype
Functional Genomics: Simple
A, B and C are enzymes
X is a ubiquitous metabolite; Y and Z are optional
If we knock out gene2, we need to add nutrient Z to produce Trp
We want to learn codes(gene2, B, [Y], [Z]) but can only ask:
pheno_effect(gene2,[Y]) is false
pheno_effect(gene2,[Z]) is true
pheno_effect(gene2,[Y,Z]) is true
[Figure: linear pathway X -> Y -> Z -> Trp, with gene1, gene2 and gene3 encoding enzymes A, B and C for the three steps in order.]
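The toy pathway can be simulated to see why only nutrient Z rescues the gene2 knockout. This is a sketch under assumptions: the pathway is taken to be the linear chain X, Y, Z, Trp with one enzyme per step, and the truth values listed for pheno_effect are read as "the knockout strain grows on this medium". The names `REACTIONS`, `reachable` and `grows` are hypothetical.

```python
# Assumed linear pathway: X -A-> Y -B-> Z -C-> Trp
REACTIONS = {"A": ("X", "Y"), "B": ("Y", "Z"), "C": ("Z", "Trp")}
GENE_TO_ENZYME = {"gene1": "A", "gene2": "B", "gene3": "C"}


def reachable(starts, reactions):
    """Return every metabolite that can be produced from `starts`."""
    products = set(starts)
    changed = True
    while changed:
        changed = False
        for substrate, product in reactions.values():
            if substrate in products and product not in products:
                products.add(product)
                changed = True
    return products


def grows(knocked_out_gene, nutrients):
    """True if the knockout strain can still make Trp from X plus `nutrients`."""
    enzyme = GENE_TO_ENZYME[knocked_out_gene]
    remaining = {e: r for e, r in REACTIONS.items() if e != enzyme}
    return "Trp" in reachable({"X"} | set(nutrients), remaining)


for medium in (["Y"], ["Z"], ["Y", "Z"]):
    print(medium, grows("gene2", medium))
```

Adding Y does not help because the blocked step is the one that consumes Y; adding Z bypasses the block, which is what identifies gene2 with the Y to Z reaction.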
Aromatic amino acid pathway
[Figure: the aromatic amino acid pathway, showing its metabolites, enzymes, and aromatic amino acids.]
Metabolism in Logic
Hypotheses:
codes('YDR254W', '4.2.1.11', ['C00631'], ['C00074']).
codes('YDR254W', '5.3.1.24', ['C04302'], ['C01302']).
etc ...
Background Knowledge:
enzyme('4.2.1.11', ['C00631'], ['C00074']).
enzyme('5.3.1.24', ['C04302'], ['C01302']).
etc ...
generated_by_other_pathways(['C00002', 'C00005', 'C00006', ... , 'C03356']).
ends(['C00078', 'C00079', 'C00082']).
Metabolism in Logic
What the Oracle answers:
phenotypic_effect(ORF, Growth_medium) :-
    generated_by_other_pathways(Ubiquitous_metabolites),
    union(Ubiquitous_metabolites, Growth_medium, Starts),
    connected(Starts, Wild_products),
    ends(Ends),
    subset(Wild_products, Ends),
    enz(Enzyme, Reactants, Products),
    encodes(ORF, Enzyme, Reactants, Products),
    connected_without_this_step(Starts, Mutant_products,
                                Enzyme, Reactants, Products),
    not(subset(Mutant_products, Ends)).
Experiments
Learn function of 17 genes by removing ORF
Growth media: 13 optional nutrients, at most 3 at a time; 378 possible experiments for each ORF
Cost of optional nutrients: determined from the www.sigmaaldrich.com catalog
Strategies for comparison: Random, Naïve Cheapest, ASE-Progol
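The figure of 378 experiments per ORF follows from counting growth media with zero, one, two or three of the 13 optional nutrients. A quick check in Python, with placeholder nutrient names since the slides do not list them:

```python
from itertools import combinations
from math import comb

# Placeholder names; the actual 13 nutrients are not listed in the slides.
nutrients = [f"n{i}" for i in range(1, 14)]

# Every growth medium containing at most 3 optional nutrients,
# including the medium with none added.
media = [m for k in range(4) for m in combinations(nutrients, k)]

print([comb(13, k) for k in range(4)])  # [1, 13, 78, 286]
print(len(media))                       # 378
```

The count only reaches 378 if the medium with no optional nutrients is included as one of the possible experiments.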
Experiments
Remove all codes(…) facts
Loop
  Generate random sample of trials
  Generate hypotheses using Theory Completion by Inverse Entailment
  Find minimum EC(H,T) trial and perform
  Add results to known examples
until hypotheses consistent with trials
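The overall shape of this loop can be sketched in Python. Several simplifications are assumed: the hypothesis set is fixed up front rather than regenerated by inverse entailment on each pass, the oracle is a lookup against a chosen "true" hypothesis, and trial selection uses a cheapest-informative-trial rule as a stand-in for minimising EC(H,T). The predictions come from the Trial Selection Theory slide; the costs are invented.

```python
PREDICTIONS = {
    "H1": {"e1": 0, "e2": 1, "e3": 1, "e4": 1},
    "H2": {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    "H3": {"e1": 1, "e2": 0, "e3": 1, "e4": 1},
}
COSTS = {"e1": 3.0, "e2": 1.0, "e3": 2.0, "e4": 1.0}  # hypothetical costs


def cheapest(trials, costs):
    """Stand-in selection rule: pick the cheapest candidate trial."""
    return min(trials, key=lambda t: costs[t])


def active_learning_loop(hypotheses, predictions, costs, oracle, select):
    """Run trials until one hypothesis remains or no trial can split them."""
    remaining = list(hypotheses)
    performed = []
    total_cost = 0.0
    while len(remaining) > 1:
        # Only trials on which the surviving hypotheses disagree are useful.
        informative = [
            t for t in costs
            if t not in performed
            and len({predictions[h][t] for h in remaining}) > 1
        ]
        if not informative:
            break
        trial = select(informative, costs)
        outcome = oracle(trial)  # perform the experiment
        performed.append(trial)
        total_cost += costs[trial]
        remaining = [h for h in remaining if predictions[h][trial] == outcome]
    return remaining, performed, total_cost


def oracle_for(truth):
    """Build an oracle that answers as the hypothesis `truth` predicts."""
    return lambda trial: PREDICTIONS[truth][trial]


print(active_learning_loop(PREDICTIONS, PREDICTIONS, COSTS, oracle_for("H3"), cheapest))
print(active_learning_loop(PREDICTIONS, PREDICTIONS, COSTS, oracle_for("H1"), cheapest))
```

Passing a different `select` function is enough to compare strategies on the same hypotheses and costs.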
Results: Cost
Results: Time
Conclusions and Future Work
ASE-Progol finds hypotheses inexpensively and quickly
5 of 17 genes had only negative examples… why? Look into inhibitors and nonmonotonic logics.
Answers are limited to yes/no. Probabilities?
Can this be applied to gene regulatory networks, using microarray technology?
What other networks have similar frameworks?