Date post: | 12-Jan-2016 |
Category: |
Documents |
Upload: | suzanna-kelly |
View: | 221 times |
Download: | 1 times |
Introduction to ILP
ILP = Inductive Logic Programming
= machine learning logic programming
= learning with logic
Introduced by Muggleton in 1992
(Machine) Learning
• The process by which relatively permanent changes occur in behavioral potential as a result of experience. (Anderson)
• Learning is constructing or modifying representations of what is being experienced. (Michalski)
• A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Mitchell)
Machine Learning Techniques
• Decision tree learning
• Conceptual clustering
• Case-based learning
• Reinforcement learning
• Neural networks
• Genetic algorithms
• and… Inductive Logic Programming
Why ILP ? - Structured data
Seed example of East-West trains (Michalski)
What makes a train to go eastward ?
Why ILP ? – Structured data
Mutagenicity of chemical molecules(King, Srinivasan, Muggleton, Sternberg, 1994)
What makes a molecule to be mutagenic ?
Why ILP ? – multiple relations
This is related to structured data
Train Car
t1 c11
t1 c12
t1 c13
t1 c14
t2 c21
… …
Car Length Shape Axes Roof …
c11 short rectangle 2 none …
c12 long rectangle 3 none …
c13 short rectangle 2 peaked …
c14 long rectangle 2 none …
c21 short rectangle 2 flat …
… … … … … …
has_car car_properties
Why ILP ? – multiple relations
Genealogy example:• Given known relations…
– father(Old,Young) and mother(Old,Young)– male(Somebody) and female(Somebody)
• …learn new relations– parent(X,Y) :- father(X,Y).– parent(X,Y) :- mother(X,Y).– brother(X,Y) :-
male(X),father(Z,X),father(Z,Y).
Most ML techniques can’t use more than 1 relatione.g.: decision trees, neural networks, …
Why ILP ? – logical foundation
• Prolog = Programming with Logicis used to represent:– Background knowledge (of the domain): facts– Examples (of the relation to be learned): facts– Theories (as a result of learning): rules
• Supports 2 forms of logical reasoning– Deduction– Induction
Prolog - definitions
• Variables: X, Y, Something, Somebody• Terms: arthur, 1, [1,2,3]• Predicates: father/2, female/1
• Facts:– father(christopher,victoria).– female(victoria).
• Rules:– parent(X,Y) :- father(X,Y).
Logical reasoning: deduction
From rules to facts…
B T |- E
mother(penelope,victoria).
mother(penelope,arthur).
father(christopher,victoria).
father(christopher,arthur).
parent(X,Y) :- father(X,Y).
parent(X,Y) :- mother(X,Y).
parent(penelope,victoria).
parent(penelope,arthur).
parent(christopher,victoria).
parent(christopher,arthur).
Logical reasoning: induction
From facts to rules…
B E |- T
mother(penelope,victoria).
mother(penelope,arthur).
father(christopher,victoria).
father(christopher,arthur).
parent(X,Y) :- father(X,Y).
parent(X,Y) :- mother(X,Y).
parent(penelope,victoria).
parent(penelope,arthur).
parent(christopher,victoria).
parent(christopher,arthur).
Induction of a classifieror Concept Learning
Most studied task in Machine Learning
Given:– background knowledge B– a set of training examples E– a classification c C for each example e
Find: a theory T (or hypothesis) such that
B T |- c(e), for all e E
Induction of a classifier: example
Example of East-West trains• B: relations has_car and car_properties
(length, roof, shape, etc.)
ex.: has_car(t1,c11), shape(c11,bucket)• E: the trains t1 to t10• C: east, west
Why ILP ? - Structured data
Seed example of East-West trains (Michalski)
What makes a train to go eastward ?
Induction of a classifier: example
Example of East-West trains• B: relations has_car and car_properties
(length, roof, shape, etc.)
ex.: has_car(t1,c11)• E: the trains t1 to t10• C: east, west
• Possible T: east(T) :-
has_car(T,C), length(C,short), roof(C,_).
Induction of a classifier: example
Example of mutagenicity• B: relations atom and bond
ex.: atom(mol23,atom1,c,195). bond(mol23,atom1,atom3,7).
• E: 230 molecules with known classification• C: active and nonactive w.r.t. mutagenicity
• Possible T:
active(Mol) :-
atom(Mol,A,c,22), atom(Mol,B,c,10),bond(Mol,A,B,1).
c22
c10
Learning as search
Given:– Background knowledge B
– Theory Description Language T
– Positives examples P (class +)
– Negative examples N (class -)
– A covering relation covers(B,T,e)
Find: a theory that covers– all positive examples (completeness)
– no negative examples (consistency)
Learning as search
• Covering relation in ILPcovers(B,T,e) B T |- e
• A theory is a set of rules• Each rule is searched separately (efficiency)• A rule must be consistent (cover no
negatives), but not necessary complete• Separate-and-conquer strategy
– Remove from P the examples already covered
Space exploration
Strategy?• Random walk
– Redundancy, incompleteness of the search
• Systematic according to some ordering– Better control => no redundancy, completeness
– The ordering may be used to guide the search towards better rules
What kind of ordering?
Generality ordering
• Rule 1 is more general than rule 2=> Rule 1 covers more examples than rule 2
– If a rule is consistent (covers no negatives)then every specialisation of it is consistent too
– If a rule is complete (covers all positives)then every generalisation of it is complete too
• Means to prune the search space• 2 kinds of moves: specialisation and generalisation• Common ILP ordering: θ-subsumption
Generality orderingparent(X,Y):-
parent(X,Y):- female(X) parent(X,Y) :- father(X,Y)
parent(X,Y) :- female(X), mother(X,Y)
parent(X,Y) :- female(X), father(X,Y)
consistent rulespecialisation
Search biases
“Bias refers to any criterion for choosing one generalization over another other than strict consistency with the
observed training instances.” (Mitchell)
• Restrict the search space (efficiency)• Guide the search (given domain knowledge)• Different kinds of bias
– Language bias
– Search bias
– Strategy bias
• Choice of predicates:roof(C,flat) ? roof(C) ? flat(C) ?
• Types of predicates :east(T) :- roof(T), roof(C,3)
• Modes of predicates :east(T) :- roof(C,flat)
east(T) :- has_car(T,C), roof(C,flat)
• Discretization of numerical values
Language bias
Search bias
The moves direction in the search space• Top-down
– start: the empty rule (c(X) :- .)– moves: specialisations
• Bottom-up– start: the bottom clause (~ c(X) :- B.)– moves: generalisations
• Bi-directional
Strategy bias
Heuristic search for a best rule• Hill-climbing:
– Keep only one rule– efficient but can miss global maximum
• Beam search:– also keep k rules for back-tracking– less greedy
• Best-first search:– keep all rules– more costly but complete search
A generic ILP algorithm
procedure ILP(Examples)
Initialize(Rules, Examples)
repeat
R = Select(Rules, Examples)
Rs = Refine(R, Examples)
Rules = Reduce(Rules+Rs, Examples)
until StoppingCriterion(Rules, Examples)
return(Rules)
A generic ILP algorithm
• Initialize(Rules,Examples): initialize a set of theories as the search starting points
• Select(Rules,Examples): select the most promising candidate rule R
• Refine(R,Examples): returns the neighbours of R (using specialisation or generalisation)
• Reduce(Rules,Examples): discard unpromising theories (all but one in hill-climbing, none in best-first search)
ILPnet2 – www.cs.bris.ac.uk/~ILPnet2/
Network of Excellence in ILP in Europe
• 37 universities and research institutes
• Educational materials
• Publications
• Events (conferences, summer schools, …)
• Description of ILP systems
• Applications
ILP systems
• FOIL (Quinlan and Cameron-Jones 1993): top-down hill-climbing search
• Progol (Muggleton, 1995): top-down best-first search with bottom clause
• Golem (Muggleton and Feng 1992): bottom-up hill-climbing search
• LINUS (Lavrac and Dzeroski 1994): propositionalisation
• Aleph (~Progol), Tilde (relational decision trees), …
ILP applications
• Life sciences– mutagenecity, predicting toxicology– protein structure/folding
• Natural language processing– english verb past tense– document analysis and classification
• Engineering– finite element mesh design
• Environmental sciences– biodegradability of chemical compounds
The end
A few books on ILP…• J. Lloyd. Logic for learning:
learning comprehensible theories from structured data. 2003.
• S. Dzeroski and N. Lavrac, editors. Relational Data Mining. September 2001.
• L. De Raedt, editor. Advances in Inductive Logic Programming. 1996.
• N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. 1994.