Inductive Learning (1/2)
Decision Tree Method
Russell and Norvig: Chapter 18, Sections 18.1 through 18.3
CS121 – Winter 2003
Quotes

"Our experience of the world is specific, yet we are able to formulate general theories that account for the past and predict the future." (Genesereth and Nilsson, Logical Foundations of AI, 1987)

"Entities are not to be multiplied without necessity." (Ockham, 1285-1349)
Learning Agent

[Diagram: the agent receives percepts from the environment through sensors and acts through actuators. Inside the agent, a problem solver and a learning element share a KB; a critic evaluates percepts and feeds the learning element, which updates the KB used by the problem solver to choose actions]
Contents

1. Introduction to inductive learning
2. Logic-based inductive learning:
   • Decision tree method (+ why inductive learning works)
   • Version space method
3. Function-based inductive learning:
   • Neural nets
Inductive Learning Frameworks

1. Function-learning formulation
2. Logic-inference formulation
Function-Learning Formulation

Goal function f
Training set: (xi, f(xi)), i = 1,…,n
Inductive inference: Find a function h that fits the points well
Example: neural nets

[Figure: sample points (x, f(x)) with a curve h fitted through them]
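The function-learning formulation can be sketched in a few lines. This is an illustrative example, not from the slides: the hypothesis class is restricted to lines h(x) = a·x + b, fitted to the training points by closed-form least squares.

```python
# Function-learning sketch: given samples (xi, f(xi)) of an unknown goal
# function f, find a hypothesis h that fits the points well.
# Here h is restricted to lines h(x) = a*x + b (an assumed hypothesis class).

def fit_line(points):
    """Closed-form least-squares fit of h(x) = a*x + b to (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Training set sampled from the goal function f(x) = 2x + 1
training_set = [(x, 2 * x + 1) for x in range(5)]
a, b = fit_line(training_set)  # recovers a = 2, b = 1
```

With noisy samples the recovered h only approximates f, which is the usual situation in inductive learning.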
Logic-Inference Formulation

Background knowledge KB
Training set D (observed knowledge) such that KB ⊭ D and {KB, D} is satisfiable
Inductive inference: Find h (inductive hypothesis) such that {KB, h} is satisfiable and KB, h ⊨ D

Usually, not a sound inference
h = D is a trivial, but uninteresting, solution (data caching)
Rewarded Card Example

Deck of cards, with each card designated by [r,s], its rank and suit, and some cards "rewarded"

Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇒ NUM(r)
((r=J) ∨ (r=Q) ∨ (r=K)) ⇒ FACE(r)
((s=S) ∨ (s=C)) ⇒ BLACK(s)
((s=D) ∨ (s=H)) ⇒ RED(s)

Training set D:
REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])

Possible inductive hypothesis:
h ≡ (NUM(r) ∧ BLACK(s) ⇒ REWARD([r,s]))

There are several possible inductive hypotheses
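The hypothesis can be checked mechanically against the training set. A minimal sketch (the encoding of cards as (rank, suit) pairs is an assumption for illustration):

```python
# Check the hypothesis h = (NUM(r) and BLACK(s) => REWARD([r,s]))
# against the five training examples, using h as a classifier.
NUM = set(range(1, 11))   # numbered ranks 1..10
BLACK = {"S", "C"}        # spades, clubs

def h(rank, suit):
    """Predict REWARD: a card is rewarded iff it is a numbered black card."""
    return rank in NUM and suit in BLACK

# (card, observed REWARD value) pairs from the training set
training_set = [((4, "C"), True), ((7, "C"), True), ((2, "S"), True),
                ((5, "H"), False), (("J", "S"), False)]

assert all(h(r, s) == reward for (r, s), reward in training_set)
```

All five examples agree with h, which is why h is a possible inductive hypothesis; it is of course not the only one.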
Learning a Predicate

Set E of objects (e.g., cards)
Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
Observable predicates A(x), B(x), … (e.g., NUM, RED)
Training set: values of CONCEPT for some combinations of values of the observable predicates

Example: CONCEPT describes the precondition of an action, e.g., Unstack(C,A)
• E is the set of states
• CONCEPT(x) ⇔ (HANDEMPTY ∈ x) ∧ (BLOCK(C) ∈ x) ∧ (BLOCK(A) ∈ x) ∧ (CLEAR(C) ∈ x) ∧ (ON(C,A) ∈ x)
Learning CONCEPT is a step toward learning the action
A Possible Training Set

Ex. # A B C D E CONCEPT
1 True True False True False False
2 True False False False False True
3 False False True True True False
4 True True True False True True
5 False True True False False False
6 True True False True True False
7 False False True False True False
8 True False True False True True
9 False False False True True False
10 True True True True False True
Note that the training set does not say whether an observable predicate A, …, E is pertinent or not
Learning a Predicate

Find a representation of CONCEPT in the form:
CONCEPT(x) ⇔ S(A,B,…)
where S(A,B,…) is a sentence built with the observable predicates, e.g.:
CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
Learning the Concept of an Arch

[Figure: example objects; some are arches (positive examples), the others are not (negative examples)]

ARCH(x) ⇔ HAS-PART(x,b1) ∧ HAS-PART(x,b2) ∧ HAS-PART(x,b3) ∧ IS-A(b1,BRICK) ∧ IS-A(b2,BRICK) ∧ ¬MEET(b1,b2) ∧ (IS-A(b3,BRICK) ∨ IS-A(b3,WEDGE)) ∧ SUPPORTED(b3,b1) ∧ SUPPORTED(b3,b2)
Example Set

An example consists of the values of CONCEPT and the observable predicates for some object x
An example is positive if CONCEPT is True, else it is negative
The set X of all examples is the example set
The training set is a subset of the example set (usually a small one!)

Hypothesis Space

An hypothesis is any sentence h of the form:
CONCEPT(x) ⇔ S(A,B,…)
where S(A,B,…) is a sentence built with the observable predicates
The set of all hypotheses is called the hypothesis space H
An hypothesis h agrees with an example if it gives the correct value of CONCEPT
It is called a space because it has some internal structure
[Figure: the example set X, drawn as a cloud of positive (+) and negative (–) examples]

Inductive Learning Scheme

[Diagram: a training set, drawn from the example set X = {[A, B, …, CONCEPT]}, together with the hypothesis space H = {[CONCEPT(x) ⇔ S(A,B,…)]}, yields an inductive hypothesis h]
Size of Hypothesis Space

n observable predicates → 2^n entries in the truth table
In the absence of any restriction (bias), there are 2^(2^n) hypotheses to choose from
n = 6 → 2×10^19 hypotheses!
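A quick sanity check of these counts (illustrative Python, not part of the slides): n predicates give 2^n truth-table rows, and a hypothesis assigns True/False to every row.

```python
# Counting hypotheses without any bias: each of the 2**n truth-table rows
# can independently be labeled True or False, giving 2**(2**n) hypotheses.
n = 6
rows = 2 ** n             # 64 truth-table entries
hypotheses = 2 ** rows    # 2^(2^6) = 2^64
print(hypotheses)         # 18446744073709551616, i.e. about 1.8e19
```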
Rewarded Card Example

Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇒ NUM([r,s])
((r=J) ∨ (r=Q) ∨ (r=K)) ⇒ FACE([r,s])
((s=S) ∨ (s=C)) ⇒ BLACK([r,s])
((s=D) ∨ (s=H)) ⇒ RED([r,s])

Training set D:
REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])

All of the following hypotheses agree with all the examples in the training set:
h1 ≡ NUM([r,s]) ∧ BLACK([r,s]) ⇒ REWARD([r,s])
h2 ≡ BLACK([r,s]) ∧ ¬(r=J) ⇒ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇒ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇒ REWARD([r,s])
Multiple Inductive Hypotheses

Need for a system of preferences, called a bias, to compare possible hypotheses
Keep-It-Simple (KIS) Bias

Motivation:
• If an hypothesis is too complex, it may not be worth learning it (data caching might just do the job as well)
• There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller

Examples:
• Use far fewer observable predicates than suggested by the training set
• Constrain the learnt predicate, e.g., to use only "high-level" observable predicates such as NUM, FACE, BLACK, and RED, and/or to have simple syntax (e.g., conjunction of literals)

If the bias allows only sentences S that are conjunctions of k << n predicates picked from the n observable predicates, then the size of H is O(n^k)
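The shrinkage the bias buys can be made concrete. A small sketch (not from the slides), counting one conjunction per k-subset of the n predicates, which grows as O(n^k) for fixed k:

```python
# Compare the unbiased hypothesis count 2**(2**n) with the number of
# conjunctions of k out of n predicates: C(n, k), which is O(n**k).
from math import comb

n, k = 6, 2
unbiased = 2 ** (2 ** n)   # 2^64: every Boolean function of 6 predicates
biased = comb(n, k)        # 15 conjunctions of 2 of the 6 predicates
assert biased < unbiased
```

Allowing negated literals as well would multiply the biased count by 2^k, still polynomial in n for fixed k.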
Putting Things Together

[Diagram: the object set, goal predicate, and observable predicates define the example set X. A training set drawn from X is fed to a learning procedure L, which is constrained by a bias defining the hypothesis space H and outputs an induced hypothesis h; h is then evaluated against a test set (yes/no)]
Predicate as a Decision Tree

The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by the following decision tree:

A?
  False → False
  True → B?
    False → True
    True → C?
      False → False
      True → True

Example: A mushroom is poisonous iff it is yellow and small, or yellow, big and spotted
• x is a mushroom
• CONCEPT = POISONOUS
• A = YELLOW
• B = BIG
• C = SPOTTED
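The equivalence between the tree and the formula can be verified exhaustively. A minimal sketch (the nested-tuple encoding of the tree is an assumption for illustration):

```python
# Show that the decision tree above computes CONCEPT = A and (not B or C)
# by checking all 8 assignments to the observable predicates.
from itertools import product

# Tree encoded as (test, subtree_if_false, subtree_if_true); leaves are bools.
tree = ("A", False,
        ("B", True,
         ("C", False, True)))

def classify(tree, x):
    """Follow the tree's tests on example x (a dict of predicate values)."""
    if isinstance(tree, bool):
        return tree
    test, if_false, if_true = tree
    return classify(if_true if x[test] else if_false, x)

for a, b, c in product([False, True], repeat=3):
    x = {"A": a, "B": b, "C": c}
    assert classify(tree, x) == (a and (not b or c))
```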
Predicate as a Decision Tree

The same tree applies with two further observable predicates available:
• D = FUNNEL-CAP
• E = BULKY
Training Set

Ex. #  A      B      C      D      E      CONCEPT
1      False  False  True   False  True   False
2      False  True   False  False  False  False
3      False  True   True   True   True   False
4      False  False  True   False  False  False
5      False  False  False  True   True   False
6      True   False  True   False  False  True
7      True   False  False  True   False  True
8      True   False  True   False  True   True
9      True   True   True   False  True   True
10     True   True   True   True   True   True
11     True   True   False  False  False  False
12     True   True   False  False  True   False
13     True   False  True   True   True   True
Possible Decision Tree

D?
  False → C?
    False → False
    True → B?
      False → E?
        False → A? (False → False, True → True)
        True → A? (False → False, True → True)
      True → True
  True → E?
    False → True
    True → A?
      False → False
      True → True

CONCEPT ⇔ (D ∧ (¬E ∨ A)) ∨ (¬D ∧ C ∧ (B ∨ (E ∧ A) ∨ (¬E ∧ A)))
A much smaller tree represents the same training set:

A?
  False → False
  True → B?
    False → True
    True → C?
      False → False
      True → True

CONCEPT ⇔ A ∧ (¬B ∨ C)

KIS bias → Build the smallest decision tree
Computationally intractable problem → greedy algorithm
Getting Started

The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12

Without testing any observable predicate, we could report that CONCEPT is False (majority rule) with an estimated probability of error P(E) = 6/13

Assuming that we will only include one observable predicate in the decision tree, which predicate should we test to minimize the probability of error?
Assume It's A

If A is True: True: 6, 7, 8, 9, 10, 13; False: 11, 12
If A is False: False: 1, 2, 3, 4, 5

If we test only A, we will report that CONCEPT is True if A is True (majority rule) and False otherwise

The estimated probability of error is:
Pr(E) = (8/13)×(2/8) + (5/13)×0 = 2/13
Assume It's B

If B is True: True: 9, 10; False: 2, 3, 11, 12
If B is False: True: 6, 7, 8, 13; False: 1, 4, 5

If we test only B, we will report that CONCEPT is False if B is True and True otherwise

The estimated probability of error is:
Pr(E) = (6/13)×(2/6) + (7/13)×(3/7) = 5/13
Assume It's C

If C is True: True: 6, 8, 9, 10, 13; False: 1, 3, 4
If C is False: True: 7; False: 2, 5, 11, 12

If we test only C, we will report that CONCEPT is True if C is True and False otherwise

The estimated probability of error is:
Pr(E) = (8/13)×(3/8) + (5/13)×(1/5) = 4/13
Assume It's D

If D is True: True: 7, 10, 13; False: 3, 5
If D is False: True: 6, 8, 9; False: 1, 2, 4, 11, 12

If we test only D, we will report that CONCEPT is True if D is True and False otherwise

The estimated probability of error is:
Pr(E) = (5/13)×(2/5) + (8/13)×(3/8) = 5/13
Assume It's E

If E is True: True: 8, 9, 10, 13; False: 1, 3, 5, 12
If E is False: True: 6, 7; False: 2, 4, 11

If we test only E, we will report that CONCEPT is False, independent of the outcome

The estimated probability of error is unchanged:
Pr(E) = (8/13)×(4/8) + (5/13)×(2/5) = 6/13

So, the best predicate to test is A
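The five error estimates above can be reproduced mechanically. A sketch (illustrative helper, not from the slides): for each observable predicate, split the training set on its value, apply the majority rule in each branch, and count the misclassified examples.

```python
# Reproduce the estimated probabilities of error for testing a single
# observable predicate on the 13-example training set.
from fractions import Fraction

# Rows: (A, B, C, D, E, CONCEPT), examples 1..13 from the training set
T, F = True, False
rows = [(F,F,T,F,T,F), (F,T,F,F,F,F), (F,T,T,T,T,F), (F,F,T,F,F,F),
        (F,F,F,T,T,F), (T,F,T,F,F,T), (T,F,F,T,F,T), (T,F,T,F,T,T),
        (T,T,T,F,T,T), (T,T,T,T,T,T), (T,T,F,F,F,F), (T,T,F,F,T,F),
        (T,F,T,T,T,T)]

def error(attr):
    """Estimated error when testing only column `attr`: majority rule in
    each branch, with ties broken toward False as in the slides."""
    errs = 0
    for branch in (True, False):
        concepts = [r[5] for r in rows if r[attr] == branch]
        majority = concepts.count(True) > len(concepts) / 2
        errs += sum(c != majority for c in concepts)
    return Fraction(errs, len(rows))

errors = {name: error(i) for i, name in enumerate("ABCDE")}
assert errors["A"] == Fraction(2, 13)
assert min(errors, key=errors.get) == "A"
```

Running it gives 2/13, 5/13, 4/13, 5/13, 6/13 for A through E, confirming that A is the best first test.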
Choice of Second Predicate

Test A first; in the branch where A is True, test C:
If C is True: True: 6, 8, 9, 10, 13 (report True)
If C is False: True: 7; False: 11, 12 (report False)

The majority rule gives the probability of error Pr(E|A) = 1/8 and Pr(E) = 1/13
Final Tree

A?
  False → False
  True → C?
    False → B?
      False → True
      True → False
    True → True

CONCEPT ⇔ A ∧ (C ∨ ¬B)
Learning a Decision Tree

DTL(Δ, Predicates)
1. If all examples in Δ are positive then return True
2. If all examples in Δ are negative then return False
3. If Predicates is empty then return failure (noise in the training set! may return the majority rule instead of failure)
4. A ← the most discriminating predicate in Predicates
5. Return the tree whose root is A, whose left branch is DTL(Δ+A, Predicates − A), and whose right branch is DTL(Δ−A, Predicates − A)

(Δ+A denotes the subset of examples in Δ that satisfy A; Δ−A those that do not)
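The procedure above can be sketched as runnable Python. "Most discriminating" is taken here to mean "minimizes the majority-rule error after the split" (the criterion used in the worked example); the data layout and helper names are illustrative.

```python
# A runnable sketch of DTL on the 13-example training set from the slides.
def dtl(examples, predicates):
    """examples: list of (values_dict, concept) pairs. Returns a tree
    encoded as (predicate, subtree_if_false, subtree_if_true) or a bool."""
    concepts = [c for _, c in examples]
    if all(concepts):
        return True
    if not any(concepts):
        return False
    if not predicates:
        # Noise in the training set: return the majority rule, not failure
        return concepts.count(True) >= concepts.count(False)
    a = min(predicates, key=lambda p: split_error(examples, p))
    rest = [p for p in predicates if p != a]
    pos = [(v, c) for v, c in examples if v[a]]
    neg = [(v, c) for v, c in examples if not v[a]]
    return (a, dtl(neg, rest), dtl(pos, rest))

def split_error(examples, p):
    """Misclassified count under the majority rule in each branch of p."""
    errs = 0
    for branch in (True, False):
        cs = [c for v, c in examples if v[p] == branch]
        majority = cs.count(True) > len(cs) / 2
        errs += sum(c != majority for c in cs)
    return errs

def classify(tree, values):
    if isinstance(tree, bool):
        return tree
    p, if_false, if_true = tree
    return classify(if_true if values[p] else if_false, values)

# Training set (examples 1..13, columns A..E then CONCEPT)
T, F = True, False
data = [(F,F,T,F,T,F), (F,T,F,F,F,F), (F,T,T,T,T,F), (F,F,T,F,F,F),
        (F,F,F,T,T,F), (T,F,T,F,F,T), (T,F,F,T,F,T), (T,F,T,F,T,T),
        (T,T,T,F,T,T), (T,T,T,T,T,T), (T,T,F,F,F,F), (T,T,F,F,T,F),
        (T,F,T,T,T,T)]
examples = [(dict(zip("ABCDE", row[:5])), row[5]) for row in data]

tree = dtl(examples, list("ABCDE"))
assert tree[0] == "A"  # A is chosen as the most discriminating first test
assert all(classify(tree, v) == c for v, c in examples)
```

On this data the sketch recovers the final tree from the slides (root A, then C, then B) and classifies all 13 training examples correctly.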
Using Information Theory

Rather than minimizing the probability of error, most existing learning procedures try to minimize the expected number of questions needed to decide whether an object x satisfies CONCEPT. This minimization is based on a measure of the "quantity of information" contained in the truth value of an observable predicate.
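The standard way to make this concrete (the formulas are the usual entropy/information-gain definitions, not spelled out in the slides) is to pick the predicate whose test yields the largest expected reduction in the entropy of CONCEPT:

```python
# Information-gain criterion on the 13-example training set.
from math import log2

def entropy(concepts):
    """Entropy (in bits) of the True/False distribution of CONCEPT."""
    n = len(concepts)
    h = 0.0
    for k in (concepts.count(True), concepts.count(False)):
        if 0 < k < n:
            h -= (k / n) * log2(k / n)
    return h

def gain(examples, p):
    """Expected entropy reduction from testing predicate p."""
    concepts = [c for _, c in examples]
    rem = 0.0
    for branch in (True, False):
        sub = [c for v, c in examples if v[p] == branch]
        if sub:
            rem += len(sub) / len(concepts) * entropy(sub)
    return entropy(concepts) - rem

T, F = True, False
data = [(F,F,T,F,T,F), (F,T,F,F,F,F), (F,T,T,T,T,F), (F,F,T,F,F,F),
        (F,F,F,T,T,F), (T,F,T,F,F,T), (T,F,F,T,F,T), (T,F,T,F,T,T),
        (T,T,T,F,T,T), (T,T,T,T,T,T), (T,T,F,F,F,F), (T,T,F,F,T,F),
        (T,F,T,T,T,T)]
examples = [(dict(zip("ABCDE", row[:5])), row[5]) for row in data]

# On this training set the information-gain criterion also picks A first
best = max("ABCDE", key=lambda p: gain(examples, p))
assert best == "A"
```

Here the information-gain criterion and the error-minimization criterion agree on the first test; in general they can differ.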
Miscellaneous Issues

Assessing performance:
• Training set and test set
• Learning curve: % correct on the test set as a function of the size of the training set
[Figure: typical learning curve, rising toward 100% correct]

Overfitting:
• Risk of using irrelevant observable predicates to generate an hypothesis that agrees with all examples in the training set
• Tree pruning: terminate the recursion when the information gain is too small; the resulting decision tree + majority rule may not classify correctly all examples in the training set
• Cross-validation

Missing data: the value of an observable predicate P is unknown for an example x. Construct a decision tree for both values of P and select the value that ends up classifying x in the largest class.

Multi-valued and continuous attributes: for a continuous attribute, select the threshold that maximizes information gain.

These issues occur with virtually any learning method
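Threshold selection for a continuous attribute can be sketched directly (illustrative code, not from the slides): try each candidate threshold midway between consecutive observed values and keep the one that maximizes information gain.

```python
# Pick the information-gain-maximizing threshold for a continuous attribute.
from math import log2

def entropy(labels):
    n = len(labels)
    h = 0.0
    for k in (labels.count(True), labels.count(False)):
        if 0 < k < n:
            h -= (k / n) * log2(k / n)
    return h

def best_threshold(pairs):
    """pairs: (value, CONCEPT) with a continuous attribute value.
    Returns (threshold, information gain)."""
    pairs = sorted(pairs)
    base = entropy([c for _, c in pairs])
    best = None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2   # candidate: midpoint between consecutive values
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        g = base - (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        if best is None or g > best[1]:
            best = (t, g)
    return best

# Toy data: the concept flips at value 5, so the best split is there
data = [(1, False), (2, False), (4, False), (6, True), (8, True)]
t, g = best_threshold(data)
assert t == 5.0
```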
Multi-Valued Attributes

WillWait predicate (Russell and Norvig)

[Figure: a decision tree with multi-valued attributes — tests include Patrons? (None, Some, Full), Hungry?, Type? (Italian, Thai, Burger, …), and FriSat?, with Yes/No leaves]
Applications of Decision Trees

• Medical diagnosis / drug design
• Evaluation of geological systems for assessing gas and oil basins
• Early detection of problems (e.g., jamming) during oil drilling operations
• Automatic generation of rules in expert systems