Date posted: 22-Dec-2015
MACHINE LEARNING
What is learning?
A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
Any change in a system that allows it to perform better (Simon, 1983).
What do we learn:
Descriptions
Rules for how to recognize/classify objects, states, events
Rules for how to transform an initial situation to achieve a goal (final state)
How do we learn:
Rote learning: storage of computed information.
Taking advice from others. (Advice may need to be operationalized.)
Learning from problem-solving experiences: remembering experiences and generalizing from them. (May add efficiency but not new knowledge.)
Learning from examples. (May or may not involve a teacher.)
Learning by experimentation and discovery. (Decreasing burden on the teacher, increasing burden on the learner.)
Approaches to Machine Learning
• Symbol-based learning
• Connectionist learning
• Evolutionary learning
Inductive Symbol-Based Machine Learning
Concept learning
Version space search
Decision trees: the ID3 algorithm
Explanation-based learning
Supervised learning
Reinforcement learning
Version space search for concept learning
Concepts describe classes of objects.
Concepts consist of feature sets.
Operations on concept descriptions:
Generalization: replace a feature with a variable
Specialization: instantiate a variable with a feature
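As a sketch of these two operations (my own illustration, not from the slides), a hypothesis can be represented as a tuple of attribute values in which '?' plays the role of a variable that matches anything:

```python
# Minimal sketch of generalization/specialization over attribute tuples.
# '?' is the variable that matches any feature value.

def matches(hypothesis, example):
    """A hypothesis covers an example if every position is '?' or equal."""
    return all(h == "?" or h == e for h, e in zip(hypothesis, example))

def generalize(hypothesis, example):
    """Minimally generalize: replace each mismatching feature with '?'."""
    return tuple(h if h == e else "?" for h, e in zip(hypothesis, example))

def specializations(hypothesis, example, domains):
    """Minimally specialize: instantiate one '?' with a concrete value
    that excludes the given (negative) example."""
    result = []
    for i, h in enumerate(hypothesis):
        if h == "?":
            for value in domains[i]:
                if value != example[i]:
                    result.append(hypothesis[:i] + (value,) + hypothesis[i + 1:])
    return result
```

For example, generalizing ("Japan", "Honda", "Blue") to also cover ("Japan", "Honda", "White") yields ("Japan", "Honda", "?").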
Positive and Negative examples of a concept
The concept description has to match all positive examples.
The concept description has to be false for the negative examples.
Plausible descriptions
The version space represents all the alternative plausible descriptions of the concept
A plausible description is one that is applicable to all known positive examples and no known negative example.
Basic Idea
Given:
A representation language
A set of positive and negative examples expressed in that language
Compute:
A concept description that is consistent with all the positive examples and none of the negative examples
Hypotheses
The version space contains two sets of hypotheses:
G – the most general hypotheses that match the training data
S – the most specific hypotheses that match the training data
Each hypothesis is represented as a vector of values of the known attributes.
Example of Version space
Consider the task of obtaining a description of the concept "Japanese economy car".
The attributes under consideration are:
Origin, Manufacturer, Color, Decade, Type
Training data:
Positive example: (Japan, Honda, Blue, 1980, Economy)
Positive example: (Japan, Honda, White, 1980, Economy)
Negative example: (Japan, Toyota, Green, 1970, Sports)
Example continued
A general hypothesis that matches the positive data and does not match the negative data is:
(?, Honda, ?, ?, Economy)
The symbol '?' means that the attribute may take any value.
The most specific hypothesis that matches the positive examples is:
(Japan, Honda, ?, 1980, Economy)
Algorithm: Candidate elimination
Initialize G to contain one element: the most general description (all features are variables).
Initialize S to empty.
Accept a new training example.
Process positive examples
Remove from G any descriptions that do not cover the example.
Generalize S as little as possible so that the new training example is covered.
Remove from S all elements that cover negative examples.
Process negative examples
Remove from S any descriptions that cover the negative example.
Specialize G as little as possible so that the negative example is not covered.
Remove from G all elements that do not cover the positive examples.
Algorithm continued
Continue processing new training examples until one of the following occurs:
Either S or G becomes empty: there are no consistent hypotheses over the training space. Stop.
S and G are both singleton sets. If they are identical, output their value and stop. If they are different, the training cases were inconsistent. Output this result and stop.
There are no more training examples and G holds several hypotheses. The version space is then a disjunction of hypotheses: if the hypotheses agree on a new example, we can classify it; if they disagree, we can take a majority vote.
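Putting the steps above together, here is a simplified Python sketch (my own, not from the slides; it assumes hypotheses are conjunctive attribute tuples with '?' as the variable, that the first example is positive, and that S stays a singleton — enough for the trace that follows, not a full implementation):

```python
# Simplified candidate-elimination sketch for conjunctive hypotheses
# over attribute tuples, with '?' meaning "any value".

def covers(h, x):
    """h covers x if every attribute of h is '?' or equals x's value."""
    return all(a == "?" or a == v for a, v in zip(h, x))

def candidate_elimination(examples):
    """examples: list of (attribute_tuple, is_positive). Returns (S, G).
    Assumes the first example is positive."""
    first, _ = examples[0]
    n = len(first)
    G = [("?",) * n]                  # most general boundary
    S = [tuple(first)]                # most specific boundary
    for x, positive in examples[1:]:
        if positive:
            # remove from G descriptions that do not cover the example
            G = [g for g in G if covers(g, x)]
            # generalize S as little as possible to cover the example
            S = [tuple(a if a == v else "?" for a, v in zip(s, x)) for s in S]
        else:
            # remove from S descriptions that cover the negative example
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # specialize minimally: pin one '?' to the value S requires
                for i in range(n):
                    if g[i] == "?" and S and S[0][i] not in ("?", x[i]):
                        new_G.append(g[:i] + (S[0][i],) + g[i + 1:])
            # keep only distinct, maximally general hypotheses
            G = list(dict.fromkeys(new_G))
            G = [g for g in G if not any(h != g and covers(h, g) for h in G)]
    return S, G
```

Run on the training sequence of the worked example below, this sketch reproduces the same S and G boundaries at each step.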
Learning the concept of "Japanese economy car"
Features: Origin, Manufacturer, Color, Decade, Type
POSITIVE EXAMPLE: (Japan, Honda, Blue, 1980, Economy)
Initialize G to the singleton set that includes everything.
Initialize S to the singleton set containing the first positive example.
G = {(?, ?, ?, ?, ?)}
S = {(Japan, Honda, Blue, 1980, Economy)}
Example continued
NEGATIVE EXAMPLE: (Japan, Toyota, Green, 1970, Sports)
Specialize G to exclude the negative example:
G = {(?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy)}
S = {(Japan, Honda, Blue, 1980, Economy)}
Example continued
POSITIVE EXAMPLE: (Japan, Toyota, Blue, 1990, Economy)
Remove from G descriptions inconsistent with the positive example.
Generalize S to include the positive example:
G = {(?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy)}
S = {(Japan, ?, Blue, ?, Economy)}
Example continued
NEGATIVE EXAMPLE: (USA, Chrysler, Red, 1980, Economy)
Specialize G to exclude the negative example (while staying within the version space, i.e., staying consistent with S):
G = {(?, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy)}
S = {(Japan, ?, Blue, ?, Economy)}
Example continued
POSITIVE EXAMPLE: (Japan, Honda, White, 1980, Economy)
Remove from G descriptions inconsistent with the positive example.
Generalize S to include the positive example:
G = {(Japan, ?, ?, ?, Economy)}
S = {(Japan, ?, ?, ?, Economy)}
S = G, both singleton => done!
Decision trees
A decision tree is a structure that represents a procedure for classifying objects based on their attributes.
Each object is represented as a set of attribute/value pairs and a classification.
Example
A set of medical symptoms might be represented as follows:

Name   Cough  Fever  Weight  Pain     Classification
Mary   no     yes    normal  throat   flu
Fred   no     yes    normal  abdomen  appendicitis
Julie  yes    yes    skinny  none     flu
Elvis  yes    no     obese   chest    heart disease

The system is given a set of training instances along with their correct classifications and develops a decision tree based on these examples.
Attributes
If a crucial attribute is not represented, then no decision tree will be able to learn the concept.
If two training instances have the same representation but belong to different classes, the attribute set is said to be inadequate: it is impossible for the decision tree to distinguish the instances.
ID3 Algorithm (Quinlan, 1986)
ID3(R, C, S) // R – list of attributes, C – categorical (class) attribute, S – examples
If all examples in S belong to the same class Cj, return a leaf labeled Cj.
If R is empty, return a leaf labeled with the most frequent value of C in S.
Else:
Select the "best" decision attribute A in R, with values v1, v2, …, vn, for the next node.
Divide the training set S into S1, …, Sn according to the values v1, …, vn.
Call ID3(R – {A}, C, S1), ID3(R – {A}, C, S2), …, ID3(R – {A}, C, Sn), i.e., recursively build subtrees T1, …, Tn for S1, …, Sn.
Return a node labelled A with children the subtrees T1, T2, …, Tn.
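A compact Python sketch of this recursion (illustrative only; examples are assumed to be (attribute-dict, label) pairs, "best" is taken as maximal information gain as defined on the following slides, and ties are broken by attribute order):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def id3(examples, attributes):
    """examples: list of (attribute_dict, label); attributes: attribute names.
    Returns a nested dict {attribute: {value: subtree_or_label}}."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:            # all examples in the same class
        return labels[0]
    if not attributes:                   # R empty: most frequent class
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                         # information gain of attribute a
        remainder = 0.0
        for v in {x[a] for x, _ in examples}:
            subset = [lab for x, lab in examples if x[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)     # the "best" decision attribute
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([(x, lab) for x, lab in examples if x[best] == v], rest)
                   for v in {x[best] for x, _ in examples}}}
```

On the sunburn data set used later in these notes, this sketch selects "hair" at the root and "lotion" under the blonde branch.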
Entropy
S – a sample of training examples.
Entropy(S) = expected number of bits needed to encode the classification of an arbitrary member of S.
Information theory: an optimal-length code assigns -log2(p) bits to a message having probability p.
Generally, for c different classes:
Entropy(S) = Σi=1..c (-pi * log2 pi)
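As a quick numeric illustration (my own, not from the slides):

```python
import math

# Two equally likely classes (p = 1/2 each) cost one bit per example
two_class = -0.5 * math.log2(0.5) - 0.5 * math.log2(0.5)   # = 1.0

# A skewed 3:5 split over 8 examples costs a little less
skewed = -(3/8) * math.log2(3/8) - (5/8) * math.log2(5/8)  # ~ 0.954
```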
Entropy of the Training Set
T: a set of records partitioned into classes C1, C2, …, Ck on the basis of the categorical attribute C.
Probability of each class: Pi = |Ci| / |T|
Info(T) = -P1*log2(P1) - … - Pk*log2(Pk)
Info(T) is the information needed to classify an element.
How helpful is an attribute?
X: a non-categorical attribute; T1, …, Tn is the split of T according to the values of X.
The entropy of each Tk is:
Info(Tk) = -(|Tk1| / |Tk|)*log2(|Tk1| / |Tk|) - … - (|Tkc| / |Tk|)*log2(|Tkc| / |Tk|)
where c is the number of classes into which the categorical attribute C partitions Tk.
For any k, Info(Tk) reflects how the categorical attribute C splits the set Tk.
Information Gain
Info(X,T) = |T1|/|T| * Info(T1) + |T2|/|T| * Info(T2) + … + |Tn|/|T| * Info(Tn)
Gain(X,T) = Info(T) – Info(X,T) = Entropy(T) - Σi (|Ti|/|T|) * Entropy(Ti)
Information Gain
Gain(X,T) – the expected reduction in entropy caused by partitioning the examples of T according to the attribute X.
Gain(X,T) – a measure of the effectiveness of an attribute in classifying the training data.
The best attribute has maximal Gain(X,T).
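These formulas translate directly into code (a sketch, assuming examples are (attribute-dict, label) pairs and base-2 logarithms):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(T) = -sum over classes of p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(examples, attribute):
    """Gain(X,T) = Entropy(T) - sum_i |Ti|/|T| * Entropy(Ti),
    where the Ti partition the examples by the attribute's value.
    examples: list of (attribute_dict, class_label)."""
    labels = [lab for _, lab in examples]
    remainder = 0.0
    for v in {x[attribute] for x, _ in examples}:
        subset = [lab for x, lab in examples if x[attribute] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder
```

Selecting the best attribute is then max(attributes, key=lambda a: gain(examples, a)).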
Example (1)

Name   Hair    Height   Weight   Lotion  Result
Sarah  blonde  average  light    no      sunburned (positive)
Dana   blonde  tall     average  yes     none (negative)
Alex   brown   short    average  yes     none
Annie  blonde  short    average  no      sunburned
Emily  red     average  heavy    no      sunburned
Pete   brown   tall     heavy    no      none
John   brown   average  heavy    no      none
Katie  blonde  short    light    yes     none
Example (2)
Attribute "hair":
Blonde: T1 = {Sarah, Dana, Annie, Katie}
Brown: T2 = {Alex, Pete, John}
Red: T3 = {Emily}
T1 is split by C into 2 sets: T11 = {Sarah, Annie}, T12 = {Dana, Katie}
Info(T1) = -2/4 * log2(2/4) - 2/4 * log2(2/4) = -log2(1/2) = 1
In a similar way we compute Info(T2) = 0 and Info(T3) = 0.
Info('hair', T) = |T1|/|T| * Info(T1) + |T2|/|T| * Info(T2) + |T3|/|T| * Info(T3)
= 4/8 * Info(T1) + 3/8 * Info(T2) + 1/8 * Info(T3) = 4/8 * 1 = 0.50
This happens to be the best attribute.
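The arithmetic can be checked mechanically (a quick verification using base-2 logarithms):

```python
import math

# Info(T1): the blonde group splits 2 sunburned / 2 none
info_t1 = -(2/4) * math.log2(2/4) - (2/4) * math.log2(2/4)   # = 1.0

# T2 (brown) and T3 (red) are single-class, so their entropy is 0
info_t2 = info_t3 = 0.0

# Weighted average over the 8 training examples
info_hair = 4/8 * info_t1 + 3/8 * info_t2 + 1/8 * info_t3    # = 0.50
```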
Example (3)
The resulting decision tree:
Hair color = blonde → test Lotion: no → sunburn; yes → none
Hair color = red → sunburn
Hair color = brown → none
Split Ratio
GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T)
where SplitInfo(D,T) is the information due to the split of T when D is considered as a categorical attribute.
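The slide leaves SplitInfo implicit; the standard definition from Quinlan's C4.5, SplitInfo(D,T) = -Σ |Ti|/|T| * log2(|Ti|/|T|), is assumed in this sketch:

```python
import math

def split_info(partition_sizes):
    """SplitInfo(D,T) = -sum |Ti|/|T| * log2(|Ti|/|T|):
    the entropy of the split itself, ignoring class labels."""
    total = sum(partition_sizes)
    return -sum((n / total) * math.log2(n / total) for n in partition_sizes)

def gain_ratio(gain_value, partition_sizes):
    """GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T)."""
    return gain_value / split_info(partition_sizes)
```

For 'hair' in the example above (split sizes 4, 3, 1), SplitInfo ≈ 1.406, so a gain of 0.454 gives a gain ratio of about 0.323: the three-way split is penalized relative to a binary one.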
Split Ratio Tree
[Figure: a decision tree with a Color node (branches blonde, red, brown) and a Lotion node (branches no, yes); leaves labelled sunburn and none.]
More Training Examples