MACHINE LEARNING. 2 What is learning? A computer program learns if it improves its performance at...

Date post: 22-Dec-2015
Transcript
Page 1: MACHINE LEARNING. 2 What is learning? A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997) A computer.

MACHINE LEARNING

What is learning?

A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997)

Any change in a system that allows it to perform better (Simon 1983)

What do we learn:

Descriptions

Rules how to recognize/classify objects, states, events

Rules how to transform an initial situation to achieve a goal (final state)

How do we learn:

Rote learning - storage of computed information.

Taking advice from others. (Advice may need to be operationalized.)

Learning from problem solving experiences - remembering experiences and generalizing from them. (May add efficiency but not new knowledge.)

Learning from examples. (May or may not involve a teacher.)

Learning by experimentation and discovery. (Decreasing burden on teacher, increasing burden on learner.)

Approaches to Machine Learning

• Symbol-based

• Connectionist Learning

• Evolutionary learning

Inductive Symbol-Based Machine Learning

Concept Learning

Version space search

Decision trees: ID3 algorithm

Explanation-based learning

Supervised learning

Reinforcement learning

Version space search for concept learning

Concepts – describe classes of objects

Concepts consist of feature sets

Operations on concept descriptions:

Generalization: Replace a feature with a variable

Specialization: Instantiate a variable with a feature

Positive and Negative examples of a concept

The concept description has to match all positive examples

The concept description has to be false for the negative examples

Plausible descriptions

The version space represents all the alternative plausible descriptions of the concept

A plausible description is one that is applicable to all known positive examples and no known negative example.

Basic Idea

Given:

A representation language

A set of positive and negative examples expressed in that language

Compute:

A concept description that is consistent with all the positive examples and none of the negative examples

Hypotheses

The version space contains two sets of hypotheses:

G – the most general hypotheses that match the training data

S – the most specific hypotheses that match the training data

Each hypothesis is represented as a vector of values of the known attributes

Example of Version space

Consider the task to obtain a description of the concept: Japanese Economy car.

The attributes under consideration are:

Origin, Manufacturer, Color, Decade, Type

training data:

Positive ex: (Japan, Honda, Blue, 1980, Economy)

Positive ex: (Japan, Honda, White, 1980, Economy)

Negative ex: (Japan, Toyota, Green, 1970, Sports)

Example continued

A general hypothesis that matches the positive data and does not match the negative data is:

(?, Honda, ?, ?, Economy)

The symbol ‘?’ means that the attribute may take any value.

The most specific hypothesis that matches the positive examples is:

(Japan, Honda, ?, 1980, Economy)
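
To make the matching rule concrete, here is a minimal Python sketch that checks the two hypotheses above against the three training examples. The function name `covers` and the tuple encoding are assumptions made for illustration; the slides do not prescribe an implementation.

```python
ANY = "?"  # wildcard: the attribute may take any value


def covers(hypothesis, example):
    """True if every attribute of the hypothesis is '?' or equals the example's value."""
    return all(h == ANY or h == e for h, e in zip(hypothesis, example))


# Training data from the slide: (Origin, Manufacturer, Color, Decade, Type)
positives = [
    ("Japan", "Honda", "Blue", 1980, "Economy"),
    ("Japan", "Honda", "White", 1980, "Economy"),
]
negatives = [("Japan", "Toyota", "Green", 1970, "Sports")]

general = (ANY, "Honda", ANY, ANY, "Economy")
specific = ("Japan", "Honda", ANY, 1980, "Economy")

for name, h in (("general", general), ("specific", specific)):
    ok = all(covers(h, p) for p in positives) and not any(covers(h, n) for n in negatives)
    print(name, "is consistent with the training data:", ok)
```

Both hypotheses cover the two positive examples and reject the negative one, which is what makes them members of the version space.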

Algorithm: Candidate elimination

Initialize G to contain one element: the most general description (all features are variables).

Initialize S to empty.

Accept a new training example.

Process positive examples

Remove from G any descriptions that do not cover the example.

Generalize S as little as possible so that the new training example is covered.

Remove from S all elements that cover negative examples.

Process negative examples

Remove from S any descriptions that cover the negative example.

Specialize G as little as possible so that the negative example is not covered.

Remove from G all elements that do not cover the positive examples.

Algorithm continued

Continue processing new training examples until one of the following occurs:

Either S or G becomes empty: there are no consistent hypotheses over the training space. Stop.

S and G are both singleton sets. If they are identical, output their value and stop. If they are different, the training cases were inconsistent. Output this result and stop.

No more training examples. G has several hypotheses. The version space is a disjunction of hypotheses. If for a new example the hypotheses agree, then we can classify the example. If they disagree, we can take the majority vote.

Learning the concept of "Japanese economy car"

Features: Origin, Manufacturer, Color, Decade, Type

POSITIVE EXAMPLE: (Japan, Honda, Blue, 1980, Economy)

Initialize G to singleton set that includes everything

Initialize S to singleton set that includes first positive example

G = {(?, ?, ?, ?, ?)}

S = {(Japan, Honda, Blue, 1980, Economy)}

Example continued

NEGATIVE EXAMPLE: (Japan, Toyota, Green, 1970, Sports)

Specialize G to exclude negative example

G = {(?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy)}

S = {(Japan, Honda, Blue, 1980, Economy)}

Example continued

POSITIVE EXAMPLE: (Japan, Toyota, Blue, 1990, Economy)

Remove from G descriptions inconsistent with positive example

Generalize S to include positive example

G = {(?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy)}

S = {(Japan, ?, Blue, ?, Economy)}

Example continued

NEGATIVE EXAMPLE: (USA, Chrysler, Red, 1980, Economy)

Specialize G to exclude negative example (but staying within version space, i.e., staying consistent with S)

G = {(?, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy)}

S = {(Japan, ?, Blue, ?, Economy)}

Example continued

POSITIVE EXAMPLE: (Japan, Honda, White, 1980, Economy)

Remove from G descriptions inconsistent with positive example

Generalize S to include the positive example

G = {(Japan, ?, ?, ?, Economy)}

S = {(Japan, ?, ?, ?, Economy)}

S = G, both singleton => done!
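
The worked example can be replayed with a minimal Python sketch of candidate elimination. This is an illustration under stated assumptions, not the slides' own code: hypotheses are plain attribute vectors with `?` wildcards, S is kept as a single hypothesis (enough for this representation), the first example is assumed to be positive (as in the example), and all function names are hypothetical.

```python
ANY = "?"


def covers(h, x):
    """True if hypothesis h matches example (or hypothesis) x attribute by attribute."""
    return all(a == ANY or a == b for a, b in zip(h, x))


def candidate_elimination(examples):
    """examples: list of (attribute_tuple, is_positive). Returns (G, S)."""
    first, first_is_positive = examples[0]
    if not first_is_positive:
        raise ValueError("this sketch assumes the first example is positive")
    n = len(first)
    G = [(ANY,) * n]   # most general description: all features are variables
    S = tuple(first)   # most specific description: the first positive example
    for x, positive in examples[1:]:
        if positive:
            # Drop general hypotheses that miss the example, then minimally generalize S.
            G = [g for g in G if covers(g, x)]
            S = tuple(s if s == v else ANY for s, v in zip(S, x))
        else:
            if covers(S, x):
                return [], None  # S covers a negative example: no consistent hypothesis
            candidates = []
            for g in G:
                if not covers(g, x):
                    candidates.append(g)
                    continue
                # Minimal specializations: fix one wildcard to S's value where that
                # value differs from the negative example (keeps g consistent with S).
                for i in range(n):
                    if g[i] == ANY and S[i] != ANY and S[i] != x[i]:
                        candidates.append(g[:i] + (S[i],) + g[i + 1:])
            candidates = list(dict.fromkeys(candidates))  # remove duplicates
            # Keep only maximally general hypotheses.
            G = [g for g in candidates
                 if not any(o != g and covers(o, g) for o in candidates)]
    return G, S


EXAMPLES = [
    (("Japan", "Honda", "Blue", 1980, "Economy"), True),
    (("Japan", "Toyota", "Green", 1970, "Sports"), False),
    (("Japan", "Toyota", "Blue", 1990, "Economy"), True),
    (("USA", "Chrysler", "Red", 1980, "Economy"), False),
    (("Japan", "Honda", "White", 1980, "Economy"), True),
]

G, S = candidate_elimination(EXAMPLES)
print("G =", G)
print("S =", S)
```

With the five training examples in the order used on the slides, G and S converge to the same single hypothesis, (Japan, ?, ?, ?, Economy).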

Decision trees

A decision tree is a structure that represents a procedure for classifying objects based on their attributes.

Each object is represented as a set of attribute/value pairs and a classification.

Example

A set of medical symptoms might be represented as follows:

         Cough   Fever   Weight   Pain      Classification
Mary     no      yes     normal   throat    flu
Fred     no      yes     normal   abdomen   appendicitis
Julie    yes     yes     skinny   none      flu
Elvis    yes     no      obese    chest     heart disease

The system is given a set of training instances along with their correct classifications and develops a decision tree based on these examples.

Attributes

If a crucial attribute is not represented, then no decision tree will be able to learn the concept.

If two training instances have the same representation but belong to different classes, then the attribute set is said to be inadequate. It is impossible for the decision tree to distinguish the instances.

ID3 Algorithm (Quinlan, 1986)

ID3(R, C, S) // R – list of attributes, C – categorical attribute, S – examples

If all examples from S belong to the same class Cj, return a leaf labeled Cj

If R is empty, return a node with the most frequent value of C

Else

select the “best” decision attribute A in R with values v1, v2, …, vn for the next node

divide the training set S into S1, …, Sn according to values v1, …, vn

call ID3(R – {A}, C, S1), ID3(R – {A}, C, S2), …, ID3(R – {A}, C, Sn), i.e. recursively build subtrees T1, …, Tn for S1, …, Sn

return a node labelled A with children the subtrees T1, T2, …, Tn

Entropy

S - a sample of training examples

Entropy(S) = expected number of bits needed to encode the classification of an arbitrary member of S

Information theory: optimal length code assigns -log2 p bits to a message having probability p

Generally, for c different classes:

$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$
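
As a quick check of the formula, here is a minimal Python sketch that computes the entropy of a list of class labels, taking p_i to be the fraction of labels in class i. The function name `entropy` is an assumption made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels: sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


print(entropy(["yes", "yes", "no", "no"]))  # two equally likely classes
print(entropy(["yes", "yes", "yes"]))       # a single class
print(entropy(["a", "b", "c", "d"]))        # four equally likely classes
```

An evenly split two-class sample needs one bit per classification, a pure sample needs none, and four equally likely classes need two bits.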

Entropy of the Training Set

T : a set of records partitioned into C1, C2, …, Ck on the basis of the categorical attribute C.

Probability of each class: Pi = Ci / T (the number of records in Ci divided by the number of records in T)

Info(T) = -P1*log(P1) - … - Pk*log(Pk)

Info(T) is the information needed to classify an element.

How helpful is an attribute?

X : a non-categorical attribute; {T1, …, Tn} is the split of T according to X

The entropy of each Tk is:

Info(Tk) = - (Tk1 / Tk) * log(Tk1 / Tk) - … - (Tkc / Tk) * log(Tkc / Tk)

where c is the number of partitions in Tk produced by the categorical attribute C

For any k, Info(Tk) reflects how the categorical attribute C splits the set Tk

Information Gain

Info(X,T) = T1/T * Info(T1) + T2/T * Info(T2) + … + Tn/T * Info(Tn)

$\mathrm{Gain}(X,T) = \mathrm{Info}(T) - \mathrm{Info}(X,T) = \mathrm{Entropy}(T) - \sum_{i=1}^{n} (T_i/T)\,\mathrm{Entropy}(T_i)$

Information Gain

Gain(X,T) - the expected reduction in entropy caused by partitioning the examples of T according to the attribute X.

Gain(X,T) - a measure of the effectiveness of an attribute in classifying the training data

The best attribute has maximal Gain(X,T)

Example (1)

Name    Hair     Height    Weight    Lotion   Result
Sarah   blonde   average   light     no       sunburned (positive)
Dana    blonde   tall      average   yes      none (negative)
Alex    brown    short     average   yes      none
Annie   blonde   short     average   no       sunburned
Emily   red      average   heavy     no       sunburned
Pete    brown    tall      heavy     no       none
John    brown    average   heavy     no       none
Katie   blonde   short     light     yes      none

Example (2)

Attribute “hair”

Blonde: T1 = {Sarah, Dana, Annie, Katie}

Brown: T2 = {Alex, Pete, John}

Red: T3 = {Emily}

T1 is split by C into 2 sets: T11 = {Sarah, Annie}, T12 = {Dana, Katie}

Info(T1) = - 2/4 * log(2/4) – 2/4 * log(2/4) = -log(1/2) = 1

In a similar way we compute Info(T2) = 0, Info(T3) = 0

Info(‘hair’,T) = T1/T * Info(T1) + T2/T * Info(T2) + T3/T * Info(T3)

= 4/8 * Info(T1) + 3/8 * Info(T2) + 1/8 * Info(T3)

= 4/8 * 1 = 0.50

This happens to be the best attribute: it has the lowest Info(X,T) of the four attributes, and hence the highest Gain(X,T).
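
The same computation can be repeated for every attribute with a minimal Python sketch. The table is the sunburn data from Example (1); the names `ROWS`, `ATTRS`, `info`, `info_x` and `gain` are assumptions made for illustration, and logarithms are taken base 2.

```python
import math
from collections import Counter, defaultdict

ATTRS = ("hair", "height", "weight", "lotion")
ROWS = [  # (hair, height, weight, lotion, result)
    ("blonde", "average", "light", "no", "sunburned"),   # Sarah
    ("blonde", "tall", "average", "yes", "none"),        # Dana
    ("brown", "short", "average", "yes", "none"),        # Alex
    ("blonde", "short", "average", "no", "sunburned"),   # Annie
    ("red", "average", "heavy", "no", "sunburned"),      # Emily
    ("brown", "tall", "heavy", "no", "none"),            # Pete
    ("brown", "average", "heavy", "no", "none"),         # John
    ("blonde", "short", "light", "yes", "none"),         # Katie
]


def info(rows):
    """Info(T): entropy in bits of the class label (last column)."""
    total = len(rows)
    counts = Counter(r[-1] for r in rows)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def info_x(attr, rows):
    """Info(X,T): size-weighted entropy of the subsets obtained by splitting on attr."""
    col = ATTRS.index(attr)
    parts = defaultdict(list)
    for r in rows:
        parts[r[col]].append(r)
    return sum(len(p) / len(rows) * info(p) for p in parts.values())


def gain(attr, rows):
    """Gain(X,T) = Info(T) - Info(X,T)."""
    return info(rows) - info_x(attr, rows)


for a in ATTRS:
    print(f"{a:7s} Info(X,T) = {info_x(a, ROWS):.3f}   Gain = {gain(a, ROWS):.3f}")
best = max(ATTRS, key=lambda a: gain(a, ROWS))
print("best attribute:", best)
```

Running it gives Info(‘hair’,T) = 0.50 as computed by hand, and ‘hair’ comes out with the largest gain.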

Example (3)

[Decision tree figure]

Hair color
    blonde: Lotion
        no: sunburn
        yes: none
    red: sunburn
    brown: none
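
A minimal Python sketch of the ID3 recursion, using information gain as the “best attribute” criterion, builds this kind of tree for the sunburn data. The nested-dictionary tree format and all names (`ROWS`, `ATTRS`, `split`, `id3`) are assumptions made for illustration, not the slides' own implementation.

```python
import math
from collections import Counter, defaultdict

ATTRS = ("hair", "height", "weight", "lotion")
ROWS = [  # (hair, height, weight, lotion, result)
    ("blonde", "average", "light", "no", "sunburned"),   # Sarah
    ("blonde", "tall", "average", "yes", "none"),        # Dana
    ("brown", "short", "average", "yes", "none"),        # Alex
    ("blonde", "short", "average", "no", "sunburned"),   # Annie
    ("red", "average", "heavy", "no", "sunburned"),      # Emily
    ("brown", "tall", "heavy", "no", "none"),            # Pete
    ("brown", "average", "heavy", "no", "none"),         # John
    ("blonde", "short", "light", "yes", "none"),         # Katie
]


def info(rows):
    """Entropy in bits of the class label (last column)."""
    total = len(rows)
    counts = Counter(r[-1] for r in rows)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def split(attr, rows):
    """Partition rows by their value of attr."""
    parts = defaultdict(list)
    for r in rows:
        parts[r[ATTRS.index(attr)]].append(r)
    return parts


def gain(attr, rows):
    """Information gain of splitting rows on attr."""
    parts = split(attr, rows).values()
    return info(rows) - sum(len(p) / len(rows) * info(p) for p in parts)


def id3(attrs, rows):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # all examples in one class: leaf
        return labels[0]
    if not attrs:                      # no attributes left: most frequent class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(a, rows))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3(rest, part) for v, part in split(best, rows).items()}}


tree = id3(list(ATTRS), ROWS)
print(tree)
```

The result splits on hair first and then, for blonde only, on lotion.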

Split Ratio

GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T)

where SplitInfo(D,T) is the information due to the split of T when D is considered the categorical attribute
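
A minimal Python sketch of the gain ratio on the sunburn data follows. It assumes SplitInfo(D,T) is the entropy of T with respect to the values of D (the usual reading of the description above); the names `ROWS`, `ATTRS`, `split_info` and `gain_ratio` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

ATTRS = ("hair", "height", "weight", "lotion")
ROWS = [  # (hair, height, weight, lotion, result)
    ("blonde", "average", "light", "no", "sunburned"),   # Sarah
    ("blonde", "tall", "average", "yes", "none"),        # Dana
    ("brown", "short", "average", "yes", "none"),        # Alex
    ("blonde", "short", "average", "no", "sunburned"),   # Annie
    ("red", "average", "heavy", "no", "sunburned"),      # Emily
    ("brown", "tall", "heavy", "no", "none"),            # Pete
    ("brown", "average", "heavy", "no", "none"),         # John
    ("blonde", "short", "light", "yes", "none"),         # Katie
]


def entropy(values):
    """Entropy in bits of a list of discrete values."""
    total = len(values)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(values).values())


def split(attr, rows):
    """Partition rows by their value of attr."""
    parts = defaultdict(list)
    for r in rows:
        parts[r[ATTRS.index(attr)]].append(r)
    return parts


def gain(attr, rows):
    """Gain(D,T): entropy of the class label minus the weighted entropy after the split."""
    parts = split(attr, rows).values()
    after = sum(len(p) / len(rows) * entropy([r[-1] for r in p]) for p in parts)
    return entropy([r[-1] for r in rows]) - after


def split_info(attr, rows):
    """SplitInfo(D,T): entropy of T when the values of attr are treated as the classes."""
    return entropy([r[ATTRS.index(attr)] for r in rows])


def gain_ratio(attr, rows):
    # Note: split_info is 0 if attr takes a single value; that cannot happen in this table.
    return gain(attr, rows) / split_info(attr, rows)


for a in ATTRS:
    print(f"{a:7s} Gain = {gain(a, ROWS):.3f}   SplitInfo = {split_info(a, ROWS):.3f}   "
          f"GainRatio = {gain_ratio(a, ROWS):.3f}")
best_by_ratio = max(ATTRS, key=lambda a: gain_ratio(a, ROWS))
print("best attribute by gain ratio:", best_by_ratio)
```

On this table the gain ratio prefers lotion over hair: hair has the larger gain, but it also splits T into three uneven subsets, so its SplitInfo is larger.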

Split Ratio Tree

[Decision tree figure]

Lotion
    yes: none
    no: Color
        blonde, red: sunburn
        brown: none

More Training Examples

