
Decision Trees. 10-601 Recitation, 1/17/08. Mary McGlohon, mmcgloho+10601@cs.cmu.edu

Page 1:

Decision Trees

10-601 Recitation, 1/17/08

Mary McGlohon, mmcgloho+10601@cs.cmu.edu

Page 2:

Announcements

• HW 1 out: DTs and basic probability

• Due Mon, Jan 28 at start of class

• Matlab

• High-level language, specialized for matrices

• Built-in plotting software, lots of math libraries

• On campus lab machines

• Interest in tutorial?

• Smiley Award plug

Page 3:

Page 4:

AttendClass?

[Figure: decision tree for AttendClass. Root: Raining. If Raining=False, Yes. If Raining=True, test Is10601: True leads to Yes; False leads to Material. Material=Old leads to No; Material=New leads to Before10: True leads to No, False leads to Yes.]

Represent as a logical expression.

Page 5:

AttendClass?

[Figure: same decision tree as on Page 4.]

Represent as a logical expression.

AttendClass = Yes if:
(Raining = False) OR (Is10601 = True) OR (Material = New AND Before10 = False)
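The same tree can be written directly as a function; a small illustrative sketch (the parameter names simply mirror the attributes above, they are not course code):

def attend_class(raining, is10601, material, before10):
    # Mirrors the expression above: attend unless it is raining, the class is
    # not 10-601, and the material is old (or new but the class is before 10).
    return (not raining) or is10601 or (material == "New" and not before10)

# e.g. attend_class(raining=True, is10601=False, material="New", before10=False) -> True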

Page 6:

Split decisions

• There are other trees that are logically equivalent.

• How do we know which one to use?

Page 7:

Split decisions

• There are other trees that are logically equivalent.

• How do we know which one to use?

• It depends on what is important to us.

Page 8:

Information Gain

• Classically we rely on "information gain", which uses the principle that we want to use the fewest bits, on average, to get our idea across.

• Suppose I want to send a weather forecast with 4 possible outcomes: Rain, Sun, Snow, and Tornado. 4 outcomes = 2 bits.

• In Pittsburgh there's Rain 90% of the time, Snow 5%, Sun 4.9%, and Tornado .01%. So if you assign Rain to a 1-bit message, you rarely need to send more than 1 bit.
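For instance, with the prefix code Rain = 0, Snow = 10, Sun = 110, Tornado = 111, the expected message length at roughly those frequencies is

\[ 0.90(1) + 0.05(2) + 0.049(3) + 0.0001(3) \approx 1.15 \text{ bits}, \]

well under the 2 bits of a fixed-length code. (The percentages quoted above do not quite sum to 100%, so treat this as approximate.)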

Page 9:

Entropy
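The slide shows the formula as an image; the standard definition it presumably uses, for a set S whose examples are positive with proportion p+ and negative with proportion p-, is

\[ H(S) = -\,p_{+}\log_2 p_{+} \;-\; p_{-}\log_2 p_{-}, \]

and more generally \( H(S) = -\sum_i p_i \log_2 p_i \) over the class proportions.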

Page 10:

Entropy

Rain   Is10601   Before10   Material   Attend
 +        +         -         New        +
 +        -         +         New        +
 +        -                   Old        -
 +        -                              -
 -        -                              +
 -                                       +
 -                                       +
 -                                       +

(Some attribute values were lost when the slides were converted to text.)

Set S has 6 positive, 2 negative examples.

H(S) = -0.75 log2(0.75) - 0.25 log2(0.25) ≈ 0.8113
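A quick numeric check of that value (a minimal sketch; the function is ours, not course code):

from math import log2

def entropy(pos, neg):
    # Entropy of a set with `pos` positive and `neg` negative examples.
    h = 0.0
    for count in (pos, neg):
        p = count / (pos + neg)
        if p > 0:
            h -= p * log2(p)
    return h

print(entropy(6, 2))   # ~0.8113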

Page 11:

Conditional Entropy

“The average number of bits it would take to encode a message Y, given knowledge of X”
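In symbols (the standard definition, and the form applied on the next slides):

\[ H(Y \mid X) = \sum_{x} P(X = x)\, H(Y \mid X = x). \]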

Page 12:

Conditional Entropy

[Same training data table as on Page 10.]

H(Attend | Rain) = H(Attend | Rain=T) * P(Rain=T) + H(Attend | Rain=F) * P(Rain=F)

Page 13:

Conditional Entropy

[Same training data table as on Page 10.]

H(Attend | Rain) = H(Attend | Rain=T) * P(Rain=T) + H(Attend | Rain=F) * P(Rain=F) = 1 * 0.5 + 0 * 0.5 = 0.5

(The Rain=+ half of the table has entropy 1; the Rain=- half has entropy 0.)

Page 14:

Information Gain

"How much conditioning on attribute A increases our knowledge of (decreases the entropy of) S."

IG(S, A) = H(S) - H(S|A)

Page 15:

Information Gain

IG(Attend, Rain) = H(Attend) - H(Attend | Rain) = 0.8113 - 0.5 = 0.3113

[Same training data table as on Page 10.]
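The last few slides, tied together in a few lines of code (a sketch; the label lists below just encode the table's Attend column, split by Rain):

from math import log2

def entropy(labels):
    h = 0.0
    for label in set(labels):
        p = labels.count(label) / len(labels)
        h -= p * log2(p)
    return h

rain_true  = ['+', '+', '-', '-']    # Attend labels of the 4 Rain=T examples
rain_false = ['+', '+', '+', '+']    # Attend labels of the 4 Rain=F examples

h_attend = entropy(rain_true + rain_false)                                   # ~0.8113
h_attend_given_rain = 0.5 * entropy(rain_true) + 0.5 * entropy(rain_false)   # 0.5
print(h_attend - h_attend_given_rain)                                        # IG ~0.3113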

Page 16:

What about this?

[Figure: two candidate trees. One is a larger tree with Raining at the root and Material, Before10, and Is10601 further down; the other is the smaller tree that tests Raining and then Is10601.]

For some dataset, could we ever build this DT?

Page 17:

What about this?

[Figure: the same two trees as on Page 16.]

For some dataset, could we ever build this DT?

What if you were taking 20 classes, and it rains 90% of the time?

Page 18:

What about this?

[Figure: the same two trees as on Page 16.]

For some dataset, could we ever build this DT?

What if you were taking 20 classes, and it rains 90% of the time?

If most information is gained from Material or Before10, we won't ever need to traverse to the Is10601 node. So even a bigger tree (node-wise) may be "simpler", for some sets of data.

Page 19:

Node-based pruning

• Until further pruning is harmful:

• For each node n in the trained tree T:

• Let Tn' be T without n (and its descendants), replacing the removed subtree with the "best choice" leaf for the examples that reach that point.

• Record the error of Tn' on a validation set.

• Let T = Tk', where Tk' is the pruned tree with the best performance on the validation set.
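The loop above, sketched in code. This is a minimal illustration assuming a tree stored as nested dicts ({"attr": name, "branches": {value: subtree}}, with leaves as label strings) and examples stored as dicts; none of these names come from the course's starter code.

import copy

def classify(tree, example):
    # A tree is either a leaf label or {"attr": name, "branches": {value: subtree}}.
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attr"]]]
    return tree

def accuracy(tree, examples, target="Attend"):
    return sum(classify(tree, x) == x[target] for x in examples) / len(examples)

def node_paths(tree, path=()):
    # Yield the (attribute, branch value) path to every internal node, root included.
    if isinstance(tree, dict):
        yield path
        for value, subtree in tree["branches"].items():
            yield from node_paths(subtree, path + ((tree["attr"], value),))

def majority_label(examples, target="Attend", default="Yes"):
    # "Best choice" leaf: the most common label among examples reaching the node
    # (`default` is only used if no training example reaches it).
    labels = [x[target] for x in examples]
    return max(set(labels), key=labels.count) if labels else default

def pruned_copy(tree, path, leaf):
    # Copy of `tree` with the node at `path` (and its descendants) replaced by a leaf.
    if not path:
        return leaf
    new_tree = copy.deepcopy(tree)
    node = new_tree
    for _, value in path[:-1]:
        node = node["branches"][value]
    node["branches"][path[-1][1]] = leaf
    return new_tree

def reduced_error_prune(tree, train, validation):
    # "Until further pruning is harmful": each round, keep the best pruned tree.
    best_acc = accuracy(tree, validation)
    while True:
        best = None
        for path in node_paths(tree):
            reached = [x for x in train if all(x[a] == v for a, v in path)]
            candidate = pruned_copy(tree, path, majority_label(reached))
            acc = accuracy(candidate, validation)
            if acc >= best_acc:          # ties favor the smaller tree
                best_acc, best = acc, candidate
        if best is None:
            return tree
        tree = best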

Page 20:

Node-based pruning

[Figure: the larger tree from Page 16 (Raining, then Material, Before10, and Is10601) next to the smaller tree (Raining, then Is10601).]

For each node, record the performance on the validation set of the tree without that node. Suppose our initial tree has 0.7 accuracy on the validation set.

Page 21:

Node-based pruning

[Figure: the same two trees as on Page 20.]

For each node, record the performance on the validation set of the tree without that node. Suppose our initial tree has 0.7 accuracy on validation. Let's test this node...

Page 22:

Node-based pruning

[Figure: the larger tree, with the Before10 node under Material=New selected for pruning.]

For each node, record the performance on the validation set of the tree without that node. Suppose our initial tree has 0.7 accuracy on validation.

Suppose that most examples where Material=New and Before10=True are "Yes". Our new subtree has a "Yes" leaf here.

Page 23:

Node-based pruning

[Figure: the same tree as on Page 22, with the pruned "Yes" leaf shown.]

For each node, record the performance on the validation set of the tree without that node. Suppose our initial tree has 0.7 accuracy on validation.

Suppose that most examples where Material=New and Before10=True are "Yes". Our new subtree has a "Yes" leaf here.

Now, test this tree!

Page 24:

(Same content as Page 23.)

Page 25:

Node-based pruning

[Figure: the same pruned tree as on Page 23.]

For each node, record the performance on the validation set of the tree without that node. Suppose our initial tree has 0.7 accuracy on validation.

Suppose that most examples where Material=New and Before10=True are "Yes". Our new subtree has a "Yes" leaf here.

Suppose we get accuracy of 0.73 on this pruned tree. Repeat the test procedure by removing a different node from the original tree...

Page 26:

Node-based pruning

[Figure: the full tree and the smaller tree again, this time with a different node selected for pruning.]

Try this tree (with a different node pruned)...

Page 27:

Node-based pruning

[Figure: the tree with the Is10601 node pruned away and replaced by a "No" leaf, shown next to the smaller reference tree.]

Try this tree (with a different node pruned)...

Now, test this tree and record its accuracy.

Page 28:

Node-based pruning

[Figure: the same trees as on Page 27.]

Try this tree (with a different node pruned)...

Now, test this tree and record its accuracy.

Once we test all possible prunings, modify our tree T with the pruning that has the best performance.

Repeat the entire pruning-selection procedure on the new T, replacing T each time with the best-performing pruned tree, until we no longer gain anything by pruning.

Page 29:

Rule-based pruning

[Figure: the full tree and the smaller tree, as on the previous slides.]

1. Convert the tree to rules, one for each leaf:

IF Material=Old AND Raining=False THEN Attend=Yes
IF Material=Old AND Raining=True AND Is601=True THEN Attend=Yes
...

Page 30:

Rule-based pruning

2. Prune each rule. For instance, to prune this rule:

IF Material=Old AND Raining = F THEN Attend = T

Test each candidate rule with one precondition removed on the validation set, and compare its performance to that of the original rule:

IF Material=Old THEN Attend=T
IF Raining=F THEN Attend=T
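A sketch of this per-rule pruning, assuming a rule is stored as a dict of preconditions plus a conclusion and examples are dicts (the names are ours, not the course's):

def rule_accuracy(preconds, conclusion, examples, target="Attend"):
    # Accuracy of the rule on the validation examples it matches.
    matched = [x for x in examples if all(x[a] == v for a, v in preconds.items())]
    if not matched:
        return 0.0
    return sum(x[target] == conclusion for x in matched) / len(matched)

def prune_rule(preconds, conclusion, validation):
    # Greedily drop preconditions while a shorter rule scores better on validation.
    best = dict(preconds)
    best_acc = rule_accuracy(best, conclusion, validation)
    improved = True
    while improved and best:
        improved = False
        for attr in list(best):
            shorter = {a: v for a, v in best.items() if a != attr}
            acc = rule_accuracy(shorter, conclusion, validation)
            if acc > best_acc:
                best, best_acc, improved = shorter, acc, True
                break
    return best, best_acc

# e.g. prune_rule({"Material": "Old", "Raining": "F"}, "Yes", validation_examples)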

Page 31:

Rule-based pruning

Suppose we got the following accuracy for each rule:

IF Material=Old AND Raining=F THEN Attend=T -- 0.6
IF Material=Old THEN Attend=T -- 0.5
IF Raining=F THEN Attend=T -- 0.7

Page 32:

Rule-based pruning

Suppose we got the following accuracy for each rule:

IF Material=Old AND Raining=F THEN Attend=T -- 0.6
IF Material=Old THEN Attend=T -- 0.5
IF Raining=F THEN Attend=T -- 0.7

Then, we would keep the best one and drop the others.

Page 33:

Rule-based pruning

Repeat for the next rule, comparing the original rule with each rule that has one precondition removed.

IF Material=Old AND Raining=T AND Is601=T THEN Attend=T
IF Material=Old AND Raining=T THEN Attend=T
IF Material=Old AND Is601=T THEN Attend=T
IF Raining=T AND Is601=T THEN Attend=T

Page 34:

Rule-based pruning

Repeat for the next rule, comparing the original rule with each rule that has one precondition removed.

IF Material=Old AND Raining=T AND Is601=T THEN Attend=T -- 0.6
IF Material=Old AND Raining=T THEN Attend=T -- 0.7
IF Material=Old AND Is601=T THEN Attend=T -- 0.3
IF Raining=T AND Is601=T THEN Attend=T -- 0.65

Page 35:

Rule-based pruning

Repeat for the next rule, comparing the original rule with each rule that has one precondition removed.

IF Material=Old AND Raining=T AND Is601=T THEN Attend=T -- 0.6
IF Material=Old AND Raining=T THEN Attend=T -- 0.7
IF Material=Old AND Is601=T THEN Attend=T -- 0.3
IF Raining=T AND Is601=T THEN Attend=T -- 0.65

If a shorter rule works better, we may also choose to prune it further on this step before moving on to the next leaf:

IF Material=Old AND Raining=T THEN Attend=T -- 0.7
IF Material=Old THEN Attend=T -- 0.3
IF Raining=T THEN Attend=T -- 0.2

Page 36:

Rule-based pruning

Repeat for the next rule, comparing the original rule with each rule that has one precondition removed.

IF Material=Old AND Raining=T AND Is601=T THEN Attend=T -- 0.6
IF Material=Old AND Raining=T THEN Attend=T -- 0.75
IF Material=Old AND Is601=T THEN Attend=T -- 0.3
IF Raining=T AND Is601=T THEN Attend=T -- 0.65

If a shorter rule works better, we may also choose to prune it further on this step before moving on to the next leaf:

IF Material=Old AND Raining=T THEN Attend=T -- 0.75
IF Material=Old THEN Attend=T -- 0.3
IF Raining=T THEN Attend=T -- 0.2

Well, maybe not this time!

Page 37:

Rule-based pruning

Once we have done the same pruning procedure for each rule in the tree....

3. Order the ‘kept rules’ by their accuracy, and do all subsequent classification with that priority.

- IF Material=Old AND Raining=T THEN Attend=T -- 0.75
- IF Raining=F THEN Attend=T -- 0.7
- ... (and so on for the other pruned rules) ...

(Note that you may wind up with a differently-structured DT than before, as discussed in class)

Page 38:

Adding randomness

What if you didn't know whether you had new material? For instance, you want to classify this example:

Rain   Is601   Material   Before10   Attend?
 T       F       ???          F         ?

[Figure: the original AttendClass decision tree from Page 4.]

Page 39:

Adding randomness

[Same example and decision tree as on Page 38. Where to go at the Material node, since the value is unknown?]

You could look at the training set and see that, when Rain=T and Is10601=F, a fraction p of the examples had new material. Then flip a p-biased coin and descend the appropriate branch. But that might not be the best idea. Why not?
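One way to code the coin-flip idea (same illustrative dict-based tree and example format as the earlier pruning sketch; not the course's prescribed solution):

import random

def classify_with_missing(tree, example, train):
    # Walk the tree; when the example lacks the attribute a node tests, sample a
    # branch according to how often each value occurs among the training examples
    # that reach the same node.
    path = ()
    while isinstance(tree, dict):
        attr = tree["attr"]
        value = example.get(attr)
        if value is None:
            reached = [x for x in train if all(x.get(a) == v for a, v in path)]
            observed = [x[attr] for x in reached if x.get(attr) is not None]
            value = random.choice(observed) if observed else random.choice(list(tree["branches"]))
        path += ((attr, value),)
        tree = tree["branches"][value]
    return tree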

Page 40:

Adding randomness

Also, you may have missing data in the training set.

Rain   Is601   Material   Before10   Attend
 T       F       ???          F         T

There are also methods to deal with this using probability.

"Well, 60% of the time when Rain and not 601, there's new material (among the examples where we do know the material). So we'll just randomly pick 60% of the rainy, non-601 examples where we don't know the material to have new material."

[Figure: partial tree (Raining, then Is10601), with a "??" leaf where the Material value is unknown.]
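A sketch of that imputation idea for the training set (illustrative names; the course may prescribe a different scheme):

import random

def impute(train, attr, context_attrs):
    # Fill in missing values of `attr` by sampling from examples that agree on
    # `context_attrs` and do have the value (falling back to all known values).
    known = [x for x in train if x.get(attr) is not None]
    if not known:
        return train
    for x in train:
        if x.get(attr) is None:
            similar = [k for k in known if all(k[a] == x[a] for a in context_attrs)]
            pool = similar or known
            x[attr] = random.choice(pool)[attr]
    return train

# e.g. impute(train, "Material", ["Rain", "Is601"]) assigns New to roughly 60% of the
# rainy, non-601 examples with unknown material if 60% of the known ones are New.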

Page 41:

Adventures in Probability

• That approach tends to work well. Still, we may have the following trouble.

• What if there aren’t very many training examples where Rain = True and 10601=False? Wouldn’t we still want to use examples where Rain=False to get the missing value?

• Well, it “depends”. Stay tuned for lecture next week!

