Decision Trees

Page 1: Decision Trees

Decision Trees

ID     Hair    Height   Weight   Lotion   Result
Sarah  Blonde  Average  Light    No       Sunburn
Dana   Blonde  Tall     Average  Yes      None
Alex   Brown   Tall     Average  Yes      None
Annie  Blonde  Short    Average  No       Sunburn
Emily  Red     Average  Heavy    No       Sunburn
Pete   Brown   Tall     Heavy    No       None
John   Brown   Average  Heavy    No       None
Katie  Blonde  Short    Light    Yes      None

Page 2: Decision Trees

Decision Tree

Page 3: Decision Trees

DT Inducer

Page 4: Decision Trees

Day Outlook Temp Humidity Wind Play?

D1 Sunny Hot High Weak No

D2 Sunny Hot High Strong No

D3 Overcast Hot High Weak Yes

D4 Rain Mild High Weak Yes

D5 Rain Cool Normal Weak Yes

D6 Rain Cool Normal Strong No

D7 Overcast Cool Normal Strong Yes

D8 Sunny Mild High Weak No

D9 Sunny Cool Normal Weak Yes

D10 Rain Mild Normal Weak Yes

D11 Sunny Mild Normal Strong Yes

D12 Overcast Mild High Strong Yes

D13 Overcast Hot Normal Weak Yes

D14 Rain Mild High Strong No

What’s the DT?

Page 5: Decision Trees

Root: Outlook! (humidity, wind, and temp all have lower information gain)
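To check this, here is a small Python sketch (helper names are illustrative) that computes the information gain of each attribute over the table on the previous page. Outlook scores highest, which is why it becomes the root:

    from collections import Counter
    from math import log2

    # The 14 play-tennis examples: (Outlook, Temp, Humidity, Wind, Play?)
    ROWS = [
        ("Sunny",    "Hot",  "High",   "Weak",   "No"),
        ("Sunny",    "Hot",  "High",   "Strong", "No"),
        ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
        ("Rain",     "Mild", "High",   "Weak",   "Yes"),
        ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
        ("Rain",     "Cool", "Normal", "Strong", "No"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"),
        ("Sunny",    "Mild", "High",   "Weak",   "No"),
        ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
        ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
        ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
        ("Overcast", "Mild", "High",   "Strong", "Yes"),
        ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
        ("Rain",     "Mild", "High",   "Strong", "No"),
    ]
    ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def gain(rows, i):
        """Information gain of splitting on attribute index i."""
        g = entropy([r[-1] for r in rows])
        for v in set(r[i] for r in rows):
            part = [r[-1] for r in rows if r[i] == v]
            g -= len(part) / len(rows) * entropy(part)
        return g

    for i, name in enumerate(ATTRS):
        print(name, round(gain(ROWS, i), 3))
    # Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048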

Page 6: Decision Trees

Version Spaces vs. Decision Trees

ID3 searches a complete hypothesis space (it can represent any finite discrete-valued function). It searches that space incompletely, hill climbing with a heuristic that prefers shorter trees with high-information-gain attributes closer to the root (its inductive bias).

Version spaces search an incomplete hypothesis space completely. Their inductive bias arises from the bias in the hypothesis representation.
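A minimal recursive sketch of the greedy search just described, reusing ROWS, entropy(), and gain() from the previous sketch (the nested-dict tree format is an illustrative choice, not part of ID3 itself):

    from collections import Counter

    def id3(rows, attrs):
        """Greedy top-down induction; attrs is a list of attribute indices.
        Hill climbing: the best attribute is chosen once and never revisited."""
        labels = [r[-1] for r in rows]
        if len(set(labels)) == 1:
            return labels[0]                              # pure node: leaf
        if not attrs:
            return Counter(labels).most_common(1)[0][0]   # majority leaf
        best = max(attrs, key=lambda i: gain(rows, i))
        return {best: {v: id3([r for r in rows if r[best] == v],
                              [a for a in attrs if a != best])
                       for v in set(r[best] for r in rows)}}

    tree = id3(ROWS, [0, 1, 2, 3])
    # The top-level key is 0, i.e. Outlook, matching the previous page.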

Page 7: Decision Trees

Issues

How deep? Continuous attributes? Missing attribute values? Attributes with different costs?

Page 8: Decision Trees

Overfitting

The tree grows deep enough to perform well on the training data. But there may be noise in the data, or not enough examples.

Overfitting: h overfits the data if there is an alternative h' that does worse than h on the training examples but does better than h over the full distribution of instances.

Page 9: Decision Trees
Page 10: Decision Trees

Handling Overfitting

Stop growing the tree earlier, or post-prune the full tree (post-pruning works better in practice).

Page 11: Decision Trees

Methods

Construct the tree, then use a validation set (a separate set of examples, distinct from the training set) to evaluate the utility of post-pruning nodes.

Construct the tree with all available data, then use statistical tests to determine whether expanding or pruning a node is likely to produce an improvement only on the training examples or over the entire instance distribution.

Page 12: Decision Trees

Training and validation

2/3 of the data is used for training, 1/3 for validation (the validation set should be a large enough sample). The validation set is a safety check: it is unlikely to contain the same random errors and coincidental regularities as the training set.
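A minimal sketch of such a split; the 2/3 fraction and the fixed seed are just the choices described above:

    import random

    def train_validation_split(examples, train_frac=2/3, seed=0):
        """Shuffle, then split into a training and a validation set."""
        rng = random.Random(seed)              # fixed seed for reproducibility
        shuffled = list(examples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        return shuffled[:cut], shuffled[cut:]

    train, validation = train_validation_split(list(range(14)))
    print(len(train), len(validation))         # 9 5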

Page 13: Decision Trees
Page 14: Decision Trees

Reduced error pruning

Pruning a node: remove the subtree rooted at that node, make it a leaf, and give it the most common classification of the training examples at that node. Consider each node for pruning; remove a node only if the pruned tree performs no worse than the original on the validation set. Iterate, and stop when further pruning decreases accuracy on the validation set.
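A sketch of this loop over the nested-dict trees built by the id3() sketch earlier. classify, accuracy, and majority are illustrative helpers, and this bottom-up, keep-if-no-worse variant is one reasonable reading of the procedure (the root itself is never collapsed here):

    from collections import Counter

    def classify(tree, row):
        """Walk the nested-dict tree; unseen branch values classify as None."""
        while isinstance(tree, dict):
            attr, branches = next(iter(tree.items()))
            tree = branches.get(row[attr])
        return tree

    def accuracy(tree, rows):
        return sum(classify(tree, r) == r[-1] for r in rows) / len(rows)

    def majority(rows):
        return Counter(r[-1] for r in rows).most_common(1)[0][0]

    def reduced_error_prune(root, train_rows, val_rows):
        """Bottom-up: try replacing each internal node with a majority leaf,
        keeping the replacement if validation accuracy does not drop."""
        def visit(node, rows):
            if not isinstance(node, dict):
                return
            attr, branches = next(iter(node.items()))
            for value in list(branches):
                subset = [r for r in rows if r[attr] == value]
                visit(branches[value], subset)
                if isinstance(branches[value], dict) and subset:
                    before = accuracy(root, val_rows)
                    child = branches[value]
                    branches[value] = majority(subset)    # tentative prune
                    if accuracy(root, val_rows) < before:
                        branches[value] = child           # revert: it hurt
        visit(root, train_rows)
        return root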

Page 15: Decision Trees

From Trees to Rules

Traverse the DT from the root to each leaf. Each such path defines a rule.

Example:
If ?x hair color is blonde and ?x uses lotion
Then nothing happens

Page 16: Decision Trees

Rules from DT

If ?x hair color is blonde and ?x uses no lotion
Then ?x turns red

If ?x hair color is red
Then ?x turns red

If ?x hair color is blonde and ?x uses lotion
Then nothing happens

If ?x hair color is dark
Then nothing happens
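As a sketch, one rule per root-to-leaf path can be read off a nested-dict tree; the hand-written sunburn_tree below mirrors the four rules above:

    def tree_to_rules(tree, path=()):
        """Return (antecedents, consequent) pairs, one per root-to-leaf path."""
        if not isinstance(tree, dict):
            return [(list(path), tree)]
        attr, branches = next(iter(tree.items()))
        rules = []
        for value, subtree in branches.items():
            rules += tree_to_rules(subtree, path + ((attr, value),))
        return rules

    sunburn_tree = {"Hair": {
        "Blonde": {"Lotion": {"No": "Sunburn", "Yes": "None"}},
        "Red": "Sunburn",
        "Brown": "None",
    }}
    for antecedents, result in tree_to_rules(sunburn_tree):
        print(antecedents, "->", result)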

Page 17: Decision Trees

Rule Pruning

Eliminate unnecessary antecedents. Consider:
If ?x hair color is blonde and ?x uses lotion
Then nothing happens

Suppose we eliminate the first antecedent (blonde). The rule then triggers for every person who uses lotion:
If ?x uses lotion
Then nothing happens

The data shows that nothing happens to anyone using lotion! We might as well drop the first antecedent, since it makes no difference.
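A sketch of that check: drop an antecedent whenever the rule is at least as accurate, on the examples it covers, without it. The PEOPLE list restates the page-1 data with only the relevant attributes:

    PEOPLE = [
        {"Hair": "Blonde", "Lotion": "No",  "Result": "Sunburn"},  # Sarah
        {"Hair": "Blonde", "Lotion": "Yes", "Result": "None"},     # Dana
        {"Hair": "Brown",  "Lotion": "Yes", "Result": "None"},     # Alex
        {"Hair": "Blonde", "Lotion": "No",  "Result": "Sunburn"},  # Annie
        {"Hair": "Red",    "Lotion": "No",  "Result": "Sunburn"},  # Emily
        {"Hair": "Brown",  "Lotion": "No",  "Result": "None"},     # Pete
        {"Hair": "Brown",  "Lotion": "No",  "Result": "None"},     # John
        {"Hair": "Blonde", "Lotion": "Yes", "Result": "None"},     # Katie
    ]

    def precision(antecedents, consequent, examples):
        """Fraction of covered examples whose result matches the consequent."""
        covered = [e for e in examples
                   if all(e[a] == v for a, v in antecedents)]
        if not covered:
            return 0.0
        return sum(e["Result"] == consequent for e in covered) / len(covered)

    def prune_antecedents(antecedents, consequent, examples):
        conds = list(antecedents)
        for c in list(conds):
            rest = [x for x in conds if x != c]
            if precision(rest, consequent, examples) >= \
               precision(conds, consequent, examples):
                conds = rest                    # dropping c loses nothing
        return conds

    rule = [("Hair", "Blonde"), ("Lotion", "Yes")]
    print(prune_antecedents(rule, "None", PEOPLE))  # [('Lotion', 'Yes')]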

Page 18: Decision Trees

Contingency tables

Formalizing the intuition. For those who used lotion:

            No change   Sunburned
Blonde          2           0
Not blonde      1           0

For those who used lotion, it does not matter whether they are blonde or not: they do not get sunburned.
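These tables are mechanical to build. A sketch, reusing the PEOPLE list from the sketch on the previous page:

    def contingency(examples, row_pred):
        """2x2 counts: rows = predicate true/false, cols = no change/sunburned."""
        table = [[0, 0], [0, 0]]
        for e in examples:
            r = 0 if row_pred(e) else 1
            c = 0 if e["Result"] == "None" else 1
            table[r][c] += 1
        return table

    lotion_users = [e for e in PEOPLE if e["Lotion"] == "Yes"]
    print(contingency(lotion_users, lambda e: e["Hair"] == "Blonde"))
    # [[2, 0], [1, 0]] -- the table above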

Page 19: Decision Trees

Contingency tables

Formalizing the intuition. For those who are blonde:

            No change         Sunburned
Lotion      2 (Dana, Katie)   0
No lotion   0                 2 (Sarah, Annie)

For those who are blonde, it does matter whether or not they use lotion: the two who use lotion do not get sunburned, and the two who do not use lotion do.

Page 20: Decision Trees

Contingency tables

If ?x is blonde and does not use lotion, then ?x turns red. Eliminating "?x is blonde" gives: if ?x does not use lotion, then ?x turns red. For those who do not use lotion:

            No change   Sunburned
Blonde          0           2
Not blonde      2           1

Looks like "?x is blonde" is important.

Page 21: Decision Trees

Contingency tables

If ?x is blonde and does not use lotion, then ?x turns red. Eliminating "?x does not use lotion" gives: if ?x is blonde, then ?x turns red. For those who are blonde:

            No change   Sunburned
No lotion       0           2
Lotion          2           0

Looks like "?x does not use lotion" is important.

Page 22: Decision Trees

Contingency tables

If ?x is a redhead, then ?x turns red. Eliminating "?x is a redhead" leaves a rule that always fires! Look at everyone:

            No change   Sunburned
Redhead         0           1
Not redhead     5           2

Evidently red hair is important.

Page 23: Decision Trees

Contingency tables

If ?x is dark haired, then nothing happens. Eliminating "?x is dark haired" leaves a rule that always fires! Look at everyone:

                No change   Sunburned
Dark haired
Not dark haired

Is being dark haired important?

Page 24: Decision Trees

Contingency tables

If ?x is dark haired, then nothing happens. Eliminating "?x is dark haired" leaves a rule that always fires! Look at everyone:

                No change   Sunburned
Dark haired         3           0
Not dark haired     2           3

Is being dark haired important?

Page 25: Decision Trees

Eliminate Unnecessary rules

If ?x is blonde and ?x uses no lotion, then ?x is sunburned.
If ?x uses lotion, then nothing happens.
If ?x is a redhead, then ?x is sunburned.
If ?x is dark haired, then nothing happens.

Can we come up with a default rule that eliminates the need for some of the above rules?

Page 26: Decision Trees

Eliminate Unnecessary rules

If ?x uses lotion, then nothing happens.
If ?x is dark haired, then nothing happens.

Both are covered by the default:
If no other rule applies, then nothing happens.

Page 27: Decision Trees

Default rules

If ?x is blonde and ?x uses no lotion, then ?x is sunburned.
If ?x is a redhead, then ?x is sunburned.

Both are covered by the alternative default:
If no other rule applies, then ?x gets sunburned.

Page 28: Decision Trees

Eliminate rules using defaults

Heuristic 1: choose the default rule that eliminates/replaces as many other rules as possible. Both candidate defaults eliminate 2 other rules, so this heuristic cannot decide between them.

Heuristic 2 (to choose among rules tied under heuristic 1): choose the default rule that covers the most common consequent. 5 people are not sunburned vs. 3 sunburned, so choose: If no other rule applies, then nothing happens.

Page 29: Decision Trees

General Procedure:

Fisher's exact test can be used to do this check.
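For example, assuming SciPy is available, the blonde lotion/no-lotion table from page 19 can be tested like this. With only four examples the p-value stays large, so these slides use the counts qualitatively:

    from scipy.stats import fisher_exact

    table = [[2, 0],   # lotion:    no change, sunburned
             [0, 2]]   # no lotion: no change, sunburned

    odds_ratio, p_value = fisher_exact(table)
    print(p_value)     # ~0.33: too few examples for significance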

Page 30: Decision Trees

Why rules?

Rules distinguish between different contexts: each rule can be considered separately for antecedent pruning.

Contrast this with a DT, where the only choice is to remove a node or not. There may be many contexts through that node: many rules pass through it, and removing the node removes ALL rules that go through it.

If we consider each rule separately, we can consider all contexts of a DT node separately.

Page 31: Decision Trees

Why Rules?

In DTs nodes near root are more “important” than nodes near leaves.

Rules avoid the distinction between attribute tests near the root and those later on. We also avoid thinking about re-organizing the tree if we have to remove the root node!

Page 32: Decision Trees

Why rules?

Rules may be more readable

Page 33: Decision Trees

Continuous valued attributes

Temp   40  48  60  72  80  90
Play?  No  No  Yes Yes Yes No

•Sort the examples by the continuous attribute's values

•Identify adjacent examples that differ in target value (Play?)

•Pick one of (48 + 60)/2 = 54 and (80 + 90)/2 = 85 by evaluating the disorder of the new boolean features "tempGT54" and "tempGT85" (see the sketch below)

•Can also come up with multiple intervals. Use both of the above?
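A sketch of the threshold-candidate step, run on the table above (the helper name is illustrative):

    def candidate_thresholds(values, labels):
        """Midpoints between adjacent sorted values whose labels differ."""
        pairs = sorted(zip(values, labels))
        return [(a + b) / 2
                for (a, la), (b, lb) in zip(pairs, pairs[1:])
                if la != lb]

    temps = [40, 48, 60, 72, 80, 90]
    play  = ["No", "No", "Yes", "Yes", "Yes", "No"]
    print(candidate_thresholds(temps, play))    # [54.0, 85.0]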

Page 34: Decision Trees

Feature selection

Search through the space of feature subsets for a subset that maximizes performance.
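One simple instance of this search is greedy forward selection, sketched below; the score callback (e.g. validation accuracy of a tree trained on that subset) is an assumption, not something the slides specify:

    def forward_selection(features, score):
        """Greedily add the feature that most improves score(subset);
        stop when no single addition helps."""
        selected = set()
        best = score(selected)
        while True:
            remaining = set(features) - selected
            if not remaining:
                return selected
            top_score, top_feature = max(
                (score(selected | {f}), f) for f in remaining)
            if top_score <= best:
                return selected
            best = top_score
            selected.add(top_feature)

    # e.g. forward_selection(["Outlook", "Temp", "Humidity", "Wind"], my_score)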

