Page 1: Decision Trees

Decision Trees

• Definition

• Mechanism

• Splitting Function

• Issues in Decision-Tree Learning

• Avoiding overfitting through pruning

• Numeric and missing attributes

Page 2: Decision Trees

Example of a Decision Tree

Example: Learning to classify stars.

Luminosity
├─ > T1 → Mass
│    ├─ > T2 → Type A
│    └─ <= T2 → Type B
└─ <= T1 → Type C
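Read as code, this tree is just nested conditionals. A minimal sketch (not part of the slides; t1 and t2 stand in for the thresholds T1 and T2):

```python
def classify_star(luminosity: float, mass: float, t1: float, t2: float) -> str:
    """Classify a star with the decision tree sketched above."""
    if luminosity > t1:
        # Bright stars are further split on mass.
        return "Type A" if mass > t2 else "Type B"
    # Dim stars need no further tests.
    return "Type C"
```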

Page 3: Decision Trees

Short vs Long Hypotheses

As mentioned, the top-down, greedy approach to constructing decision trees reflects a preference for short hypotheses over long hypotheses.

Why is this the right thing to do?

Occam’s Razor: Prefer the simplest hypothesis that fits the data.

The principle dates back to William of Occam (1320) and has been the subject of great debate in the philosophy of science.

Page 4: Decision Trees

Issues in Decision Tree Learning

Practical issues in building a decision tree can be enumerated as follows:

1) How deep should the tree be?
2) How do we handle continuous attributes?
3) What is a good splitting function?
4) What happens when attribute values are missing?
5) How do we improve the computational efficiency?

Page 5: Decision Trees

How deep should the tree be? Overfitting the Data

A tree overfits the data if we let it grow deep enough that it begins to capture “aberrations” in the data that harm the predictive power on unseen examples:

[Figure: a tree split on size (thresholds t2, t3) and humidity. Possibly just noise, but the tree is grown larger to capture these examples.]

Page 6: Decision Trees

Overfitting the Data: Definition

Assume a hypothesis space H. We say a hypothesis h in H overfits a dataset D if there is another hypothesis h’ in H where h has better classification accuracy than h’ on D but worse classification accuracy than h’ on D’ (data not seen during training).

[Figure: classification accuracy (0.5–1.0) vs. size of the tree. Accuracy on the training data keeps rising as the tree grows, while accuracy on the testing data eventually drops; the divergence marks overfitting.]
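The curve above is easy to reproduce. A minimal sketch, assuming scikit-learn and a synthetic noisy dataset (neither appears in the slides): training accuracy keeps rising with tree size, while testing accuracy eventually drops.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset; flip_y injects label noise so overfitting becomes visible.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in range(1, 15):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # Training accuracy tends to rise with depth; test accuracy peaks and then falls.
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
```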

Page 7: Decision Trees

Causes of Overfitting the Data

What causes a hypothesis to overfit the data?

1) Random errors or noise: examples have an incorrect class label or incorrect attribute values.

2) Coincidental patterns: by chance, examples seem to deviate from a pattern due to the small size of the sample.

Overfitting is a serious problem that can cause strong performance degradation.

Page 8: Decision Trees

Solutions for Overfitting the Data

There are two main classes of solutions:

1) Stop growing the tree early, before it begins to overfit the data.
+ In practice this solution is hard to implement because it is not clear what a good stopping point is.

2) Grow the tree until the algorithm stops, even if the overfitting problem shows up; then prune the tree as a post-processing step.
+ This method has found great popularity in the machine learning community.

Page 9: Decision Trees

Decision Tree Pruning

1) Grow the tree to fit the training data.

2) Prune the tree to avoid overfitting the data.

Page 10: Decision Trees

Methods to Validate the New Tree

Training and Validation Set Approach

Divide dataset D into a training set TR and a validation set TE.
Build a decision tree on TR.
Test pruned trees on TE to decide the best final tree.

[Diagram: Dataset D is divided into Training TR and Validation TE.]

Page 11: Decision Trees

Training and Validation

There are two approaches:

A. Reduced Error Pruning
B. Rule Post-Pruning

[Diagram: Dataset D is divided into Training TR (normally 2/3 of D) and Validation TE (normally 1/3 of D).]
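A minimal sketch of this split (the function name and shuffling seed are illustrative, not from the slides):

```python
import random

def split_dataset(D, train_frac=2/3, seed=0):
    """Shuffle D and split it into training TR (~2/3) and validation TE (~1/3)."""
    examples = list(D)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * train_frac)
    return examples[:cut], examples[cut:]   # TR, TE
```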

Page 12: Decision Trees

Reduced Error Pruning

Main Idea:

1) Consider all internal nodes in the tree.
2) For each node, check whether removing it (along with the subtree below it) and assigning the most common class to it does not harm accuracy on the validation set.
3) Pick the node n* that yields the best performance and prune its subtree.
4) Go back to (2) until no more improvements are possible.
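A minimal sketch of this procedure, under assumptions not in the slides: the tree is a small Node class, each node caches the majority class of the training examples that reached it during growth, and the validation set is a list of (example, class) pairs.

```python
class Node:
    """Illustrative tree node: an internal test on a discrete attribute, or a leaf."""
    def __init__(self, attr=None, children=None, label=None, majority=None):
        self.attr = attr                # attribute tested here (None at a leaf)
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class returned if this is a leaf
        self.majority = majority        # most common class of training examples here

    def is_leaf(self):
        return not self.children

def predict(node, x):
    # Assumes every attribute value of x was seen during training.
    while not node.is_leaf():
        node = node.children[x[node.attr]]
    return node.label

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def internal_nodes(node):
    if node.is_leaf():
        return []
    return [node] + [m for c in node.children.values() for m in internal_nodes(c)]

def reduced_error_prune(tree, validation):
    """Repeatedly prune the node whose removal helps validation accuracy most."""
    while True:
        best_node, best_acc = None, accuracy(tree, validation)
        for node in internal_nodes(tree):
            attr, children = node.attr, node.children
            # Tentatively replace the subtree with a leaf predicting the majority class.
            node.attr, node.children, node.label = None, {}, node.majority
            acc = accuracy(tree, validation)
            node.attr, node.children, node.label = attr, children, None
            if acc >= best_acc:         # pruning must not harm validation accuracy
                best_node, best_acc = node, acc
        if best_node is None:           # step (4): no more improvements possible
            return tree
        best_node.attr, best_node.children = None, {}
        best_node.label = best_node.majority
```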

Page 13: Decision Trees

Example

Original Tree

Possible trees after pruning:

Page 14: Decision Trees

Example

Pruned Tree

Possible trees after the 2nd pruning:

Page 15: Decision Trees

Example

The process continues until no improvement is observed on the validation set:

[Figure: accuracy (0.5–1.0) vs. size of the tree on the validation data; stop pruning the tree once validation accuracy no longer improves.]

Page 16: Decision Trees

Reduced Error Pruning

Disadvantages:

If the original data set is small, setting examples aside for validation may leave you with few examples for training.

[Diagram: a small dataset D split into Training TR and Testing TE; the training set is too small, and so is the validation set.]

Page 17: Decision Trees

Rule Post-Pruning

Main Idea:

1) Convert the tree into a rule-based system.

2) Prune every single rule first by removing redundant conditions.

3) Sort the rules by accuracy.

Page 18: Decision Trees

Example

Original tree:

x1
├─ 0 → x2
│    ├─ 0 → Class A
│    └─ 1 → Class B
└─ 1 → x3
     ├─ 0 → Class A
     └─ 1 → Class C

Rules:
~x1 & ~x2 -> Class A
~x1 & x2 -> Class B
x1 & ~x3 -> Class A
x1 & x3 -> Class C

Possible rules after pruning (based on validation set):
~x1 -> Class A
~x1 & x2 -> Class B
~x3 -> Class A
x1 & x3 -> Class C
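A hedged sketch of steps (1) and (2) for the rules above: each rule is a (conditions, class) pair read off the tree, and conditions are greedily removed while the rule’s accuracy on the validation examples it covers does not decrease. The data representation and helper names are assumptions, not from the slides.

```python
def matches(conditions, x):
    """True if example x (a dict of 0/1 attributes) satisfies all conditions."""
    return all(x[var] == val for var, val in conditions)

def rule_accuracy(conditions, cls, validation):
    """Accuracy of a rule on the validation examples it covers (0 if it covers none)."""
    covered = [(x, y) for x, y in validation if matches(conditions, x)]
    return sum(y == cls for _, y in covered) / len(covered) if covered else 0.0

def prune_rule(conditions, cls, validation):
    """Greedily drop conditions while validation accuracy does not decrease."""
    improved = True
    while improved and conditions:
        improved = False
        current = rule_accuracy(conditions, cls, validation)
        for cond in list(conditions):
            shorter = [c for c in conditions if c != cond]
            if rule_accuracy(shorter, cls, validation) >= current:
                conditions, improved = shorter, True
                break
    return conditions, cls

# The four rules read off the original tree (attributes are 0/1 valued):
rules = [([("x1", 0), ("x2", 0)], "A"),
         ([("x1", 0), ("x2", 1)], "B"),
         ([("x1", 1), ("x3", 0)], "A"),
         ([("x1", 1), ("x3", 1)], "C")]
```

Step (3) then amounts to sorting the pruned rules by their validation accuracy.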

Page 19: Decision Trees

Advantages of Rule Post-Pruning

The language is more expressive.

It improves interpretability.

Pruning is more flexible.

In practice this method yields high-accuracy performance.

Page 20: Decision Trees

Decision Trees

• Definition

• Mechanism

• Splitting Functions

• Issues in Decision-Tree Learning

• Avoiding overfitting through pruning

• Numeric and missing attributes

Page 21: Decision Trees

Discretizing Continuous Attributes

Example: attribute temperature.

1) Order all values in the training set.
2) Consider only those cut points where there is a change of class.
3) Choose the cut point that maximizes information gain.

temperature: 97 97.5 97.6 97.8 98.5 99.0 99.2 100 102.2 102.6 103.2
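A minimal sketch of the cut-point search. The class labels attached to the temperatures below are hypothetical (the slide gives only the values); entropy and information gain follow their usual definitions.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Return the midpoint cut with maximal information gain, trying only class changes."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] == pairs[i][1]:
            continue                      # step 2: only cuts where the class changes
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= cut]
        right = [y for v, y in pairs if v > cut]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:              # step 3: maximize information gain
            best, best_gain = cut, gain
    return best

temps = [97, 97.5, 97.6, 97.8, 98.5, 99.0, 99.2, 100, 102.2, 102.6, 103.2]
labels = ["no"] * 6 + ["yes"] * 5         # hypothetical class labels
print(best_cut_point(temps, labels))      # -> 99.1, the midpoint of 99.0 and 99.2
```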

Page 22: Decision Trees

Claude Shannon

1916 – 2001

Founded information theory in 1948 with his paper “A Mathematical Theory of Communication”.

Awarded the Alfred Noble Prize for his master’s thesis.

Worked at MIT and Bell Labs.

Met with Alan Turing, Marvin Minsky, John von Neumann, and Albert Einstein.

Creator of the “Ultimate Machine”.

Page 23: Decision Trees

Missing Attribute Values

We are at a node n in the decision tree. Different approaches:

1) Assign the most common value for that attribute in node n.
2) Assign the most common value in n among examples with the same classification as X.
3) Assign a probability to each value of the attribute based on the frequency of those values in node n. Each fraction is propagated down the tree.

Example: X = (luminosity > T1, mass = ?)
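A sketch of approach (3), reusing the Node idea from the pruning sketch: value_counts is an assumed per-node statistic (how often each attribute value occurred among the training examples at that node), and examples are dicts that may map an attribute to None when its value is missing.

```python
def classify_with_missing(node, x, weight=1.0):
    """Return {class: probability} for example x, which may have missing (None) values."""
    if node.is_leaf():
        return {node.label: weight}
    value = x.get(node.attr)
    if value is not None:
        return classify_with_missing(node.children[value], x, weight)
    # Missing value: propagate a fraction of the example down every branch,
    # proportional to how often each value occurred here during training.
    total = sum(node.value_counts.values())
    result = {}
    for v, child in node.children.items():
        frac = node.value_counts[v] / total
        for cls, w in classify_with_missing(child, x, weight * frac).items():
            result[cls] = result.get(cls, 0.0) + w
    return result
```

For the example X above, luminosity > T1 routes the example to the Mass node; with mass missing, fractions go down both branches, yielding weighted votes for Type A and Type B.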

Page 24: Decision Trees

Summary

• Decision-tree induction is a popular approach to classification that enables us to interpret the output hypothesis.

• The hypothesis space is very powerful: all possible DNF formulas.

• We prefer shorter trees over larger trees.

• Overfitting is an important issue in decision-tree induction.

• Different methods exist to avoid overfitting, such as reduced-error pruning and rule post-pruning.

• Techniques exist to deal with continuous attributes and missing attribute values.

