Hands on Classification: Decision Trees and Random Forests
Daniel Gerlanc, Managing Director
Enplus Advisors, Inc.
[email protected]
Predictive Analytics Meetup Group
Machine Learning Workshop
December 2, 2012
© Daniel Gerlanc, 2012. All rights reserved.
If you’d like to use this material for any purpose, please contact [email protected]
What You’ll Learn
• Intuition behind decision trees and random forests
• Implementation in R
• Assessing the results
Dataset
• Chemical Analysis of Italian Wines
• http://www.parvus.unige.it/
• 178 records, 14 attributes
Follow along
> library(mlclass)
> data(wine)
> str(wine)
'data.frame': 178 obs. of 14 variables:
 $ Type      : Factor w/ 2 levels "Grig","No": 2 2 2 2 2 2 2 2 2 2 ...
 $ Alcohol   : num 14.2 13.2 13.2 14.4 13.2 ...
 $ Malic     : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
 $ Ash       : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
 $ Alcalinity: num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
What are Decision Trees?
• Model for partitioning an input space
What’s partitioning?
See rf-1.R
Create the 1st split. [plot: the space divided into "G" and "Not G" regions]
See rf-1.R
Create the 2nd split. [plot: a further split adds another "G" region]
See rf-1.R
Create more splits… [plot: an additional "Not G" region; "I drew this one in."]
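rf-1.R is not reproduced here; a minimal sketch of this kind of manual partition, assuming two of the wine predictors (Alcohol and Malic) and purely illustrative split values:

# Hypothetical sketch of hand-drawn splits on two predictors (not rf-1.R itself);
# the thresholds 12.8 and 2.5 are illustrative, not fitted.
plot(wine$Alcohol, wine$Malic, col = wine$Type,
     xlab = "Alcohol", ylab = "Malic")
abline(v = 12.8, lty = 2)                                    # 1st split: Alcohol threshold
segments(12.8, 2.5, max(wine$Alcohol), 2.5, lty = 2)         # 2nd split within the right-hand region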
Another view of partitioning
See rf-2.R
Use R to do the partitioning.
library(rpart)
library(rpart.plot)
tree.1 <- rpart(Type ~ ., data = wine)   # fit a classification tree
prp(tree.1, type = 4, extra = 2)         # plot the tree
• See the ‘rpart’ and ‘rpart.plot’ R packages.
• Many parameters available to control the fit.
See rf-2.R
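As a sketch, a few of those fit-control parameters can be passed via rpart.control(); the values shown below are the package defaults:

# Sketch: controlling tree growth with rpart.control()
# (minsplit, cp, and maxdepth shown at their default values).
tree.2 <- rpart(Type ~ ., data = wine,
                control = rpart.control(minsplit = 20, cp = 0.01, maxdepth = 30))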
Make predictions on a test dataset
predict(tree.1, newdata=wine, type="vector")
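A minimal sketch of turning predictions into the confusion matrix and accuracy reported on the next slide (here predictions are made on the same wine data frame; the workshop scripts may use a held-out test set):

pred <- predict(tree.1, newdata = wine, type = "class")  # predicted class labels
table(Predicted = pred, Actual = wine$Type)              # confusion matrix
mean(pred == wine$Type)                                  # overall accuracy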
How’d it do?
Guessing: 60.11%
CART: 94.38% Accuracy
• Precision: 92.95% (66 / 71)
• Sensitivity/Recall: 92.95% (66 / 71)
                Actual
Predicted    Grig     No
Grig           66      5
No              5    102
Decision Tree Problems
• Overfitting the data
• May not use all relevant features
• Perpendicular decision boundaries
Random Forests
One Decision Tree
Many Decision Trees (Ensemble)
Random Forest Fixes
• Overfitting the data
• May not use all relevant features
• Perpendicular decision boundaries
Building RF
For each tree:
• Sample from the data
• At each split, sample from the available variables (both steps are sketched below)
Bootstrap Sampling
Sample Attributes at each split
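A minimal sketch of the two sampling steps (an illustration only, not the randomForest internals):

set.seed(1)
n <- nrow(wine)
boot.idx   <- sample(n, size = n, replace = TRUE)    # bootstrap sample of records
boot.data  <- wine[boot.idx, ]
predictors <- setdiff(names(wine), "Type")
mtry       <- floor(sqrt(length(predictors)))        # default mtry for classification
split.vars <- sample(predictors, mtry)               # variables considered at one split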
Motivations for RF
• Create uncorrelated trees
• Variance reduction (quantified below)
• Subspace exploration
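A standard way to quantify the variance reduction: if B trees each have variance σ² and pairwise correlation ρ, their average has variance ρσ² + ((1 − ρ)/B)σ², so making the trees less correlated lowers the variance of the ensemble.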
Random Forests
library(randomForest)
rffit.1 <- randomForest(Type ~ ., data = wine)
See rf-3.R
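A quick way to inspect the fit (printing the object reports the out-of-bag error estimate and its confusion matrix):

print(rffit.1)        # OOB error estimate and confusion matrix
importance(rffit.1)   # variable importance scores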
RF Parameters in R
The most important parameters are:

Variable | Description | Default
ntree | Number of trees | 500
mtry | Number of variables randomly selected at each node | square root of # predictors (classification); # predictors / 3 (regression)
nodesize | Minimum number of records in a terminal node | 1 (classification); 5 (regression)
sampsize | Number of records selected in each bootstrap sample | 63.2% of the records
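A sketch of overriding those defaults (the values here are illustrative, not recommendations):

rffit.2 <- randomForest(Type ~ ., data = wine,
                        ntree = 1000, mtry = 4, nodesize = 1)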
How’d it do?
Guessing Accuracy: 60.11%
Random Forest: 98.31% Accuracy
• Precision: 95.77% (68 / 71)
• Sensitivity/Recall: 100% (68 / 68)
                Actual
Predicted    Grig     No
Grig           68      3
No              0    107
Tuning RF: Grid Search
See rf-4.R
This is the default.
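rf-4.R is not reproduced here; a minimal grid-search sketch, assuming the OOB error rate is used to compare a few mtry and nodesize values:

# Evaluate a small grid of parameter values by OOB error (illustrative values).
grid <- expand.grid(mtry = c(2, 4, 6), nodesize = c(1, 5, 10))
grid$oob.err <- apply(grid, 1, function(p) {
  fit <- randomForest(Type ~ ., data = wine,
                      mtry = p["mtry"], nodesize = p["nodesize"])
  fit$err.rate[nrow(fit$err.rate), "OOB"]   # OOB error after the last tree
})
grid[which.min(grid$oob.err), ]             # best parameter combination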
Tuning is Expensive
• Polynomial in the number of tuning parameters
• Plus repeated model fitting in cross-validation
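For example, trying 5 candidate values for each of 3 parameters under 10-fold cross-validation already requires 5 × 5 × 5 × 10 = 1,250 model fits.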
Benefits of RF
• Good performance with default settings
• Relatively easy to make parallel
• Many implementations: R, Weka, RapidMiner, Mahout
References
• A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
• Breiman, Leo. Classification and Regression Trees. Belmont, Calif.: Wadsworth International Group, 1984. Print.
• Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm