1-page summary
• A method for refining a pre-trained random forest
• Comparable to an RF with many more decision-tree nodes
• Better than an RF with the same decision-tree size
Random forest
• An ensemble of decision trees trained by bootstrap sampling & random feature selection
For a sample $x_i$, each decision tree returns a MAP prediction: $\hat{y}_i = f_1(x_i)$, $\hat{y}_i = f_2(x_i)$, $\dots$, $\hat{y}_i = f_T(x_i)$
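A minimal sketch of this baseline, assuming scikit-learn as the implementation (the slides do not prescribe one); `bootstrap=True` provides the bootstrap sampling, `max_features` the random feature selection at each split, and `load_digits` stands in for the real datasets:

```python
# Train a standard random forest: each tree sees a bootstrap sample of the
# data and considers a random subset of features at every node split.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees T
    max_depth=10,         # capped tree depth
    bootstrap=True,       # bootstrap sampling per tree
    max_features="sqrt",  # random feature selection per split
    random_state=0,
)
rf.fit(X_tr, y_tr)
print("RF accuracy:", rf.score(X_te, y_te))  # aggregated MAP prediction
```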
Reformulation of a decision tree
• A prediction $f_t$ can be divided into 2 components
Decision tree: $\hat{y}_i = f_t(x_i)$ (returns a MAP prediction)
Indicator vector: $\phi_t(x_i) = (0, 0, 1, 0)^T$, indicating which path to a leaf node is selected; it can be represented by a binary vector.
Leaf vector: $w_t(y) = (0.2, 0.5, 0.8, 0.1)^T$, storing the posterior probability of $y$ at each leaf node; it can be represented by a real vector.
$f_t(x_i) = \arg\max_y w_t(y) \cdot \phi_t(x_i)$
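A small worked check of this decomposition, using the slide's indicator and leaf vectors; the second class's leaf vector is a hypothetical value added for illustration:

```python
# Check f_t(x_i) = argmax_y  w_t(y) . phi_t(x_i) with concrete numbers.
import numpy as np

phi = np.array([0, 0, 1, 0])            # x_i falls into the 3rd leaf
w = {
    0: np.array([0.2, 0.5, 0.8, 0.1]),  # leaf vector from the slide
    1: np.array([0.8, 0.5, 0.2, 0.9]),  # hypothetical second class
}

scores = {y: float(w_y @ phi) for y, w_y in w.items()}
y_hat = max(scores, key=scores.get)
print(scores, "->", y_hat)              # class 0 wins: 0.8 vs 0.2
```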
Reformulation of a random forest
• A prediction $F$ can be divided into 2 components
Indicator vector (tree-wise indicators concatenated): $\Phi(x_i) = (0, 0, 1, 0 \mid 0, 1, 0, 0 \mid 0, 0, 1, 0)^T$
Leaf vector (tree-wise leaf vectors concatenated): $W(y) = (0.2, 0.5, 0.8, 0.1 \mid 0.3, 0.7, 0.1, 0.2 \mid 0.1, 0.1, 0.5, 0.3)^T$
Random forest: $F(x_i) = \arg\max_y W(y) \cdot \Phi(x_i)$
This looks like an SVM (linear) classifier.
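To make the concatenation concrete, a small numpy sketch using the slide's example numbers; the score is computed for a single class $y$:

```python
# Forest-level decomposition: concatenate per-tree indicator and leaf vectors,
# then score a class with W(y) . Phi(x_i).
import numpy as np

# Which leaf x_i reaches in trees 1..3 (from the slide).
phi = [np.array([0, 0, 1, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0])]
Phi = np.concatenate(phi)

# Per-tree leaf vectors for class y, concatenated (values from the slide).
W_y = np.concatenate([
    np.array([0.2, 0.5, 0.8, 0.1]),
    np.array([0.3, 0.7, 0.1, 0.2]),
    np.array([0.1, 0.1, 0.5, 0.3]),
])

print(W_y @ Phi)   # 0.8 + 0.7 + 0.5 = 2.0: the score of class y for x_i
```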
Global refinement
• Optimize the leaf vectors (weights) $W(y)$ while maintaining the indicator vectors (structure) $\Phi(x)$
Refined leaf vector: $\hat{W}(y) = (0.1, 0.3, 0.9, 0.1 \mid 0.1, 0.8, 0.1, 0.2 \mid 0.1, 0.1, 0.7, 0.1)^T$
Random forest: $F(x_i) = \arg\max_y \hat{W}(y) \cdot \Phi(x_i)$
This optimization can be regarded as a linear classification problem, where the indicator vector $\Phi(x)$ is a new representation of a sample $x$.
[Note] In a standard random forest the trees are optimized independently; this joint optimization effectively utilizes complementary information among the trees.
Because a sample's representation $\Phi(x)$ is highly sparse, LIBLINEAR suits this problem well (see the sketch below).
It can be easily extended to a regression problem.
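A hedged sketch of this step, again assuming scikit-learn: `RandomForestClassifier.apply` gives each sample's leaf index per tree, one-hot encoding those indices yields the sparse $\Phi(x)$, and `LinearSVC`, which wraps the LIBLINEAR solver, re-optimizes the leaf weights $W(y)$ jointly. `load_digits` and `C=1.0` are illustrative choices:

```python
# Sketch of global refinement: freeze the tree structure, rebuild the leaf
# weights W(y) by training a sparse linear classifier on Phi(x).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0)
rf.fit(X_tr, y_tr)

# Phi(x): per tree, a one-hot indicator of the leaf that x falls into,
# concatenated across all trees -> a very sparse binary vector.
enc = OneHotEncoder(handle_unknown="ignore")
Phi_tr = enc.fit_transform(rf.apply(X_tr))
Phi_te = enc.transform(rf.apply(X_te))

# Jointly re-optimize all leaf weights as one linear classification problem.
# LinearSVC uses LIBLINEAR, which handles the sparse Phi efficiently.
refined = LinearSVC(C=1.0)
refined.fit(Phi_tr, y_tr)

print("original RF:", rf.score(X_te, y_te))
print("refined RF: ", refined.score(Phi_te, y_te))
```

The rows of `refined.coef_` then play the role of the refined leaf vectors $\hat{W}(y)$, one per class.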
Global pruning
• Adjacent leaves with nearly-zero weights in $\hat{W}(y)$ do not contribute to the final result, so merge them.
Example: refined leaf weights such as $(0, 0.1, 0, 0.1)$ mark leaves that are candidates for merging.
Procedure (a sketch follows the list):
1. Optimize the leaf vectors $W(y)$ for all $y$.
2. Prune a certain percentage of insignificant leaves (significance = sum of the elements in the leaf vectors).
3. Update the indicator vectors $\Phi(x)$ for all the training samples.
4. Repeat 1-3 until a certain criterion is satisfied, e.g.
   a. the size of the random forest is smaller than a predefined limit, or
   b. the prediction accuracy is best on a validation set.
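A hedged sketch of the loop, continuing from the refinement snippet above (it reuses `Phi_tr`, `Phi_te`, `y_tr`, `y_te`). The paper merges sibling leaves inside the trees; this simplified version instead drops the least significant leaf columns of $\Phi$ and re-optimizes. `prune_rate` and `n_rounds` are assumed values, and significance is approximated by the total absolute weight a leaf carries across classes:

```python
import numpy as np
from sklearn.svm import LinearSVC

prune_rate = 0.1   # fraction of leaves dropped per round (assumed value)
n_rounds = 5       # fixed stopping criterion for this sketch (assumed)
keep = np.arange(Phi_tr.shape[1])   # surviving leaf columns

for _ in range(n_rounds):
    # 1. Re-optimize the leaf vectors W(y) on the current representation.
    clf = LinearSVC(C=1.0).fit(Phi_tr[:, keep], y_tr)
    # 2. Significance of a leaf = total weight mass it carries across classes
    #    (standing in for the slide's "sum of elements in leaf vectors").
    significance = np.abs(clf.coef_).sum(axis=0)
    n_drop = int(prune_rate * len(keep))
    # 3. Update the indicator vectors by discarding the pruned columns.
    keep = keep[np.sort(np.argsort(significance)[n_drop:])]

final = LinearSVC(C=1.0).fit(Phi_tr[:, keep], y_tr)
print("leaves kept:", len(keep),
      "accuracy:", final.score(Phi_te[:, keep], y_te))
```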
Experimental results
• ADF/ARF: alternating decision (regression) forests [Schulter+ ICCV13]
• Refined-A: the proposed method with the "accuracy" criterion
• Refined-E: the proposed method with "over-pruning" (accuracy comparable to the original RF, but much smaller size)
• Metrics: error rate for classification, RMSE for regression
• #trees = 100; max. depth = 10, 15, or 25 depending on the size of the training data
• 60% of each dataset for training, 40% for testing
Parameter analysis
• The proposed method achieved better performance than RFs with the same tree parameters, e.g. the number and depth of trees (on MNIST data)
Parameter analysis
• The proposed method accelerates both the training and testing steps (on MNIST data)
[Plots: accuracy vs. the number of dimensions used at each node split (the proposed method is less sensitive, with a different best setting than RF) and vs. the number of samples used in each decision tree (the proposed method needs more samples); bar charts compare training and testing times.]
Applications
• Kinect body part classification
• Age regression from face images
(Both use task-specific features.)