Artificial intelligence
Decision trees
PRISM - Nicolas Sutton-Charani
18/01/2021
Plan
1. Introduction
2. Use of decision trees
   2.1 Prediction
   2.2 Interpretability: descriptive data analysis
3. Learning of decision trees
   3.1 Purity criteria
   3.2 Stopping criteria
   3.3 Learning algorithm
4. Pruning of decision trees
   4.1 Cost-complexity trade-off
5. Extension: random forest
Introduction
What is a decision tree? → supervised learning

[Figure: a decision tree; internal nodes test attributes (attribute J1, ..., attribute J4), branches carry attribute values, and leaves give label predictions]
A little history
⚠ machine learning (or data mining) decision trees ≠ decision theory decision trees
Types of decision trees
type of class label
▶ numerical → regression tree
▶ nominal → classification tree

type of algorithm (→ structure)
▶ CART: statistics, binary tree
▶ C4.5: computer science, small tree
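As a quick illustration (a sketch assuming scikit-learn, which implements CART-style binary trees; not part of the original slides), the two label types map onto two different estimators:

```python
# Sketch (assumes scikit-learn): CART-style trees for the two types of class label.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Nominal class label -> classification tree
X_c, y_c = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_c, y_c)

# Numerical class label -> regression tree
X_r, y_r = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(random_state=0).fit(X_r, y_r)

print(clf.predict(X_c[:3]), reg.predict(X_r[:3]))
```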
Use of decision trees

Prediction
Classification trees (example figures):
▶ Will the badminton match take place?
▶ What fruit is it?
▶ Will he/she come to my party?
▶ Will they wait?
▶ Who will win the US presidential election?

Regression trees (example figure): What grade will a student get (given their homework average grade)?
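To make the prediction step concrete, here is a tiny hand-written sketch in the spirit of the badminton example; the attributes, thresholds, and labels are hypothetical, since the slide's tree only exists as a figure.

```python
# Hypothetical decision tree for "will the badminton match take place?" (illustration only).
def will_match_take_place(outlook: str, humidity: float, windy: bool) -> bool:
    # Each nested test is an internal node; each return is a leaf's label prediction.
    if outlook == "sunny":
        return humidity <= 70          # dry enough -> the match takes place
    elif outlook == "overcast":
        return True                    # always play
    else:                              # "rain"
        return not windy               # play only if there is no wind

print(will_match_take_place("sunny", humidity=65, windy=False))  # True
print(will_match_take_place("rain", humidity=80, windy=True))    # False
```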
Interpretability: descriptive data analysis
Data analysis tool
Trees are very interpretable: they partition the attribute space
→ a tree can be summarized by its leaves, which define a mixture of distributions
→ a wonderful collaboration tool with domain experts
⚠ INSTABILITY ← overfitting
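As a sketch of this descriptive use (assuming scikit-learn; the dataset and depth are arbitrary), a fitted tree can be read back as a small set of rules:

```python
# Sketch (assumes scikit-learn): a shallow tree printed as human-readable rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Each root-to-leaf path is one interpretable rule over the attribute space.
print(export_text(tree, feature_names=list(data.feature_names)))
```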
Learning of decision trees
Formalism
Learning dataset (supervised learning):
(x_1, y_1), ..., (x_N, y_N), i.e. a table whose i-th row contains the attribute values x_i^1, ..., x_i^J and the label y_i; the samples are assumed to be i.i.d.

▶ Attributes X = (X^1, ..., X^J) ∈ 𝒳 = 𝒳^1 × ... × 𝒳^J
▶ The spaces 𝒳^j can be categorical or numerical
▶ Class label Y ∈ Ω = {ω_1, ..., ω_K} (real-valued for regression)

Tree
P_H = {t_1, ..., t_H} (the leaves) and π_h = P(t_h) ≈ |t_h| / N, with |t_h| = #{i : x_i ∈ t_h}
Recursive partitioning

[Figures: the recursive partitioning of the attribute space, shown step by step]
Learning principle
▶ Start with the whole dataset in the initial node
▶ Choose the best splits (on attributes) in order to get pure leaves

Classification trees
purity = homogeneity in terms of class labels
▶ CART → Gini impurity: i(t_h) = Σ_{k=1..K} p_k (1 − p_k)
▶ ID3, C4.5 → Shannon entropy: i(t_h) = −Σ_{k=1..K} p_k log2(p_k)
with p_k = P(Y = ω_k | t_h)

Regression trees
purity = low variance of the class labels
→ i(t_h) = Var(Y | t_h) = (1 / |t_h|) Σ_{x_i ∈ t_h} (y_i − E(Y | t_h))², with E(Y | t_h) = (1 / |t_h|) Σ_{x_i ∈ t_h} y_i
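A minimal NumPy sketch of these three impurity measures (not the lecture's code; the function names are ours):

```python
import numpy as np

def gini(labels):
    """Gini impurity i(t) = sum_k p_k * (1 - p_k) of the labels falling in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def shannon_entropy(labels):
    """Shannon entropy i(t) = -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def node_variance(values):
    """Regression impurity: variance of the labels contained in the node."""
    values = np.asarray(values, dtype=float)
    return float(np.mean((values - values.mean()) ** 2))

print(gini(["yes", "yes", "no"]), shannon_entropy(["yes", "yes", "no"]), node_variance([1.0, 2.0, 4.0]))
```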
Impurity measures

[Figure: comparison of the impurity measures]
Purity criteria

[Figure: a leaf t_h is split on an attribute into a left child t_L and a right child t_R]

Impurity measure + tree structure → criterion

CART, ID3: purity gain → Δi = i(t_h) − π_L · i(t_L) − π_R · i(t_R)
C4.5: information gain ratio → IGR = Δi / H(π_L, π_R)

Regression trees
CART: variance minimisation → Δi = i(t_h) − π_L · i(t_L) − π_R · i(t_R)
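A small sketch of the CART purity gain for one candidate split "X^j ≤ threshold" (assumes NumPy and the Gini impurity; the data are toy values):

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p * (1 - p))

def purity_gain(x, y, threshold):
    """Delta_i = i(t_h) - pi_L * i(t_L) - pi_R * i(t_R) for the split 'x <= threshold'."""
    left, right = y[x <= threshold], y[x > threshold]
    pi_left, pi_right = len(left) / len(y), len(right) / len(y)
    return gini(y) - pi_left * gini(left) - pi_right * gini(right)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array(["no", "no", "no", "yes", "yes", "yes"])
print(purity_gain(x, y, threshold=3.5))  # a perfect split: gain = i(t_h) = 0.5
```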
Stopping criteria (pre-pruning)

For all leaves t_h, h = 1, ..., H, and their potential children:
▶ leaf purity: ∃ k ∈ {1, ..., K} such that p_k = 1
▶ leaf and children sizes: |t_h| ≤ minLeafSize
▶ leaf and children weights: π_h = |t_h| / |t_0| ≤ minLeafProba
▶ number of leaves: H ≥ maxNumberLeaves
▶ tree depth: depth(P_H) ≥ maxDepth
▶ purity gain: Δi ≤ minPurityGain
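For reference, scikit-learn's CART implementation exposes pre-pruning criteria of this kind as constructor hyperparameters; a sketch of the (approximate) correspondence, with arbitrary values:

```python
from sklearn.tree import DecisionTreeClassifier

# Each argument roughly corresponds to one of the stopping criteria above.
tree = DecisionTreeClassifier(
    min_samples_leaf=5,             # leaf / children sizes   (minLeafSize)
    min_weight_fraction_leaf=0.01,  # leaf / children weights (minLeafProba)
    max_leaf_nodes=20,              # number of leaves        (maxNumberLeaves)
    max_depth=6,                    # tree depth              (maxDepth)
    min_impurity_decrease=1e-3,     # purity gain             (minPurityGain)
)
```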
Learning algorithm

Result: learnt tree
Start with all the learning data in an initial node (a single leaf);
while the stopping criteria are not verified for all leaves do
    for each splittable leaf do
        compute the purity gains obtained from all possible splits;
    end
    SPLIT: select the split achieving the maximum purity gain;
end
prune the obtained tree;

→ recursive partitioning
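A compact, self-contained sketch of this greedy algorithm (binary splits on numerical attributes, Gini impurity, no pruning step); it is an illustration under those assumptions, not the lecture's reference implementation:

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p * (1 - p))

def majority(y):
    values, counts = np.unique(y, return_counts=True)
    return values[np.argmax(counts)]

def best_split(X, y):
    """Best (purity gain, attribute index, threshold) over all splits 'X^j <= threshold'."""
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # keeps both children non-empty
            left = X[:, j] <= thr
            gain = gini(y) - left.mean() * gini(y[left]) - (~left).mean() * gini(y[~left])
            if gain > best[0]:
                best = (gain, j, thr)
    return best

def grow(X, y, depth=0, max_depth=3):
    gain, j, thr = best_split(X, y)
    # Stopping criteria: pure leaf, no useful split, or maximum depth reached.
    if gini(y) == 0 or j is None or depth == max_depth:
        return {"leaf": True, "prediction": majority(y)}
    left = X[:, j] <= thr
    return {"leaf": False, "attribute": j, "threshold": thr,
            "left": grow(X[left], y[left], depth + 1, max_depth),
            "right": grow(X[~left], y[~left], depth + 1, max_depth)}

def predict(tree, x):
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["attribute"]] <= tree["threshold"] else tree["right"]
    return tree["prediction"]

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array(["no", "no", "yes", "yes"])
print(predict(grow(X, y), np.array([3.5])))  # -> "yes"
```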
ID3 worked example (figures)
▶ Training examples – [9+, 5−]
▶ Selecting the next attribute
▶ Best attribute – Outlook
▶ S_sunny (the subset of examples in the Outlook = Sunny branch)
▶ Results
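The starting point of this worked example, the [9+, 5−] class counts, fixes the entropy of the root node; a quick check with the standard formula (the figures themselves are not reproduced here):

```python
from math import log2

# Root node of the ID3 example: 9 positive and 5 negative training examples.
p_pos, p_neg = 9 / 14, 5 / 14
root_entropy = -p_pos * log2(p_pos) - p_neg * log2(p_neg)
print(round(root_entropy, 3))  # ~0.940
```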
Pruning of decision trees
Overfitting

[Figures: overfitting illustration]

Remark: decision trees do not need variable selection or dimension reduction (in terms of accuracy).
Cost-complexity trade-off
Cost-Complexity Pruning

The idea
▶ trade-off between predictive efficiency and complexity
▶ find a subtree that fulfills this trade-off

Metrics
▶ 'Err' ← misclassification rate or MSE
▶ criterion: R_α = Err + α · H (H = number of leaves)

Steps
▶ find a useful sequence of nested subtrees
▶ choose the right subtree
Sequence of subtrees creation

Result: a sequence of trees that are all subtrees of T_0: T_0 ⊃ T_1 ⊃ T_2 ⊃ T_3 ⊃ ... ⊃ P_1 (the initial node)
Learn the biggest tree T_s = T_0 := P_Hmax, obtained for α_0 = 0 (s = 0);
while T_s ≠ P_1 do
    T_{s+1} = argmin_{t ∈ subtrees(T_s)} [ R_{α_s}(t) − R_{α_s}(T_s) ];
    α_{s+1} = R_{α_s}(T_{s+1}) − R_{α_s}(T_s);
end

We get two bijective sets {T_0, ..., T_S} and {α_0, ..., α_S} (with T_S = P_1)

Selection: T_{s*} = argmin_{T_s ∈ {T_0, ..., T_S}} Err(T_s) ← pruning set or cross-validation
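scikit-learn implements this pruning scheme; a sketch of computing the α sequence and then selecting a subtree by cross-validation (the dataset and CV settings are arbitrary choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Sequence of alphas, one per nested subtree of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Selection step: keep the subtree (i.e. the alpha) with the best cross-validated accuracy.
scores = [(cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean(), a)
          for a in path.ccp_alphas]
best_score, best_alpha = max(scores)
print(best_alpha, best_score)
```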
Figure – Sequence of nested subtrees
Here, α_1 < α_2 ⟹ T − T_1 ⊂ T − T_2 (a larger α prunes away a larger part of the tree)
Extension: random forest
Motivation
▶ trees instability
▶ bias-variance trade-off

Averaging reduces variance:
Var(X̄) = Var(X) / N (for independent predictions)

→ average models to reduce model variance

One problem:
- only one training set
- where do multiple models come from?
Bagging: Bootstrap Aggregation

▶ Tin Kam Ho (1995) → Leo Breiman (2001)
▶ Take repeated bootstrap samples from the training set
▶ Bootstrap sampling: given a training set D containing N examples, draw N examples at random with replacement from D
▶ Bagging:
  - create B bootstrap samples D_1, ..., D_B
  - train a distinct classifier on each D_b
  - classify a new instance by majority vote / averaging / aggregating the predictions
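A minimal bagging sketch on top of decision trees (assumes NumPy and scikit-learn; B and the dataset are arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
B = 25

# Train one tree per bootstrap sample D_b (N draws with replacement from the training set).
forest = []
for _ in range(B):
    idx = rng.integers(0, len(X), size=len(X))
    forest.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Classify new instances by majority vote over the B trees.
votes = np.stack([tree.predict(X[:5]) for tree in forest])  # shape (B, 5)
majority_vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print(majority_vote)
```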
References

* L. Breiman, J. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, 1984.
* J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81–106, 1986.
* L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5–32, 2001.
* G. Biau, L. Devroye, and G. Lugosi, "Consistency of random forests and other averaging classifiers," Journal of Machine Learning Research, vol. 9, pp. 2015–2033, 2008.