
Artificial intelligence

Decision trees

PRISM - Nicolas Sutton-Charani

18/01/2021

1 / 52

Artificial intelligence

1. Introduction

2. Use of decision trees
  2.1 Prediction
  2.2 Interpretability : Descriptive data analysis

3. Learning of decision trees
  3.1 Purity criteria
  3.2 Stopping criteria
  3.3 Learning algorithm

4. Pruning of decision trees
  4.1 Cost-complexity trade-off

5. Extension : random forest

2 / 52


Artificial intelligence

Introduction

What is a decision tree ?

[Diagram : a decision tree — internal nodes test attributes (J1, J2, J3, J4) and each leaf gives a label prediction]

4 / 52


Artificial intelligence

Introduction

What is a decision tree ? → supervised learning

[Diagram : the same tree with edges labelled by attribute values — internal nodes test attributes, edges carry their possible values, leaves give label predictions]

6 / 52

Artificial intelligence

Introduction

A little history

⚠ Machine learning (or data mining) decision trees ≠ decision theory decision trees

7 / 52

Artificial intelligence

Introduction

Types of decision trees

type of class label

▸ numerical → regression tree

▸ nominal → classification tree

type of algorithm (→ structure)

▸ CART : statistics, binary tree

▸ C4.5 : computer science, small tree
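A minimal sketch (not from the original slides, assuming scikit-learn is available) of the two tree types: `DecisionTreeClassifier` for nominal class labels, `DecisionTreeRegressor` for numerical ones, both CART-style binary trees.

```python
# Hedged sketch: classification vs regression tree on a toy dataset (not the slides' data).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 1], [1, 0], [1, 1], [0, 0]]           # 4 samples, 2 numerical attributes

clf = DecisionTreeClassifier(criterion="gini", max_depth=2)
clf.fit(X, ["yes", "no", "yes", "no"])          # nominal class label -> classification tree
print(clf.predict([[1, 1]]))

reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, [3.2, 1.5, 2.8, 0.9])                # numerical class label -> regression tree
print(reg.predict([[1, 1]]))
```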

8 / 52


Artificial intelligence

Use of decision trees

Prediction

Classification trees
Will the badminton match take place ?

11 / 52

Artificial intelligence

Use of decision trees

Prediction

Classification trees
What fruit is it ?

12 / 52

Artificial intelligence

Use of decision trees

Prediction

Classification trees
Will he/she come to my party ?

13 / 52

Artificial intelligence

Use of decision trees

Prediction

Classification trees
Will they wait ?

14 / 52

Artificial intelligence

Use of decision trees

Prediction

Classification trees
Who will win the US presidential election ?

15 / 52

Artificial intelligence

Use of decision trees

Prediction

Regression trees
What grade will a student get (given his homework average grade) ?

16 / 52


Artificial intelligence

Use of decision trees

Interpretability : Descriptive data analysis

Data analysis tool

Trees are very interpretable : they partition the attribute space

→ a tree can be summarised by its leaves, which define a mixture of distributions

→ wonderful collaboration tool with experts

⚠ INSTABILITY ← overfitting
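As an illustration of this interpretability (my own sketch, assuming scikit-learn, not part of the original slides), a learnt tree can be dumped as a set of readable rules to discuss with experts:

```python
# Hedged sketch: printing a fitted tree as text rules; each leaf is one cell of the partition.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

print(export_text(clf, feature_names=list(iris.feature_names)))
```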

18 / 52


Artificial intelligence

Learning of decision trees

Formalism

Learning dataset (supervised learning)

$\begin{pmatrix} x_1, y_1 \\ \vdots \\ x_N, y_N \end{pmatrix} = \begin{pmatrix} x_1^1 & \dots & x_1^J & y_1 \\ \vdots & & \vdots & \vdots \\ x_N^1 & \dots & x_N^J & y_N \end{pmatrix}$   samples are assumed to be i.i.d.

▸ Attributes $X = (X^1, \dots, X^J) \in \mathcal{X} = \mathcal{X}^1 \times \dots \times \mathcal{X}^J$

▸ Spaces $\mathcal{X}^j$ can be categorical or numerical

▸ Class label $Y \in \Omega = \{\omega_1, \dots, \omega_K\}$ ($\in \mathbb{R}$ for regression)

Tree

$P_H = \{t_1, \dots, t_H\}$ and $\pi_h = P(t_h) \approx \frac{|t_h|}{N}$

with $|t_h| = \#\{i : x_i \in t_h\}$
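A minimal sketch of this formalism (my own illustration, assuming NumPy): the learning set as an N×J attribute matrix plus a label vector, and the empirical leaf probability $\pi_h = |t_h|/N$:

```python
# Hedged sketch of the formalism: X is the N x J attribute matrix, y the class labels.
import numpy as np

X = np.array([[5.1, 1.4],
              [6.3, 4.7],
              [5.9, 4.2],
              [4.8, 1.6]])           # N = 4 samples, J = 2 attributes (assumed i.i.d.)
y = np.array(["w1", "w2", "w2", "w1"])

# A leaf t_h is a subset of sample indices; pi_h = |t_h| / N estimates P(t_h).
t_h = np.where(X[:, 0] > 5.5)[0]
pi_h = len(t_h) / len(X)
print(t_h, pi_h)                      # indices falling in the leaf and its estimated probability
```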

20 / 52

Artificial intelligence

Learning of decision trees

Recursive partitioning

21 / 52


Artificial intelligence

Learning of decision trees

Learning principle

▸ Start with all the dataset in the initial node
▸ Choose the best splits (on attributes) in order to get pure leaves

Classification trees

purity = homogeneity in terms of class labels

▸ CART → Gini impurity : $i(t_h) = \sum_{k=1}^{K} p_k (1 - p_k)$

▸ ID3, C4.5 → Shannon entropy : $i(t_h) = -\sum_{k=1}^{K} p_k \log_2(p_k)$

with $p_k = P(Y = \omega_k \mid t_h)$

Regression trees

purity = low variance of class labels

→ $i(t_h) = \mathrm{Var}(Y \mid t_h) = \frac{1}{|t_h|} \sum_{x_i \in t_h} (y_i - E(Y \mid t_h))^2$ with $E(Y \mid t_h) = \frac{1}{|t_h|} \sum_{x_i \in t_h} y_i$
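These three impurity measures take a few lines each; the sketch below (my own, assuming NumPy) follows the formulas above:

```python
# Hedged sketch of the impurity measures defined above.
import numpy as np

def gini(labels):
    """CART: i(t) = sum_k p_k (1 - p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def entropy(labels):
    """ID3 / C4.5: i(t) = -sum_k p_k log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def variance(y):
    """Regression trees: i(t) = Var(Y | t)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - y.mean()) ** 2))

print(gini(["yes", "yes", "no"]))       # ~0.444
print(entropy(["yes", "yes", "no"]))    # ~0.918
print(variance([12.0, 14.0, 7.5]))      # variance of the leaf's labels
```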

25 / 52

Artificial intelligence

Learning of decision trees

Impurity measures

26 / 52


Artificial intelligence

Learning of decision trees

Purity criteria

Purity criteria

[Diagram : a leaf $t_h$ to be split]

Impurity measure + tree structure → criteria

CART, ID3 : purity gain

C4.5 : information gain ratio

Regression trees

CART : Variance minimisation

28 / 52

Artificial intelligence

Learning of decision trees

Purity criteria

Purity criteria

[Diagram : leaf $t_h$ split on an attribute into children $t_L$ and $t_R$, each carrying a prediction]

Impurity measure + tree structure → criteria

CART, ID3 : purity gain → $\Delta i = i(t_h) - \pi_L\, i(t_L) - \pi_R\, i(t_R)$

C4.5 : information gain ratio → $IGR = \dfrac{\Delta i}{H(\pi_L, \pi_R)}$

Regression trees

CART : Variance minimisation → $\Delta i = i(t_h) - \pi_L\, i(t_L) - \pi_R\, i(t_R)$
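A small sketch (mine, not from the slides) of evaluating one candidate binary split with these two criteria, reusing Gini and Shannon entropy as impurity measures:

```python
# Hedged sketch: purity gain and information gain ratio for one candidate split of t_h.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def purity_gain(parent, left, right, impurity=gini):
    """CART / ID3: delta_i = i(t_h) - pi_L i(t_L) - pi_R i(t_R)."""
    pi_L, pi_R = len(left) / len(parent), len(right) / len(parent)
    return impurity(parent) - pi_L * impurity(left) - pi_R * impurity(right)

def gain_ratio(parent, left, right):
    """C4.5: IGR = delta_i / H(pi_L, pi_R), with delta_i based on Shannon entropy."""
    pi_L, pi_R = len(left) / len(parent), len(right) / len(parent)
    split_entropy = -sum(p * np.log2(p) for p in (pi_L, pi_R) if p > 0)
    return purity_gain(parent, left, right, impurity=entropy) / split_entropy

parent = ["yes"] * 6 + ["no"] * 4
left, right = ["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3
print(purity_gain(parent, left, right), gain_ratio(parent, left, right))
```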

29 / 52


Artificial intelligence

Learning of decision trees

Stopping criteria

Stopping criteria (pre-pruning)

For all leaves $t_h$, $h = 1, \dots, H$, and their potential children :

▸ leaf purity : $\exists k \in \{1, \dots, K\} : p_k = 1$

▸ leaf and children sizes : $|t_h| \leq$ minLeafSize

▸ leaf and children weights : $\pi_h = \frac{|t_h|}{|t_0|} \leq$ minLeafProba

▸ number of leaves : $H \geq$ maxNumberLeaves

▸ tree depth : depth($P_H$) $\geq$ maxDepth

▸ purity gain : $\Delta i \leq$ minPurityGain
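These pre-pruning rules can be gathered in one predicate; the sketch below is my own, with hypothetical threshold names mirroring the slide:

```python
# Hedged sketch: pre-pruning test for a leaf, mirroring the criteria listed above.
from collections import Counter

def should_stop(leaf_labels, depth, n_total, n_leaves, purity_gain,
                min_leaf_size=5, min_leaf_proba=0.01,
                max_leaves=32, max_depth=10, min_purity_gain=1e-3):
    counts = Counter(leaf_labels)
    pure = max(counts.values()) == len(leaf_labels)            # some p_k equals 1
    too_small = len(leaf_labels) <= min_leaf_size               # |t_h| too small
    too_light = len(leaf_labels) / n_total <= min_leaf_proba    # pi_h too small
    too_many = n_leaves >= max_leaves                           # H too large
    too_deep = depth >= max_depth                               # tree too deep
    useless = purity_gain <= min_purity_gain                    # best split gains too little
    return pure or too_small or too_light or too_many or too_deep or useless

print(should_stop(["yes"] * 8, depth=3, n_total=100, n_leaves=6, purity_gain=0.2))  # True: pure leaf
```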

31 / 52


Artificial intelligence

Learning of decision trees

Learning algorithm

Learning algorithm

Result : learnt tree

Start with all the learning data in an initial node (single leaf);
while stopping criteria not verified for all leaves do
    for each splittable leaf do
        compute the purity gains obtained from all possible splits;
    end
    SPLIT : select the split achieving the maximum purity gain;
end
prune the obtained tree;

→ Recursive partitioning

33 / 52
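The greedy algorithm above can be sketched in a few dozen lines; the following is my own simplified version (numerical attributes only, Gini impurity, binary threshold splits), not the CART/C4.5 implementation itself:

```python
# Hedged sketch of recursive partitioning: greedy binary threshold splits, Gini impurity.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def best_split(X, y):
    """Return (attribute j, threshold s, purity gain) of the best split, or None."""
    best = None
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:                     # candidate thresholds
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            gain = gini(y) - len(left) / len(y) * gini(left) - len(right) / len(y) * gini(right)
            if best is None or gain > best[2]:
                best = (j, s, gain)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_leaf_size=2):
    vals = y.tolist()
    majority = max(set(vals), key=vals.count)
    split = best_split(X, y)
    if depth >= max_depth or len(y) <= min_leaf_size or split is None or split[2] <= 0:
        return {"leaf": majority}                              # stopping criteria -> leaf prediction
    j, s, _ = split
    mask = X[:, j] <= s
    return {"attr": j, "thr": s,
            "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_leaf_size),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_leaf_size)}

X = np.array([[2.0, 1.0], [3.5, 0.5], [1.0, 2.5], [4.0, 3.0], [2.5, 2.0]])
y = np.array(["no", "no", "yes", "yes", "yes"])
print(build_tree(X, y))
```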

Artificial intelligence

Learning of decision trees

Learning algorithm

ID3 - Training Examples – [9+,5-]

34 / 52

Artificial intelligence

Learning of decision trees

Learning algorithm

ID3 - Selecting Next Attribute

35 / 52


Artificial intelligence

Learning of decision trees

Learning algorithm

ID3 - Best Attribute - Outlook
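The numbers behind this choice can be reproduced; the sketch below is mine and assumes the class counts of the classic PlayTennis data ([9+,5-]) used in these ID3 slides:

```python
# Hedged sketch: information gain of Outlook on the classic [9+,5-] PlayTennis data.
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

def information_gain(parent, children):
    """Gain = H(parent) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = sum(parent)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in children)

# Outlook: sunny [2+,3-], overcast [4+,0-], rain [3+,2-]
print(information_gain((9, 5), [(2, 3), (4, 0), (3, 2)]))   # ~0.246 -> best attribute
```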

38 / 52

Artificial intelligence

Learning of decision trees

Learning algorithm

ID3 - $S_{sunny}$

39 / 52

Artificial intelligence

Learning of decision trees

Learning algorithm

ID3 - Results

40 / 52


Artificial intelligence

Pruning of decision trees

Overfitting

42 / 52

Artificial intelligence

Pruning of decision trees

Overfitting

Remark : decision trees do not need variable selection or dimension reduction (in terms of accuracy).

43 / 52


Artificial intelligence

Pruning of decision trees

Cost-complexity trade-off

Cost-Complexity Pruning

The idea

▸ trade-off between predictive efficiency and complexity

▸ find a subtree that fulfills this trade-off

Metrics

▸ 'Err' ← misclassification rate or MSE

▸ Criterion : $R_\alpha = Err + \alpha H$

Steps

▸ Find a useful sequence of nested subtrees

▸ Choose the right subtree

45 / 52

Artificial intelligence

Pruning of decision trees

Cost-complexity trade-off

Cost-Complexity Pruning

Sequence of subtrees creation

Result : a sequence of trees that are all subtrees of $T_0$ : $T_0 \supset T_1 \supset T_2 \supset T_3 \supset \dots \supset T_k \supset \dots \supset P_1$ (initial node)

Learn the biggest tree $T_s = T_0 := P_{H_{max}}$, obtained for $\alpha_0 = 0$ (s = 0);
while $T_s \neq P_1$ do
    $T_{s+1} = \underset{t \in \mathrm{subtrees}(T_s)}{\arg\min} \left[ R_{\alpha_s}(t) - R_{\alpha_s}(T_s) \right]$;
    $\alpha_{s+1} = R_{\alpha_s}(T_{s+1}) - R_{\alpha_s}(T_s)$;
end

We get 2 bijective sets : $\{T_0, \dots, T_S\}$ and $\{\alpha_0, \dots, \alpha_S\}$ (with $T_S = P_1$)

Selection : $T_{s^*} = \underset{T_s \in \{T_0, \dots, T_S\}}{\arg\min}\ Err(T_s)$ ← pruning set or cross-validation
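In scikit-learn this sequence of $(\alpha_s, T_s)$ pairs is exposed directly; a hedged usage sketch (my own, with the subtree selected by cross-validation as suggested above):

```python
# Hedged sketch: cost-complexity pruning with scikit-learn, alpha chosen by cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Sequence of nested subtrees, indexed by the effective alphas.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

best_alpha, best_score = None, -1.0
for alpha in path.ccp_alphas:
    score = cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=alpha),
                            X, y, cv=5).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)   # T_{s*}: the subtree whose alpha maximises CV accuracy
```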

46 / 52

Artificial intelligence

Pruning of decision trees

Cost-complexity trade-off

Cost-Complexity Pruning

Figure – Sequence of nested subtrees

Here, $\alpha_2 < \alpha_1 \implies T - T_1 \subset T - T_2$

47 / 52


Artificial intelligence

Extension : random forest

Random forest

Motivation

▸ trees instability

▸ bias-variance trade-off

Averaging reduces variance :

$\mathrm{Var}(\bar{X}) = \dfrac{\mathrm{Var}(X)}{N}$ (for independent predictions)

→ Average models to reduce model variance

One problem :

- only one training set

- where do multiple models come from ?

49 / 52

Artificial intelligence

Extension : random forest

Bagging : Bootstrap Aggregation

▸ Tin Kam Ho (1995) → Leo Breiman (2001)

▸ Take repeated bootstrap samples from the training set

▸ Bootstrap sampling : given a training set D containing N examples, draw N examples at random with replacement from D.

▸ Bagging :

- create B bootstrap samples $D_1, \dots, D_B$

- train a distinct classifier on each $D_b$

- classify a new instance by majority vote / averaging / aggregating predictions
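A minimal bagging sketch (my own, assuming scikit-learn trees as base classifiers and a majority vote):

```python
# Hedged sketch of bagging: B bootstrap samples, one tree per sample, majority vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

B, N = 25, len(X)
trees = []
for _ in range(B):
    idx = rng.integers(0, N, size=N)             # bootstrap: draw N examples with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(x):
    votes = [int(tree.predict([x])[0]) for tree in trees]
    return max(set(votes), key=votes.count)       # majority vote over the B trees

print(bagged_predict(X[0]), y[0])
```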

50 / 52

Artificial intelligence

Extension : random forest

Random forest
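Random forests add a second level of randomisation on top of bagging: at each split only a random subset of attributes is considered. A hedged scikit-learn sketch (mine, not from the slides):

```python
# Hedged sketch: a random forest = bagged trees + a random attribute subset at each split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100,     # B trees on bootstrap samples
                                max_features="sqrt",  # random attribute subset per split
                                random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```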

51 / 52

Artificial intelligence

Extension : random forest

References

* L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification And Regression Trees, 1984.

* J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81–106, Oct. 1986.

* L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

* G. Biau, L. Devroye, and G. Lugosi, "Consistency of random forests and other averaging classifiers," Journal of Machine Learning Research, vol. 9, pp. 2015–2033, Jun. 2008.

52 / 52