Page 1: Bootstrap – Bagging – Random Forests

Bootstrap – Bagging – Random Forests

Olivier Roustant

Mines Saint-Étienne

2017/11

Olivier Roustant (EMSE) Bootstrap – Bagging – Random Forests 2017/11 1 / 20

Page 2: Bootstrap – Bagging – Random Forests

Outline

1 Bootstrap

2 Aggregation, bagging and random forests

Page 3: Bootstrap – Bagging – Random Forests

Warning

This is only a very short introduction to bootstrap, aggregation and random forests, aiming at giving some insights for the future case study. It has to be completed by your own reading on these topics, in particular Chapters 9 and 15 of [ESL].

Page 4: Bootstrap – Bagging – Random Forests

Bootstrap

Page 5: Bootstrap – Bagging – Random Forests

Bootstrap

Purpose

The idea of bootstrap is to resample the data.

→ It allows creating variability without extra information. Etymology: to pull oneself up by one's bootstraps (without extra force!)

→ It allows simulating from an unknown distribution.

Application to the case study 2015
Compute forecast intervals without assuming normality of the residuals εt in the linear model with AR(2) errors:

yt = β0 + β1x1,t + · · · + βpxp,t + ut

ut = φ1ut−1 + φ2ut−2 + εt
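As a concrete illustration, residual bootstrap can produce such intervals: resample the fitted residuals εt with replacement, propagate them through the AR(2) recursion, and read off empirical quantiles of the simulated paths. A minimal Python sketch, where the residuals, AR coefficients and regression forecasts are hypothetical placeholders rather than the case-study values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted quantities (placeholders, NOT the case-study values):
# eps_hat  : residuals of the fitted model (heavy-tailed stand-in)
# phi1/phi2: AR(2) coefficients, u_last: last two fitted errors
# x_beta   : regression part of the forecast for the next h steps
eps_hat = rng.standard_t(df=4, size=500)
phi1, phi2 = 0.5, -0.2
u_last = (0.1, -0.05)        # (u_{t-1}, u_t)
x_beta = np.zeros(3)         # horizon h = 3

def bootstrap_forecasts(nboot=2000, h=3):
    """Simulate future paths y_{t+1..t+h} by resampling residuals."""
    paths = np.empty((nboot, h))
    for b in range(nboot):
        u_prev2, u_prev1 = u_last
        eps_star = rng.choice(eps_hat, size=h, replace=True)   # bootstrap step
        for k in range(h):
            u = phi1 * u_prev1 + phi2 * u_prev2 + eps_star[k]  # AR(2) errors
            paths[b, k] = x_beta[k] + u                        # y = Xβ + u
            u_prev2, u_prev1 = u_prev1, u
    return paths

paths = bootstrap_forecasts()
# Pointwise 95% forecast intervals from empirical quantiles
lower, upper = np.quantile(paths, [0.025, 0.975], axis=0)
```

The interval width then reflects the actual (here heavy-tailed) residual distribution instead of a Gaussian assumption.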

Page 6: Bootstrap – Bagging – Random Forests

Bootstrap

Principle

Denote by F̂n the empirical distribution, i.e. the discrete distribution supported by the data {x1, . . . , xn} with uniform weights:

dF̂n(x) = (1/n) δx1(x) + · · · + (1/n) δxn(x)

Assume that x1, . . . , xn is a sample from F (an unknown distribution). Then, if n is large enough, simulating from F̂n or from F will be very similar.

Page 7: Bootstrap – Bagging – Random Forests

Bootstrap

Principle

Ex. Explain why, if U ∼ U({1, . . . , n}), then xU ∼ F̂n. Thus, simulating from F̂n is achieved by resampling the data (with replacement).

→ R code: sample(data, size = nboot, replace = TRUE)
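The same one-liner written in Python (the slides use R; this NumPy version is an equivalent sketch) also illustrates why resampling mimics F when n is large: bootstrap quantiles come out close to the quantiles of the original sample.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)   # a sample x_1, ..., x_n from F (here Gaussian)

# Simulating from F^_n: draw U ~ U({1, ..., n}) and return x_U,
# i.e. resample the data with replacement.
nboot = 100000
boot = rng.choice(data, size=nboot, replace=True)

# For large n, draws from F^_n mimic F: bootstrap quantiles stay close
# to the quantiles of the original sample.
q_data = np.quantile(data, [0.25, 0.5, 0.75])
q_boot = np.quantile(boot, [0.25, 0.5, 0.75])
```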

Page 8: Bootstrap – Bagging – Random Forests

Bootstrap

Application to the case study 2015

Possible residuals εt are represented below.
→ They look approximately independent (ignoring variance variations. . . )
→ They have fatter tails than the normal distribution ('leptokurticity')

Page 9: Bootstrap – Bagging – Random Forests

Bootstrap

Application to the case study 2015

Compare the cdf of bootstrapped residuals (drawn from F̂n) to the cdf of the Gaussian distribution (in red), here different from F.

Page 10: Bootstrap – Bagging – Random Forests

Bootstrap

Correlation of a bootstrapped sample

Since bootstrapped data are drawn from the same data, they are correlated.

Ex. Let X1, . . . , Xn be i.i.d. (0, σ²). Define the bootstrapped data

X∗1 = XU1 , . . . , X∗B = XUB

where U1, . . . , UB are i.i.d. ∼ U({1, . . . , n}) and independent of X1, . . . , Xn.

Prove that X∗1, . . . , X∗B are identically distributed (0, σ²) but with cor(X∗i, X∗j) = 1/n (for i ≠ j).
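A quick Monte Carlo check of this exercise (a sketch; n and the Gaussian choice for F are arbitrary): drawing two bootstrapped values from the same sample and estimating their correlation over many repetitions should give about 1/n, since cov(X∗i, X∗j) = P(Ui = Uj) σ² = σ²/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 20, 200000

# Each row: a fresh sample X_1..X_n, from which two bootstrapped values
# X*_i = X_{U_i}, X*_j = X_{U_j} are drawn with independent indices.
X = rng.normal(size=(reps, n))
rows = np.arange(reps)
xi = X[rows, rng.integers(n, size=reps)]
xj = X[rows, rng.integers(n, size=reps)]

# cov(X*_i, X*_j) = P(U_i = U_j) * sigma^2 = sigma^2 / n, so cor = 1/n
emp_cor = np.corrcoef(xi, xj)[0, 1]
```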

Page 11: Bootstrap – Bagging – Random Forests

Bootstrap

Correlation of bootstrapped sample means

Ex. Let X1, . . . , Xn be i.i.d. (0, σ²) and let X̄∗1, X̄∗2 be two sample means computed (independently) by bootstrap. Prove that

cor(X̄∗1, X̄∗2) = n / (2n − 1) ≈ 50%
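This too can be checked by simulation (a sketch; the sample size and distribution are arbitrary): each bootstrap mean averages n values resampled from the same data, and the shared data induce the ≈ 50% correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100000

X = rng.normal(size=(reps, n))     # each row: one data set X_1..X_n
rows = np.arange(reps)[:, None]

# Two bootstrap sample means computed independently from the SAME data
m1 = X[rows, rng.integers(n, size=(reps, n))].mean(axis=1)
m2 = X[rows, rng.integers(n, size=(reps, n))].mean(axis=1)

emp_cor = np.corrcoef(m1, m2)[0, 1]
theory = n / (2 * n - 1)           # = 10/19, about 0.53
```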

Page 12: Bootstrap – Bagging – Random Forests

Aggregation, bagging and random forests

Page 13: Bootstrap – Bagging – Random Forests

Aggregation, bagging and random forests

Bagging : Bootstrap + Aggregating

Principle. Consider a set of data z1, . . . , zN.
Obtain new data by bootstrapping the original data
→ each bootstrap sample Z∗b1, . . . , Z∗bN gives a new learner
Aggregate (here: average) the learners
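The steps above fit in a few lines of Python. The base learner here is a polynomial fit, chosen only for brevity (the slides' tree learners come later); everything about the data is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data z_1..z_N = (x_n, y_n); any base learner would do here.
N = 200
x = rng.uniform(-1, 1, N)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=N)

def fit_learner(xb, yb, degree=5):
    """Base learner: a degree-5 polynomial fit (a stand-in for a tree)."""
    return np.polynomial.polynomial.polyfit(xb, yb, degree)

def bagged_predict(x_new, B=50):
    preds = []
    for _ in range(B):
        idx = rng.integers(N, size=N)        # bootstrap sample Z*b
        coef = fit_learner(x[idx], y[idx])   # one learner per sample
        preds.append(np.polynomial.polynomial.polyval(x_new, coef))
    return np.mean(preds, axis=0)            # aggregate: average

grid = np.linspace(-1, 1, 5)
yhat = bagged_predict(grid)
```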

Page 14: Bootstrap – Bagging – Random Forests

Aggregation, bagging and random forests

Idea #1: Bagging is most useful for unstable models

Notations
Z = {(Yn, Xn), n = 1, . . . , N}: i.i.d. r.v. representing the data
φ(x, Z): prediction of y for a new x
φA(x) = EZ(φ(x, Z)): aggregated prediction

In bagging, φA(x) ≈ (1/B) Σb=1..B φ(x, Z∗b)

Define, for given x, y:
e(x, y) = EZ[(y − φ(x, Z))²]: the mean square error
eA(x, y) = (y − φA(x))²: the aggregated error

Exercise. By interpreting e and eA in terms of bias and variance, show that

eA(x, y) − e(x, y) = −varZ(φ(x, Z)) ≤ 0
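A sketch of the argument, expanding the mean square error around EZ[φ(x, Z)] = φA(x):

```latex
\begin{aligned}
e(x,y) &= \mathbb{E}_Z\big[(y-\varphi(x,Z))^2\big] \\
       &= \big(y-\mathbb{E}_Z[\varphi(x,Z)]\big)^2
          + \operatorname{var}_Z\big(\varphi(x,Z)\big) \\
       &= e_A(x,y) + \operatorname{var}_Z\big(\varphi(x,Z)\big).
\end{aligned}
```

Hence eA(x, y) − e(x, y) = −varZ(φ(x, Z)) ≤ 0: aggregation removes exactly the variance term, which is why it helps most when the base model is unstable (high variance).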

Page 17: Bootstrap – Bagging – Random Forests

Aggregation, bagging and random forests

Idea #2 : Bagging is improved by reducing correlation

Fact. The 'weak' learners φ(x, Z∗b) are independent conditionally on the initial data (X1, Y1), . . . , (Xn, Yn), but not independent.

Ex. #1. The φ(x, Z∗b) have a common variance and a common correlation.

Ex. #2. Let W1, . . . , WB be B r.v. with common variance σ² and correlation ρ ≥ 0. Then the variance of (1/B) Σb=1..B Wb is:

ρσ² + ((1 − ρ)/B) σ²

→ Aggregation is all the more efficient as ρ is small.
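The variance formula is easy to verify numerically. A sketch (the equicorrelated Wb are built from a shared factor, a standard construction; all numbers are arbitrary). Note that as B → ∞ the variance does not vanish but tends to ρσ², which is why reducing ρ matters:

```python
import numpy as np

rng = np.random.default_rng(0)
B, rho, sigma2, reps = 5, 0.3, 1.0, 200000

# Equicorrelated W_1..W_B via a shared factor S plus independent noise:
# W_b = sqrt(rho)*S + sqrt(1-rho)*E_b has variance 1 and correlation rho.
S = rng.normal(size=(reps, 1))
E = rng.normal(size=(reps, B))
W = np.sqrt(rho) * S + np.sqrt(1 - rho) * E

emp_var = W.mean(axis=1).var()
theory = rho * sigma2 + (1 - rho) / B * sigma2   # 0.3 + 0.7/5 = 0.44
```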

Page 20: Bootstrap – Bagging – Random Forests

Aggregation, bagging and random forests

Principles of random forest

Use non-linear and unstable weak learners
→ Averaging linear learners results in a linear learner!
→ Unstable: see 'bagging and instability' above
→ Trees are good candidates
Resample the observations, as in bagging
Resample the variables in order to decrease ρ ("feature sampling")
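These principles fit in a short self-contained sketch: the weak learner below is a one-split regression stump (a minimal tree), each one is fit on a bootstrap sample of the observations and on a random subset of m of the p variables, and predictions are averaged. All data and parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with p = 4 features (all invented)
N, p = 300, 4
X = rng.uniform(-1, 1, (N, p))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=N)

def fit_stump(Xb, yb, feats):
    """Weak learner: one-split regression stump restricted to `feats`."""
    best, best_sse = None, np.inf
    for j in feats:
        for s in np.quantile(Xb[:, j], [0.25, 0.5, 0.75]):  # candidate splits
            left = Xb[:, j] <= s
            if left.all() or not left.any():
                continue
            ml, mr = yb[left].mean(), yb[~left].mean()
            sse = ((yb[left] - ml) ** 2).sum() + ((yb[~left] - mr) ** 2).sum()
            if sse < best_sse:
                best, best_sse = (j, s, ml, mr), sse
    return best

def forest_predict(X_new, B=100, m=2):
    preds = np.zeros((B, len(X_new)))
    for b in range(B):
        idx = rng.integers(N, size=N)                  # resample observations
        feats = rng.choice(p, size=m, replace=False)   # resample variables
        j, s, ml, mr = fit_stump(X[idx], y[idx], feats)
        preds[b] = np.where(X_new[:, j] <= s, ml, mr)
    return preds.mean(axis=0)                          # aggregate the trees

yhat = forest_predict(X[:5])
```

Real random forests grow full trees and redraw the feature subset at every split (see [ESL, Chapter 15]); this sketch only keeps the two resampling ideas.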

Page 21: Bootstrap – Bagging – Random Forests

Aggregation, bagging and random forests

Trees in 1 slide (from [ESL, Chapter 9])

Example with CART (Classification and Regression Trees).

Page 22: Bootstrap – Bagging – Random Forests

Aggregation, bagging and random forests

Algorithm (from [ESL, Chapter 15])

Page 23: Bootstrap – Bagging – Random Forests

References

Page 24: Bootstrap – Bagging – Random Forests

References

[ESL] T. Hastie, R. Tibshirani and J. Friedman (2009), The Elements of Statistical Learning, Springer, 2nd edition, print 10.

[BRE] L. Breiman (1994), Bagging Predictors, Technical Report 421, University of California at Berkeley.
