Cross Validation & Ensembling

Shan-Hung Wu
[email protected]

Department of Computer Science, National Tsing Hua University, Taiwan

Machine Learning

Outline

1. Cross Validation
   - How Many Folds?
2. Ensemble Methods
   - Voting
   - Bagging
   - Boosting
   - Why AdaBoost Works?

Cross Validation

So far, we have used the holdout method for:
- Hyperparameter tuning: validation set
- Performance reporting: testing set

What if we get an "unfortunate" split?

K-fold cross validation:
1. Split the data set $X$ evenly into $K$ subsets $X^{(i)}$ (called folds)
2. For $i = 1, \cdots, K$, train $f^{\setminus(i)}$ using all data but the $i$-th fold ($X \setminus X^{(i)}$)
3. Report the cross-validation error $C_{CV}$ by averaging all the testing errors $C[f^{\setminus(i)}]$ on $X^{(i)}$ (see the sketch below)
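
A minimal K-fold CV sketch of the procedure above, assuming NumPy only; `train(X, y)` and `error(model, X, y)` are placeholder functions standing in for any learner and any cost $C[\cdot]$:

```python
import numpy as np

def kfold_cv_error(X, y, train, error, K=5, seed=0):
    """Return C_CV: the average held-out error over K folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                 # shuffle once
    folds = np.array_split(idx, K)                # K roughly equal folds X^(i)
    errs = []
    for i in range(K):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != i])
        f_i = train(X[train_idx], y[train_idx])   # f^{\(i)}: fit on X \ X^(i)
        errs.append(error(f_i, X[test_idx], y[test_idx]))
    return np.mean(errs)                          # C_CV
```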

Nested Cross Validation

Cross validation (CV) can be applied to both hyperparameter tuning and performance reporting.

E.g., 5x2 nested CV:
1. Inner loop (2 folds): select the hyperparameters giving the lowest $C_{CV}$
   - Can be wrapped by grid search
2. Train the final model on both the training and validation sets with the selected hyperparameters
3. Outer loop (5 folds): report $C_{CV}$ as the test error (see the sketch below)
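
A sketch of 5x2 nested CV under the same assumptions (NumPy only; `train(X, y, hp)`, `error(model, X, y)`, and the hyperparameter list `grid` are hypothetical placeholders):

```python
import numpy as np

def cv_error(X, y, train, error, hp, K, rng):
    """Inner C_CV for one hyperparameter setting hp."""
    folds = np.array_split(rng.permutation(len(X)), K)
    errs = []
    for i in range(K):
        tr = np.concatenate([folds[j] for j in range(K) if j != i])
        errs.append(error(train(X[tr], y[tr], hp), X[folds[i]], y[folds[i]]))
    return np.mean(errs)

def nested_cv(X, y, train, error, grid, outer=5, inner=2, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), outer)
    outer_errs = []
    for i in range(outer):
        tr = np.concatenate([folds[j] for j in range(outer) if j != i])
        # inner loop: pick the hyperparameters with the lowest inner C_CV
        best = min(grid, key=lambda hp: cv_error(X[tr], y[tr], train, error, hp, inner, rng))
        model = train(X[tr], y[tr], best)       # refit on training + validation data
        outer_errs.append(error(model, X[folds[i]], y[folds[i]]))
    return np.mean(outer_errs)                  # reported test error
```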

How Many Folds K? I

The cross-validation error $C_{CV}$ is an average of the $C[f^{\setminus(i)}]$'s.
Regard each $C[f^{\setminus(i)}]$ as an estimator of the expected generalization error $E_X(C[f_N])$.
$C_{CV}$ is an estimator too, and we have

$$\mathrm{MSE}(C_{CV}) = E_X\big[(C_{CV} - E_X(C[f_N]))^2\big] = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2$$

Point Estimation Revisited: Mean Square Error

Let $\hat{\theta}_n$ be an estimator of a quantity $\theta$ related to a random variable $x$, mapped from $n$ i.i.d. samples of $x$.

Mean square error of $\hat{\theta}_n$:
$$\mathrm{MSE}(\hat{\theta}_n) = E_X\big[(\hat{\theta}_n - \theta)^2\big]$$

It can be decomposed into the bias and variance:
$$\begin{aligned}
E_X\big[(\hat{\theta}_n - \theta)^2\big] &= E\big[(\hat{\theta}_n - E[\hat{\theta}_n] + E[\hat{\theta}_n] - \theta)^2\big]\\
&= E\big[(\hat{\theta}_n - E[\hat{\theta}_n])^2 + (E[\hat{\theta}_n] - \theta)^2 + 2(\hat{\theta}_n - E[\hat{\theta}_n])(E[\hat{\theta}_n] - \theta)\big]\\
&= E\big[(\hat{\theta}_n - E[\hat{\theta}_n])^2\big] + E\big[(E[\hat{\theta}_n] - \theta)^2\big] + 2\,E\big[\hat{\theta}_n - E[\hat{\theta}_n]\big](E[\hat{\theta}_n] - \theta)\\
&= E\big[(\hat{\theta}_n - E[\hat{\theta}_n])^2\big] + \big(E[\hat{\theta}_n] - \theta\big)^2 + 2 \cdot 0 \cdot (E[\hat{\theta}_n] - \theta)\\
&= \mathrm{Var}_X(\hat{\theta}_n) + \mathrm{bias}(\hat{\theta}_n)^2
\end{aligned}$$

The MSE of an unbiased estimator is its variance.
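
A quick numeric sanity check of this decomposition (an illustrative sketch, not from the slides): estimate $\theta = E[x]$ by the sample mean of $n$ samples, repeated over many simulated datasets, and compare the MSE with Var + bias^2.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 2.0, 20, 100_000
# one theta_hat_n per simulated dataset of n samples
estimates = rng.normal(theta, 1.0, size=(trials, n)).mean(axis=1)

mse  = np.mean((estimates - theta) ** 2)
var  = np.var(estimates)
bias = np.mean(estimates) - theta
print(mse, var + bias ** 2)   # the two numbers agree up to simulation noise
```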

Example: 5-Fold vs. 10-Fold CV

$$\mathrm{MSE}(C_{CV}) = E_X\big[(C_{CV} - E_X(C[f_N]))^2\big] = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2$$

Consider polynomial regression where $y = \sin(x) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$.
Let $C[\cdot]$ be the MSE of a function's predictions against the true labels.
In the accompanying figure (not reproduced here):
- $E_X(C[f_N])$: red line
- $\mathrm{bias}(C_{CV})$: gaps between the red line and the other solid lines ($E_X[C_{CV}]$)
- $\mathrm{Var}_X(C_{CV})$: shaded areas
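
A rough sketch of the kind of simulation behind such a comparison (assumptions: NumPy only; the polynomial degree, sample size, and noise level are illustrative and not the slide's exact settings). For each simulated dataset we compute $C_{CV}$ for $K = 5$ and $K = 10$ and compare the mean and spread across datasets.

```python
import numpy as np

def cv_mse(x, y, K, degree=3):
    folds = np.array_split(np.random.permutation(len(x)), K)
    errs = []
    for i in range(K):
        tr = np.concatenate([folds[j] for j in range(K) if j != i])
        coeffs = np.polyfit(x[tr], y[tr], degree)            # f^{\(i)}
        pred = np.polyval(coeffs, x[folds[i]])
        errs.append(np.mean((pred - y[folds[i]]) ** 2))      # C[f^{\(i)}]
    return np.mean(errs)                                     # C_CV

rng = np.random.default_rng(0)
scores = {5: [], 10: []}
for _ in range(200):                        # 200 simulated datasets X
    x = rng.uniform(0, 2 * np.pi, 60)
    y = np.sin(x) + rng.normal(0, 0.3, 60)
    for K in scores:
        scores[K].append(cv_mse(x, y, K))
for K, s in scores.items():
    print(K, np.mean(s), np.var(s))         # mean relates to bias, var to Var_X(C_CV)
```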

Decomposing Bias and Variance

$C_{CV}$ is an estimator of the expected generalization error $E_X(C[f_N])$:
$$\mathrm{MSE}(C_{CV}) = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2, \text{ where}$$

$$\begin{aligned}
\mathrm{bias}(C_{CV}) &= E_X(C_{CV}) - E_X(C[f_N]) = E\Big(\sum_i \tfrac{1}{K} C[f^{\setminus(i)}]\Big) - E(C[f_N])\\
&= \tfrac{1}{K} \sum_i E\big(C[f^{\setminus(i)}]\big) - E(C[f_N]) = E\big(C[f^{\setminus(s)}]\big) - E(C[f_N]), \forall s\\
&= \mathrm{bias}\big(C[f^{\setminus(s)}]\big), \forall s
\end{aligned}$$

$$\begin{aligned}
\mathrm{Var}_X(C_{CV}) &= \mathrm{Var}\Big(\sum_i \tfrac{1}{K} C[f^{\setminus(i)}]\Big) = \tfrac{1}{K^2}\,\mathrm{Var}\Big(\sum_i C[f^{\setminus(i)}]\Big)\\
&= \tfrac{1}{K^2}\Big(\sum_i \mathrm{Var}\big(C[f^{\setminus(i)}]\big) + 2\sum_{i,j,\,j>i} \mathrm{Cov}_X\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big)\Big)\\
&= \tfrac{1}{K}\,\mathrm{Var}\big(C[f^{\setminus(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i} \mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big), \forall s
\end{aligned}$$

How Many Folds K? II

$$\mathrm{MSE}(C_{CV}) = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2, \text{ where}$$
$$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f^{\setminus(s)}]\big), \forall s$$
$$\mathrm{Var}(C_{CV}) = \tfrac{1}{K}\,\mathrm{Var}\big(C[f^{\setminus(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i} \mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big), \forall s$$

By learning theory, we can reduce $\mathrm{bias}(C_{CV})$ and $\mathrm{Var}(C_{CV})$ by:
- Choosing the right model complexity, avoiding both underfitting and overfitting
- Collecting more training examples ($N$)

Furthermore, we can reduce $\mathrm{Var}(C_{CV})$ by making $f^{\setminus(i)}$ and $f^{\setminus(j)}$ uncorrelated.

How Many Folds K? III

$$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f^{\setminus(s)}]\big), \forall s$$
$$\mathrm{Var}_X(C_{CV}) = \tfrac{1}{K}\,\mathrm{Var}\big(C[f^{\setminus(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i} \mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big), \forall s$$

With a large $K$, $C_{CV}$ tends to have:
- Low $\mathrm{bias}\big(C[f^{\setminus(s)}]\big)$ and $\mathrm{Var}\big(C[f^{\setminus(s)}]\big)$, as each $f^{\setminus(s)}$ is trained on more examples
- High $\mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big)$, as the training sets $X \setminus X^{(i)}$ and $X \setminus X^{(j)}$ are more similar, so $C[f^{\setminus(i)}]$ and $C[f^{\setminus(j)}]$ are more positively correlated

How Many Folds K? IV

$$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f^{\setminus(s)}]\big), \forall s$$
$$\mathrm{Var}_X(C_{CV}) = \tfrac{1}{K}\,\mathrm{Var}\big(C[f^{\setminus(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i} \mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big), \forall s$$

Conversely, with a small $K$, the cross-validation error tends to have a high $\mathrm{bias}\big(C[f^{\setminus(s)}]\big)$ and $\mathrm{Var}\big(C[f^{\setminus(s)}]\big)$ but a low $\mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big)$.

In practice, we usually set $K = 5$ or $10$.

Leave-One-Out CV

$$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f^{\setminus(s)}]\big), \forall s$$
$$\mathrm{Var}_X(C_{CV}) = \tfrac{1}{K}\,\mathrm{Var}\big(C[f^{\setminus(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i} \mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big), \forall s$$

For a very small dataset, $\mathrm{MSE}(C_{CV})$ is dominated by $\mathrm{bias}\big(C[f^{\setminus(s)}]\big)$ and $\mathrm{Var}\big(C[f^{\setminus(s)}]\big)$, not by $\mathrm{Cov}\big(C[f^{\setminus(i)}], C[f^{\setminus(j)}]\big)$.
In this case we can choose $K = N$, which we call leave-one-out CV.

Ensemble Methods

No free lunch theorem: there is no single ML algorithm that always outperforms the others in all domains/tasks.

Can we combine multiple base-learners to improve
- applicability across different domains, and/or
- generalization performance on a specific task?

These are the goals of ensemble learning.
How do we "combine" multiple base-learners?

Voting

Voting: linearly combine the predictions of the base-learners for each $x$:
$$y_k = \sum_j w_j\, y_k^{(j)}, \quad \text{where } w_j \ge 0,\ \sum_j w_j = 1$$

If all learners are given equal weight $w_j = 1/L$, we have the plurality vote (the multi-class version of the majority vote).

Voting Rule    Formula
Sum            $y_k = \frac{1}{L}\sum_{j=1}^{L} y_k^{(j)}$
Weighted sum   $y_k = \sum_j w_j y_k^{(j)}$, with $w_j \ge 0$, $\sum_j w_j = 1$
Median         $y_k = \mathrm{median}_j\, y_k^{(j)}$
Minimum        $y_k = \min_j y_k^{(j)}$
Maximum        $y_k = \max_j y_k^{(j)}$
Product        $y_k = \prod_j y_k^{(j)}$
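
The voting rules in the table can be written down directly; a small NumPy sketch (the per-class scores and weights are made-up example numbers):

```python
import numpy as np

# preds holds scores y_k^(j) with shape (L, K): L base-learners, K classes
preds = np.array([[0.2, 0.8],
                  [0.4, 0.6],
                  [0.9, 0.1]])           # three base-learners, two classes
w = np.array([0.5, 0.3, 0.2])            # weights: w_j >= 0, sum to 1

combined = {
    "sum":          preds.mean(axis=0),  # (1/L) * sum_j y_k^(j)
    "weighted sum": w @ preds,
    "median":       np.median(preds, axis=0),
    "minimum":      preds.min(axis=0),
    "maximum":      preds.max(axis=0),
    "product":      preds.prod(axis=0),
}
for rule, y in combined.items():
    print(rule, y, "-> class", int(np.argmax(y)))
```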

Why Voting Works? I

Assume that each $y^{(j)}$ has expected value $E_X(y^{(j)}\,|\,x)$ and variance $\mathrm{Var}_X(y^{(j)}\,|\,x)$.
When $w_j = 1/L$, we have:

$$E_X(y\,|\,x) = E\Big(\sum_j \tfrac{1}{L}\, y^{(j)}\,\Big|\,x\Big) = \tfrac{1}{L}\sum_j E\big(y^{(j)}\,|\,x\big) = E\big(y^{(j)}\,|\,x\big)$$

$$\begin{aligned}
\mathrm{Var}_X(y\,|\,x) &= \mathrm{Var}\Big(\sum_j \tfrac{1}{L}\, y^{(j)}\,\Big|\,x\Big) = \tfrac{1}{L^2}\,\mathrm{Var}\Big(\sum_j y^{(j)}\,\Big|\,x\Big)\\
&= \tfrac{1}{L}\,\mathrm{Var}\big(y^{(j)}\,|\,x\big) + \tfrac{2}{L^2}\sum_{i,j,\,i<j}\mathrm{Cov}\big(y^{(i)}, y^{(j)}\,|\,x\big)
\end{aligned}$$

The expected value doesn't change, so the bias doesn't change.

Why Voting Works? II

$$\mathrm{Var}_X(y\,|\,x) = \tfrac{1}{L}\,\mathrm{Var}\big(y^{(j)}\,|\,x\big) + \tfrac{2}{L^2}\sum_{i,j,\,i<j}\mathrm{Cov}\big(y^{(i)}, y^{(j)}\,|\,x\big)$$

If $y^{(i)}$ and $y^{(j)}$ are uncorrelated, the variance can be reduced.
Unfortunately, the $y^{(j)}$'s may not be i.i.d. in practice.
If the voters are positively correlated, the variance increases.
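
A quick numeric illustration of this variance formula (assumption: NumPy only): averaging $L$ positively correlated voters reduces variance far less than averaging $L$ independent ones.

```python
import numpy as np

rng = np.random.default_rng(0)
L, trials, rho = 10, 200_000, 0.6

independent = rng.normal(size=(trials, L))
# build positively correlated voters by mixing in a shared component
shared = rng.normal(size=(trials, 1))
correlated = np.sqrt(1 - rho) * rng.normal(size=(trials, L)) + np.sqrt(rho) * shared

print("single voter variance:    ", independent[:, 0].var())
print("avg of independent voters:", independent.mean(axis=1).var())  # about 1/L
print("avg of correlated voters: ", correlated.mean(axis=1).var())   # well above 1/L
```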

Bagging

Bagging (short for bootstrap aggregating) is a voting method, but the base-learners are made different deliberately.
How? Why not train them using slightly different training sets?
1. Generate $L$ slightly different samples from the given sample by bootstrapping: given $X$ of size $N$, draw $N$ points randomly from $X$ with replacement to get $X^{(j)}$
   - It is possible that some instances are drawn more than once and some are not drawn at all
2. Train a base-learner on each $X^{(j)}$ (see the sketch below)
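
A minimal bagging sketch, assuming NumPy and a placeholder `train(X, y)` that returns a model with a `.predict` method:

```python
import numpy as np

def bagging_fit(X, y, train, L=25, seed=0):
    """Train L base-learners, each on a bootstrap sample X^(j) of size N."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(L):
        idx = rng.integers(0, N, size=N)       # draw N points with replacement
        models.append(train(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Equal-weight voting (here: averaging) over the base-learners."""
    return np.mean([m.predict(X) for m in models], axis=0)
```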

Boosting

In bagging, generating "uncorrelated" base-learners is left to chance and to the instability of the learning method.
In boosting, we generate complementary base-learners.
How? Why not train the next learner on the mistakes of the previous learners?
For simplicity, let's consider binary classification here: $d^{(j)}(x) \in \{1, -1\}$.
The original boosting algorithm combines three weak learners to generate a strong learner:
- A weak learner has error probability less than 1/2 (better than random guessing)
- A strong learner has arbitrarily small error probability

Original Boosting Algorithm

Training:
1. Given a large training set, randomly divide it into three parts $X^{(1)}$, $X^{(2)}$, $X^{(3)}$
2. Use $X^{(1)}$ to train the first learner $d^{(1)}$, then feed $X^{(2)}$ to $d^{(1)}$
3. Use the points of $X^{(2)}$ misclassified by $d^{(1)}$ to train $d^{(2)}$; then feed $X^{(3)}$ to $d^{(1)}$ and $d^{(2)}$
4. Use the points on which $d^{(1)}$ and $d^{(2)}$ disagree to train $d^{(3)}$

Testing:
1. Feed a point to $d^{(1)}$ and $d^{(2)}$ first; if their outputs agree, use that as the final prediction
2. Otherwise, take the output of $d^{(3)}$ (see the sketch below)
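
A sketch of this three-learner scheme, assuming NumPy and a placeholder weak-learner factory `train(X, y)` whose models output labels in {1, -1}:

```python
import numpy as np

def boost3_fit(X1, y1, X2, y2, X3, y3, train):
    d1 = train(X1, y1)
    miss = d1.predict(X2) != y2                   # points of X^(2) that d1 gets wrong
    d2 = train(X2[miss], y2[miss])
    disagree = d1.predict(X3) != d2.predict(X3)   # points where d1 and d2 disagree
    d3 = train(X3[disagree], y3[disagree])
    return d1, d2, d3

def boost3_predict(d1, d2, d3, X):
    p1, p2, p3 = d1.predict(X), d2.predict(X), d3.predict(X)
    return np.where(p1 == p2, p1, p3)             # if d1 and d2 agree, use them; else d3
```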

Example

Assuming $X^{(1)}$, $X^{(2)}$, and $X^{(3)}$ are the same (figure omitted):

Disadvantage: this requires a large training set to afford the three-way split.

AdaBoost

AdaBoost uses the same training set over and over again.
How do we make some points "larger"? Modify the probabilities of drawing the instances as a function of error.

Notation:
- $\Pr^{(i,j)}$: the probability that example $(x^{(i)}, y^{(i)})$ is drawn to train the $j$-th base-learner $d^{(j)}$
- $\epsilon^{(j)} = \sum_i \Pr^{(i,j)}\, 1\big(y^{(i)} \ne d^{(j)}(x^{(i)})\big)$: the error rate of $d^{(j)}$ on its training set

Algorithm

Training:
1. Initialize $\Pr^{(i,1)} = \frac{1}{N}$ for all $i$
2. Starting from $j = 1$:
   1. Randomly draw $N$ examples from $X$ with probabilities $\Pr^{(i,j)}$ and use them to train $d^{(j)}$
   2. Stop adding new base-learners if $\epsilon^{(j)} \ge \frac{1}{2}$
   3. Define $\alpha_j = \frac{1}{2}\log\big(\frac{1-\epsilon^{(j)}}{\epsilon^{(j)}}\big) > 0$ and set $\Pr^{(i,j+1)} = \Pr^{(i,j)} \cdot \exp\big(-\alpha_j\, y^{(i)} d^{(j)}(x^{(i)})\big)$ for all $i$
   4. Normalize $\Pr^{(i,j+1)}$, $\forall i$, by multiplying by $\big(\sum_i \Pr^{(i,j+1)}\big)^{-1}$

Testing:
1. Given $x$, calculate $y^{(j)}$ for all $j$
2. Make the final prediction $y$ by voting: $y = \sum_j \alpha_j d^{(j)}(x)$ (see the sketch below)
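
A compact sketch of the training and testing steps above, assuming NumPy, labels in {1, -1}, and a placeholder weak learner `train(X, y)` (e.g., a decision stump):

```python
import numpy as np

def adaboost_fit(X, y, train, L=50, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    p = np.full(N, 1.0 / N)                    # Pr^(i,1)
    learners, alphas = [], []
    for _ in range(L):
        idx = rng.choice(N, size=N, p=p)       # draw N examples with prob. Pr^(i,j)
        d = train(X[idx], y[idx])
        pred = d.predict(X)
        eps = np.sum(p * (pred != y))          # error rate under Pr^(i,j)
        if eps >= 0.5:                         # stop adding new base-learners
            break
        eps = max(eps, 1e-12)                  # guard against a perfect weak learner
        alpha = 0.5 * np.log((1 - eps) / eps)
        p *= np.exp(-alpha * y * pred)         # Pr^(i,j+1) before normalization
        p /= p.sum()                           # normalize
        learners.append(d)
        alphas.append(alpha)
    return learners, np.array(alphas)

def adaboost_predict(learners, alphas, X):
    votes = sum(a * d.predict(X) for a, d in zip(alphas, learners))
    return np.sign(votes)                      # y = sign(sum_j alpha_j d^(j)(x))
```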

Page 81: Cross Validation & EnsemblingShan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 4 / 34. Nested Cross Validation Cross validation (CV) can be applied to bothhyperparameter tuning

Algorithm

Training

1 Initialize Pr

(i,1) = 1

N

for all i

2 Start from j = 1:1 Randomly draw N examples from X with probabilities Pr

(i,j) and usethem to train d

(j)

2 Stop adding new base-learners if e(j) � 1

2

3 Define aj

= 1

2

log

⇣1�e(j)

e(j)

⌘> 0 and set

Pr

(i,j+1) = Pr

(i,j) ·exp(�aj

y

(i)d

(j)(x(i))) for all i

4 Normalize Pr

(i,j+1), 8i, by multiplying⇣

Âi

Pr

(i,j+1)⌘�1

Testing

1 Given x, calculate y

(j) for all j

2 Make final prediction y by voting: y = Âj

aj

d

(j)(x)

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 28 / 34

Page 82: Cross Validation & EnsemblingShan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 4 / 34. Nested Cross Validation Cross validation (CV) can be applied to bothhyperparameter tuning

Example

d

(j+1) complements d

(j) and d

(j�1) by focusing on predictions theydisagree

Voting weights (aj

= 1

2

log

⇣1�e(j)

e(j)

⌘) in predictions are proportional to

the base-learner’s accuracy

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 29 / 34

Why AdaBoost Works

Why does AdaBoost improve performance? By increasing model complexity? Not exactly.
Empirical studies show that AdaBoost reduces overfitting as $L$ grows, even when there is no training error.
AdaBoost increases the margin [1, 2].

Margin as Confidence of Predictions

Recall that in SVC, a larger margin improves generalizability, due to higher-confidence predictions over the training examples.
We can define the margin for AdaBoost similarly.
In binary classification, define the margin of the prediction on an example $(x^{(i)}, y^{(i)}) \in X$ as:

$$\mathrm{margin}(x^{(i)}, y^{(i)}) = y^{(i)} f(x^{(i)}) = \sum_{j:\, y^{(i)} = d^{(j)}(x^{(i)})} \alpha_j \;-\; \sum_{j:\, y^{(i)} \ne d^{(j)}(x^{(i)})} \alpha_j$$
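
A sketch of computing these margins and the empirical margin distribution discussed next, assuming NumPy and that `preds[j, i]` holds $d^{(j)}(x^{(i)}) \in \{1, -1\}$ with `alphas` being the voting weights from training:

```python
import numpy as np

def margins(preds, alphas, y):
    """margin(x_i, y_i) = y_i * f(x_i): the alpha_j of agreeing learners
    minus the alpha_j of disagreeing ones."""
    f = alphas @ preds          # f(x^(i)) = sum_j alpha_j d^(j)(x^(i))
    return y * f

def margin_distribution(m, theta):
    """Pr_X(y f(x) <= theta), estimated as a fraction of the training set."""
    return np.mean(m <= theta)
```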

Margin Distribution

Margin distribution over $\theta$:
$$\Pr_X\big(y^{(i)} f(x^{(i)}) \le \theta\big) \approx \frac{\big|\{(x^{(i)}, y^{(i)}) : y^{(i)} f(x^{(i)}) \le \theta\}\big|}{|X|}$$

A complementary learner:
- clarifies low-confidence areas
- increases the margin of the points in these areas

References

[1] Yoav Freund, Robert Schapire, and N. Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.

[2] Liwei Wang, Masashi Sugiyama, Cheng Yang, Zhi-Hua Zhou, and Jufu Feng. On the margin explanation of boosting algorithms. In COLT, pages 479-490. Citeseer, 2008.

