
Lecture 11: Cross validation

Reading: Chapter 5

STATS 202: Data mining and analysis

Jonathan Taylor, 10/17. Slide credits: Sergio Bacallado


Comparing classification methods through simulation

1. Simulate data from several different known distributions with 2 predictors and a binary response variable.

2. Compare the test error (0-1 loss) for the following methods:

- KNN-1

- KNN-CV (“optimal” KNN)

- Logistic regression

- Linear discriminant analysis (LDA)

- Quadratic discriminant analysis (QDA)
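To make the comparison concrete, the following is a minimal sketch of such a simulation in Python with scikit-learn, using a Scenario 1-style data-generating process (two standard normal predictors whose class means differ). The mean shift, sample sizes, and number of repetitions are illustrative assumptions, not values from the lecture.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def scenario1(n):
    # Fair-coin labels; class 1's predictor means are shifted by 1 (assumed).
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2))
    X[y == 1] += 1.0
    return X, y

models = {
    "KNN-1": KNeighborsClassifier(n_neighbors=1),
    "KNN-CV": GridSearchCV(KNeighborsClassifier(),
                           {"n_neighbors": range(1, 30, 2)}, cv=10),
    "LDA": LinearDiscriminantAnalysis(),
    "Logistic": LogisticRegression(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

errors = {name: [] for name in models}
for rep in range(20):                      # 20 simulated data sets
    X_tr, y_tr = scenario1(100)            # train on a small sample
    X_te, y_te = scenario1(5000)           # large test set for the 0-1 loss
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        errors[name].append(np.mean(model.predict(X_te) != y_te))

for name, errs in errors.items():
    print(f"{name:8s} mean test error: {np.mean(errs):.3f}")
```

In a linear scenario like this, one would expect LDA and logistic regression to do best and KNN-1 to pay a variance penalty, which is the pattern the boxplots below summarize.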


Scenario 1

[Figure: boxplots of test error for KNN-1, KNN-CV, LDA, Logistic, and QDA across Scenarios 1-3.]

- X1, X2 standard normal.

- No correlation in either class.

[Figure: scatter plot of the two classes in the (x1, x2) plane.]


Scenario 2

[Figure: boxplots of test error for KNN-1, KNN-CV, LDA, Logistic, and QDA across Scenarios 1-3.]

- X1, X2 standard normal.

- Correlation is -0.5 in both classes.

[Figure: scatter plot of the two classes in the (x1, x2) plane.]


Scenario 3

[Figure: boxplots of test error for KNN-1, KNN-CV, LDA, Logistic, and QDA across Scenarios 1-3.]

- X1, X2 Student t random variables.

- No correlation in either class.

[Figure: scatter plot of the two classes in the (x1, x2) plane.]


Scenario 4

[Figure: boxplots of test error for KNN-1, KNN-CV, LDA, Logistic, and QDA across Scenarios 4-6.]

- X1, X2 standard normal.

- First class has correlation 0.5, second class has correlation -0.5.

[Figure: scatter plot of the two classes in the (x1, x2) plane.]


Scenario 5

[Figure: boxplots of test error for KNN-1, KNN-CV, LDA, Logistic, and QDA across Scenarios 4-6.]

- X1, X2 uncorrelated, standard normal.

- Response Y was sampled from:

$$P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2 + \beta_3 X_1 X_2}}{1 + e^{\beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2 + \beta_3 X_1 X_2}}.$$

- The true decision boundary is quadratic.
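For concreteness, a sketch of this data-generating process in Python; the coefficient values are hypothetical placeholders, since the slide does not state them.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X1, X2 = rng.normal(size=n), rng.normal(size=n)     # uncorrelated standard normals

# Hypothetical coefficients: the lecture does not specify beta.
b0, b1, b2, b3 = 0.0, 1.0, -1.0, 2.0
eta = b0 + b1 * X1**2 + b2 * X2**2 + b3 * X1 * X2   # quadratic in (X1, X2)
p = 1 / (1 + np.exp(-eta))                          # P(Y = 1 | X)
Y = rng.binomial(1, p)                              # sample the binary response
```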


Scenario 6

[Figure: boxplots of test error for KNN-1, KNN-CV, LDA, Logistic, and QDA across Scenarios 4-6.]

- X1, X2 uncorrelated, standard normal.

- Response Y was sampled from:

$$P(Y = 1 \mid X) = \frac{e^{f_{\text{nonlinear}}(X_1, X_2)}}{1 + e^{f_{\text{nonlinear}}(X_1, X_2)}}.$$

- The true decision boundary is very rough.


Thinking about the loss function is important

Most of the regression methods we’ve studied aim to minimize the RSS, while classification methods aim to minimize the 0-1 loss.

In classification, we often care about certain kinds of error more than others; i.e. the natural loss function is not the 0-1 loss.

Even if we use a method which minimizes a certain kind of training error, we can tune it to optimize our true loss function.

- e.g. Find the threshold that brings the false negative rate below an acceptable level.

In the Kaggle competition, what is our loss function?
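As a sketch of the threshold-tuning idea above, the snippet below lowers a logistic regression's classification threshold until the false negative rate on a validation set drops below a target; the synthetic data set and the 10% tolerance are placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

probs = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

target_fnr = 0.10                           # assumed acceptable level
for t in np.linspace(0.5, 0.01, 50):        # lower the threshold gradually
    pred = (probs >= t).astype(int)
    fnr = np.mean(pred[y_val == 1] == 0)    # P(predict 0 | truth is 1)
    if fnr <= target_fnr:
        print(f"threshold {t:.2f} gives FNR {fnr:.3f}")
        break
```

The 0-1 loss is what the method minimizes in training; the threshold is tuned afterwards against the loss we actually care about.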


Validation

Problem: Choose a supervised method that minimizes the test error.

In addition, tune the parameters of each method:

- k in k-nearest neighbors.

- The number of variables to include in forward or backward selection.

- The order of a polynomial in polynomial regression.

Use of a validation set is one way to approximate the test error:

- Divide the data into two parts.

- Train each model with one part.

- Compute the error on the other.


Validation set approach

Goal: Estimate the test error for a supervised learning method.

Strategy:

- Split the data in two parts.

- Train the method in the first part.

- Compute the error on the second part.

[Figure: the n observations split at random into a training set and a validation set.]
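A minimal sketch of this strategy in Python with scikit-learn, choosing a polynomial degree on synthetic stand-in data (the Auto data on the next slide is not loaded here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(40, 230, size=392).reshape(-1, 1)   # stand-in for horsepower
y = 40 - 0.15 * x[:, 0] + 2e-4 * x[:, 0] ** 2 + rng.normal(scale=4, size=392)

x_tr, x_val, y_tr, y_val = train_test_split(x, y, test_size=0.5, random_state=0)
for degree in range(1, 6):
    fit = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    fit.fit(x_tr, y_tr)
    mse = mean_squared_error(y_val, fit.predict(x_val))
    print(f"degree {degree}: validation MSE {mse:.2f}")
```

Re-running with a different random_state gives a different validation curve, which is exactly the instability the next slide illustrates.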


Validation set approach

Polynomial regression to estimate mpg from horsepower in the Auto data.

[Figure: validation-set mean squared error vs. degree of polynomial, shown for different random splits of the Auto data.]

Problem: Every split yields a different estimate of the error.


Leave one out cross-validation

- For every i = 1, . . . , n:

  - train the model on every point except i,

  - compute the test error on the held-out point.

- Average the test errors.

[Figure: n train/validation splits, each holding out a single observation.]


For a regression problem, the LOOCV estimate is

$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{(-i)}\right)^2,$$

where $\hat{y}_i^{(-i)}$ is the prediction for the ith sample made without using the ith sample.

For a classification problem, the same idea gives

$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\left(y_i \neq \hat{y}_i^{(-i)}\right).$$
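Putting the definition directly into code, a sketch of the LOOCV loop for a linear regression on placeholder data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)   # assumed model

errs = []
for train_idx, test_idx in LeaveOneOut().split(X):
    fit = LinearRegression().fit(X[train_idx], y[train_idx])
    y_hat = fit.predict(X[test_idx])[0]       # prediction without sample i
    errs.append((y[test_idx][0] - y_hat) ** 2)

print(f"CV_(n) estimate of test MSE: {np.mean(errs):.3f}")
```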

Leave one out cross-validation

Computing $\mathrm{CV}_{(n)}$ can be computationally expensive, since it involves fitting the model n times.

For linear regression, there is a shortcut:

$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_{ii}}\right)^2,$$

where $h_{ii}$ is the leverage statistic.
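The shortcut is easy to check numerically: compute the leverage values $h_{ii}$ as the diagonal of the hat matrix of the intercept-augmented design, and compare against the explicit n-fit LOOCV. A sketch on the same kind of placeholder data as above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

# Explicit LOOCV: n separate fits.
loo = [(y[te][0] - LinearRegression().fit(X[tr], y[tr]).predict(X[te])[0]) ** 2
       for tr, te in LeaveOneOut().split(X)]

# Shortcut: one fit plus the leverages h_ii.
Xd = np.column_stack([np.ones(len(X)), X])     # design matrix with intercept
h = np.diag(Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T))
resid = y - LinearRegression().fit(X, y).predict(X)
shortcut = np.mean((resid / (1 - h)) ** 2)

print(np.mean(loo), shortcut)                  # the two agree (up to rounding)
```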


k-fold cross-validation

- Split the data into k subsets or folds.

- For every i = 1, . . . , k:

  - train the model on every fold except the ith fold,

  - compute the test error on the ith fold.

- Average the test errors.

[Figure: the data split into k folds; each fold in turn serves as the validation set.]
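A sketch of 10-fold CV with scikit-learn's KFold on placeholder data; k = 10 is a common choice, not something the slide mandates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)

fold_mses = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                 random_state=0).split(X):
    fit = LinearRegression().fit(X[train_idx], y[train_idx])
    resid = y[test_idx] - fit.predict(X[test_idx])
    fold_mses.append(np.mean(resid ** 2))      # MSE on the held-out fold

print(f"10-fold CV estimate: {np.mean(fold_mses):.3f}")
```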


LOOCV vs. k-fold cross-validation

[Figure: mean squared error vs. degree of polynomial, estimated by LOOCV (left) and by 10-fold CV (right).]

- k-fold CV depends on the chosen split (somewhat).

- In k-fold CV, we train the model on less data than what is available. This introduces bias into the estimates of test error.

- In LOOCV, the training samples highly resemble each other. This increases the variance of the test error estimate.

- n-fold CV is equivalent to LOOCV.


Choosing an optimal model

[Figure: true and cross-validated mean squared error vs. flexibility, for three simulated data sets.]

Even if the error estimates are off, choosing the model with the minimum cross validation error often leads to a method with near-minimum test error.


Choosing an optimal model

In a classification problem, things look similar.

[Figure: simulated two-class data, shown with the Bayes boundary (dashed) and logistic regression fits with polynomial predictors of degree 1 through 4 (solid).]

[Figure: error rate vs. order of polynomial used (left) and vs. 1/K for KNN (right).]

The one standard error rule

Forward stepwise selection

[Figure 7.9 of The Elements of Statistical Learning (2nd Ed., Hastie, Tibshirani & Friedman 2009, Chap. 7): prediction error (orange) and tenfold cross-validation curve (blue) estimated from a single training set; misclassification error vs. subset size p.]

Blue: 10-fold cross validation. Yellow: true test error.

- A number of models with 10 ≤ p ≤ 15 have almost the same CV error.

- The vertical bars represent 1 standard error in the test error from the 10 folds.

- Rule of thumb: Choose the simplest model whose CV error is no more than one standard error above the model with the lowest CV error.
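A sketch of the one-standard-error rule in Python, selecting a polynomial degree rather than a subset size; the data and candidate models are placeholders, and the fold-to-fold standard deviation divided by sqrt(10) serves as the standard error of the CV estimate.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(200, 1))
y = 1 + x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(size=200)

degrees = range(1, 11)
means, ses = [], []
for degree in degrees:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, x, y, cv=10,
                           scoring="neg_mean_squared_error")
    means.append(mse.mean())
    ses.append(mse.std(ddof=1) / np.sqrt(len(mse)))  # SE over the 10 folds

best = int(np.argmin(means))
threshold = means[best] + ses[best]       # lowest CV error plus one SE
# One-SE rule: the simplest model whose CV error is under the threshold.
chosen = next(d for d, m in zip(degrees, means) if m <= threshold)
print(f"min-CV degree: {degrees[best]}, one-SE choice: {chosen}")
```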


The wrong way to do cross validation

Reading: Section 7.10.2 of The Elements of Statistical Learning.

We want to classify 200 individuals according to whether they have cancer or not. We use logistic regression onto 1000 measurements of gene expression.

Proposed strategy:

- Using all the data, select the 20 most significant genes using z-tests.

- Estimate the test error of logistic regression with these 20 predictors via 10-fold cross validation.


The wrong way to do cross validation

To see how that works, let’s use the following simulated data:

- Each gene expression is standard normal and independent of all others.

- The response (cancer or not) is sampled from a coin flip — no correlation to any of the “genes”.

What should the misclassification rate be for any classification method using these predictors?

Roughly 50%.


The wrong way to do cross validation

We run this simulation, and obtain a CV error rate of 3%!

Why is this?

- Since we only have 200 individuals in total, at least some of the 1000 variables will be correlated with the response by chance.

- We do variable selection using all the data, so the variables we select have some correlation with the response in every subset or fold in the cross validation.
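This failure is easy to reproduce. The sketch below runs the wrong procedure on pure-noise data shaped as the slides describe (200 individuals, 1000 genes); univariate ANOVA F-tests stand in for the z-test screening, which is an assumption on my part. The apparent error depends on the seed, but it lands far below the honest 50%.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))    # 1000 independent "gene expressions"
y = rng.integers(0, 2, size=200)    # coin-flip response: no real signal

# WRONG: screen for the 20 most significant genes using ALL of the data...
X_top = SelectKBest(f_classif, k=20).fit_transform(X, y)

# ...then cross-validate only the final logistic regression fit.
acc = cross_val_score(LogisticRegression(), X_top, y, cv=10)
print(f"apparent CV error: {1 - acc.mean():.1%}")
```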


The right way to do cross validation

- Divide the data into 10 folds.

- For i = 1, . . . , 10:

  - Using every fold except i, perform the variable selection and fit the model with the selected variables.

  - Compute the error on fold i.

- Average the 10 test errors obtained.

In our simulation, this produces an error estimate of close to 50%.

Moral of the story: Every aspect of the learning method that involves using the data — variable selection, for example — must be cross-validated.
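In scikit-learn terms, the fix is to make the screening step part of the cross-validated pipeline, so it is refit on each set of training folds; a sketch on the same pure-noise setup (again with F-tests standing in for the z-test selection):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))    # same pure-noise setup as before
y = rng.integers(0, 2, size=200)

# RIGHT: selection happens inside the pipeline, so each fold picks its
# 20 genes using only that fold's training data.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
acc = cross_val_score(pipe, X, y, cv=10)
print(f"honest CV error: {1 - acc.mean():.1%}")    # close to 50%
```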
