UVA CS 6316: Machine Learning

Lecture 6: Linear Regression Model with Regularizations

Dr. Yanjun Qi

University of Virginia Department of Computer Science

Last: Multivariate Linear Regression with basis Expansion

Task (y): Regression, y continuous

Representation (x, f()): Y = weighted linear sum of (X basis expansion)

Score Function (L()): Sum of squared error (least squares)

Search/Optimization (argmin()): Normal Equation / GD / SGD

Models, Parameters: Regression coefficients

$$\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \phi(x)^T \theta$$

9/18/19 Dr. Yanjun Qi / UVA CS

Today: Regularized multivariate linear regression

Task: Regression

Representation: Y = weighted linear sum of X's

Score Function: Least-squares + Regularization

Search/Optimization: Linear algebra for Ridge / sub-GD for Lasso & Elastic

Models, Parameters: Regression coefficients (regularized weights)

$$\min_\beta J(\beta) = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \Big(\sum_{j=1}^{p} |\beta_j|^q\Big)^{1/q}$$

We aim to make the learned model

• 1. Generalize well

• 2. Computationally scalable and efficient

• 3. Robust / trustworthy / interpretable

• Especially for some domains, this is about trust!

Today

Linear Regression Model with Regularizations

• Review: (Ordinary) Least squares: squared loss (Normal Equation)

• Ridge regression: squared loss with L2 regularization

• Lasso regression: squared loss with L1 regularization

• Elastic regression: squared loss with L1 AND L2 regularization

• How to Choose Regularization Parameter

SUPERVISED Regression

• The target Y is a continuous variable.

• The training dataset consists of input-output pairs; the learned function is then evaluated on a new input, f(x?).

Review: Normal equation for LR

• Write the cost function in matrix form:

$$J(\beta) = \frac{1}{2}\sum_{i=1}^{n} (x_i^T\beta - y_i)^2 = \frac{1}{2}(X\beta - \vec{y})^T(X\beta - \vec{y}) = \frac{1}{2}\left(\beta^T X^T X \beta - \beta^T X^T \vec{y} - \vec{y}^T X \beta + \vec{y}^T \vec{y}\right)$$

where

$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

To minimize J(β), take the derivative and set it to zero:

$$\Rightarrow X^T X \beta = X^T \vec{y} \qquad \text{(the normal equations)}$$

$$\beta^* = (X^T X)^{-1} X^T \vec{y}$$

Assume that $X^T X$ is invertible.
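The normal equations can be checked numerically on a toy problem. The following is only a minimal sketch using the Python standard library; the helper names (`transpose`, `matmul`, `solve`, `normal_equation`) and the toy data are assumptions made for illustration and are not part of the lecture.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular (X^T X is not invertible)")
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def normal_equation(X, y):
    """Return beta solving X^T X beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Toy data generated from y = 1 + 2x; the first column of ones models the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = normal_equation(X, y)
print(beta)  # close to [1.0, 2.0] up to floating-point rounding
```

Solving the linear system directly, rather than forming the inverse explicitly, is a common design choice because it is cheaper and numerically better behaved.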

Comments on the normal equation

What if X has less than full column rank? → $X^T X$ is not invertible.

See page 11 of Handout L2.

Review: Vector norms

A norm of a vector ||x|| is informally a measure of the “length” of the vector.

– Common norms: L1, L2 (Euclidean)

– L-infinity
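The norm formulas on this slide did not survive extraction. Assuming the standard definitions (L1 is the sum of absolute values, L2 is the square root of the sum of squares, L-infinity is the largest absolute value), a small sketch looks like this; the function name `lp_norm` is a hypothetical helper.

```python
import math

def lp_norm(x, p):
    """(sum_i |x_i|^p)^(1/p) for p >= 1; p = math.inf gives max_i |x_i|."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]
l1 = lp_norm(x, 1)           # |3| + |-4|
l2 = lp_norm(x, 2)           # sqrt(3^2 + 4^2)
linf = lp_norm(x, math.inf)  # max(|3|, |-4|)
print(l1, l2, linf)
```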

Review: Vector Norm (L2, when p=2)

Norms

[Figure: Lasso vs. Quadratic]

Ridge Regression / L2 Regularization

• If $X^T X$ is not invertible, a classical solution is to add a small positive element to its diagonal:

$$\beta^* = (X^T X)^{-1} X^T \vec{y} \quad \longrightarrow \quad \beta^* = (X^T X + \lambda I)^{-1} X^T \vec{y}$$

By convention, the bias/intercept term is typically not regularized. Here we assume the data has been centered, so there is no bias term.
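To see why adding λ to the diagonal helps, consider a design matrix whose two columns are identical, so that X^T X has rank 1. The sketch below is illustrative only: `ridge_2d` is a hypothetical helper restricted to exactly two features so that the 2x2 system can be solved by hand, and the data is made up.

```python
def ridge_2d(X, y, lam):
    """Closed-form ridge for exactly two features: solves (X^T X + lam*I) beta = X^T y."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    g0 = sum(r[0] * yi for r, yi in zip(X, y))
    g1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ZeroDivisionError("X^T X + lam*I is singular")
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

# The second column is an exact copy of the first, so X^T X is not invertible.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]

try:
    ridge_2d(X, y, lam=0.0)   # plain normal equations
    ols_failed = False
except ZeroDivisionError:
    ols_failed = True

beta = ridge_2d(X, y, lam=0.1)  # a small positive lam makes the system solvable
print(ols_failed, beta)
```

With λ = 0.1 the two coefficients come out equal and sum to slightly less than 2, which is the shrinkage effect discussed later in the lecture.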

Extra: Positive Definite Matrix

One important property of positive definite matrices is that

→ They are always full rank, and hence invertible.

→ Extra: see the proof on pages 17-18 of the Linear-Algebra Handout.

Extra: Positive Definite Matrix

For λ > 0, $X^T X + \lambda I$ is positive definite and therefore invertible, so the ridge solution always exists:

$$\beta^* = (X^T X + \lambda I)^{-1} X^T \vec{y}$$

Ridge Regression / Squared Loss+L2

• As the solution from

$$\hat{\beta}^{ridge} = \arg\min_\beta \; (\vec{y} - X\beta)^T(\vec{y} - X\beta) + \lambda \beta^T \beta$$

to minimize, take the derivative and set it to zero (HW2):

$$\beta^* = (X^T X + \lambda I)^{-1} X^T \vec{y}$$

• Equivalently

$$\hat{\beta}^{ridge} = \arg\min_\beta \; (\vec{y} - X\beta)^T(\vec{y} - X\beta) \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s^2$$

By convention, the bias/intercept term is typically not regularized. Here we assume the data has been centered, so there is no bias term.

Review

[Figure: surface map and contour map]

Objective Function’s Contour lines from Ridge Regression

[Figure: contour lines of the least-squares objective in the (β1, β2) plane. Labels: OLS (least squares) solution; constraint region of radius s; ridge regression solution (Least Square+L2).]

Parameter Shrinkage

$$\hat{\beta}^{OLS} = (X^T X)^{-1} X^T \vec{y}$$

$$\hat{\beta}^{Rg} = (X^T X + \lambda I)^{-1} X^T \vec{y}$$

See page 65 of the ESL book: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

Extra: two forms of Ridge Regression

• The penalized form (with λ) and the constrained form (with s) are totally equivalent.

http://stats.stackexchange.com/questions/190993/how-to-find-regression-coefficients-beta-in-ridge-regression

Ridge Regression: Squared Loss+L2

• λ > 0 penalizes each β_j

• if λ = 0 we get the least squares estimator;

• if λ → ∞, then β_j goes to zero
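The two limiting cases can be seen directly in the one-feature case, where the ridge solution (X^T X + λI)^{-1} X^T y reduces to a scalar ratio. This is a sketch under that simplifying assumption; `ridge_1d` and the data are made up for illustration.

```python
def ridge_1d(x, y, lam):
    """Ridge with one centered feature: beta = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

# Centered toy data that follows y = 2x exactly.
x = [-1.0, 0.0, 1.0]
y = [-2.0, 0.0, 2.0]

path = [(lam, ridge_1d(x, y, lam)) for lam in (0.0, 1.0, 10.0, 1000.0)]
for lam, b in path:
    print(f"lambda={lam:8.1f}  beta={b:.5f}")
```

At λ = 0 the estimate equals the least squares slope, and it decreases monotonically toward zero as λ grows, without ever reaching exactly zero.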

Influence of Regularization Parameter

[Figure: least squares solution and ridge regression solution in the (β1, β2) plane, shown for intermediate λ, for λ → ∞, and for λ → 0.]

(2) Lasso (least absolute shrinkage and selection operator) / Squared Loss+L1

• The lasso is a shrinkage method like ridge, but acts in a nonlinear manner on the outcome y.

• The lasso is defined by

$$\hat{\beta}^{lasso} = \arg\min_\beta \; (\vec{y} - X\beta)^T(\vec{y} - X\beta) \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s$$

where $(\vec{y} - X\beta)^T(\vec{y} - X\beta) = \sum_{i=1}^{n} (y_i - x_i^T\beta)^2$.

By convention, the bias/intercept term is typically not regularized. Here we assume the data has been centered, so there is no bias term.

Lasso (least absolute shrinkage and selection operator)

• Suppose in 2 dimensions

• β = (β1, β2)

• | β1 | + | β2 | = const

• | β1 | + | -β2 | = const

• | -β1 | + | β2 | = const

• | -β1 | + | -β2 | = const

[Figure: constraint region of size s, least squares solution, and lasso solution.]

• In the figure, the lasso solution has eliminated the role of x2, leading to sparsity.

[Figure: constraint regions of size s for ridge regression and for the lasso estimator, side by side.]

Lasso (least absolute shrinkage and selection operator)

• Notice that the ridge penalty $\sum_j \beta_j^2$ is replaced by $\sum_j |\beta_j|$.

• Due to the nature of the constraint, if the tuning parameter s is chosen small enough, then the lasso will set some coefficients exactly to zero.
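The exact zeros can be illustrated in the one-feature case using the penalized form, where a large λ plays the role of a small s. The sketch below assumes the objective (1/2) Σ (y_i − x_i β)² + λ|β|, whose minimizer is given by soft-thresholding; this scaling and the helper names are assumptions for illustration, not the lecture's notation.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, and return exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y_i - x_i*b)^2) + lam * |b| for a single feature."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, lam) / sxx

def ridge_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y_i - x_i*b)^2) + 0.5 * lam * b^2, for comparison."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

x = [-1.0, 0.0, 1.0]
y = [-2.0, 0.0, 2.0]

for lam in (0.0, 1.0, 5.0):
    print(lam, lasso_1d(x, y, lam), ridge_1d(x, y, lam))
```

For a strong enough penalty the lasso coefficient is exactly 0.0, whereas the ridge coefficient is only shrunk and stays nonzero.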

Lasso: Implicit Feature Selection

[Figure: data matrix X with n rows (samples) and p columns (features).]

e.g., Leukemia Diagnosis

Golub et al., Science Vol 286, 15 Oct. 1999

[Figure: n samples with labels {y_i} taking the values -1 and +1.]

Lasso for when p>n

• Prediction accuracy and model interpretation are two important aspects of regression models.

• LASSO does shrinkage and variable selection simultaneously for better prediction and model interpretation.

Disadvantages:

- In the p>n case, lasso selects at most n variables before it saturates.

- If there is a group of variables among which the pairwise correlations are very high, then lasso tends to select only one variable from the group.

(3) Hybrid of Ridge and Lasso: Elastic Net regularization

• The L1 part of the penalty generates a sparse model.

• The L2 part of the penalty (extra):

• Removes the limitation on the number of selected variables

• Encourages the grouping effect

• Stabilizes the L1 regularization path

Naïve elastic net

• For any fixed non-negative λ1 and λ2, the naive elastic net criterion is defined.

• The naive elastic net estimator is the minimizer of that criterion.

• It can equivalently be written as a constrained problem.
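The equations on this slide did not survive extraction. As a hedged reconstruction, the criterion as stated in the Zou and Hastie paper listed in the references is written out below; the notation (norm subscripts, the names α and t) follows that paper rather than the slide itself.

```latex
% Naive elastic net criterion for fixed \lambda_1, \lambda_2 \ge 0
L(\lambda_1, \lambda_2, \beta)
  = \|\vec{y} - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1

% Naive elastic net estimator
\hat{\beta} = \arg\min_{\beta} \; L(\lambda_1, \lambda_2, \beta)

% Equivalent constrained form, with \alpha = \lambda_2 / (\lambda_1 + \lambda_2)
\hat{\beta} = \arg\min_{\beta} \; \|\vec{y} - X\beta\|_2^2
  \quad \text{subject to} \quad
  (1 - \alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 \le t \ \text{for some } t
```

Setting α = 0 recovers the lasso constraint and α = 1 recovers the ridge constraint.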

Geometry of elastic net

e.g. A Practical Application of Regression Model

A REAL APPLICATION: Movie Reviews and meta to Revenues

Movie Reviews and Revenues: An Experiment in Text Regression, Proceedings of HLT '10 Human Language Technologies (n ≈ 1.7k examples, >3k features)

Use linear regression to directly predict the opening weekend gross earnings, denoted as y, based on features x extracted from the movie metadata and/or the text of the reviews.

Example features: counts of an n-gram in the text.

An example of how real applications use the elastic net and its weights!

Here, the features are from the text-only model annotated in Table 2.

The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value by each occurrence of the feature.

Sentiment-related text features are not as prominent as might be expected, and their overall proportion in the set of features with non-zero weights is quite small (estimated in preliminary trials at less than 15%). Phrases that refer to metadata are the more highly weighted and frequent ones.

A combination of the meta and text features achieves the best performance in terms of both MAE and Pearson's r.

More Ways for Measuring Regression Predictions: Correlation Coefficient

• Pearson correlation coefficient

$$r(x, y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \times \sum_{i=1}^{m} (y_i - \bar{y})^2}}$$

where $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$ and $\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i$, and $|r(x, y)| \le 1$.

• Measuring the linear correlation between two sequences, x and y,

• giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.

• For regression: $r(\vec{y}_{predicted}, \vec{y}_{known})$
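The Pearson formula translates almost line by line into code. The following sketch uses made-up predictions and targets; `pearson_r`, `y_known`, and `y_predicted` are illustrative names only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                    sum((b - mean_y) ** 2 for b in y))
    return num / den

y_known = [1.0, 2.0, 3.0, 4.0]
y_predicted = [1.1, 1.9, 3.2, 3.8]     # hypothetical model outputs

r = pearson_r(y_predicted, y_known)    # close to +1 for good predictions
r_neg = pearson_r([4.0, 3.0, 2.0, 1.0], y_known)  # perfectly reversed order
print(r, r_neg)
```

Note that r measures only linear association: predictions that are systematically offset or rescaled can still reach r = 1, which is why the paper reports MAE alongside it.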

Advantage of Elastic net (Extra)

• Naïve elastic net can be converted to a lasso problem on an augmented data set

• In the augmented formulation,

• the sample size is n+p and X* has rank p

• → it can potentially select all the predictors

• Naïve elastic net can perform automatic variable selection like lasso
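The augmented-data conversion can be verified numerically. The construction below follows the Zou and Hastie paper listed in the references (stack sqrt(λ2)·I under X, pad y with p zeros, and rescale); that the slides use exactly this construction is an assumption, and all function names and numbers are made up for illustration.

```python
import math

def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2."""
    return sum((yi - sum(v * b for v, b in zip(row, beta))) ** 2
               for row, yi in zip(X, y))

def naive_enet_objective(X, y, beta, lam1, lam2):
    return (rss(X, y, beta)
            + lam2 * sum(b * b for b in beta)
            + lam1 * sum(abs(b) for b in beta))

def lasso_objective(X, y, beta, gamma):
    return rss(X, y, beta) + gamma * sum(abs(b) for b in beta)

def augment(X, y, lam2):
    """Build X* with n+p rows and y* with p trailing zeros."""
    p = len(X[0])
    c = 1.0 / math.sqrt(1.0 + lam2)
    X_star = [[c * v for v in row] for row in X]
    for j in range(p):
        X_star.append([c * math.sqrt(lam2) if k == j else 0.0 for k in range(p)])
    y_star = list(y) + [0.0] * p
    return X_star, y_star

# n = 2 samples, p = 3 features (a p > n case)
X = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0]]
y = [1.0, -1.0]
beta = [0.3, -0.2, 0.7]          # any coefficient vector works for the identity
lam1, lam2 = 0.5, 2.0

X_star, y_star = augment(X, y, lam2)
beta_star = [math.sqrt(1.0 + lam2) * b for b in beta]
gamma = lam1 / math.sqrt(1.0 + lam2)

lhs = naive_enet_objective(X, y, beta, lam1, lam2)
rhs = lasso_objective(X_star, y_star, beta_star, gamma)
print(len(X_star), lhs, rhs)     # n + p rows; the two objectives agree
```

Because the augmented matrix has n+p rows and rank p, the lasso on it is no longer limited to selecting at most n variables.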

Summary: Regularized multivariate linear regression

• Model: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$

• LR estimation: $\arg\min \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$

• LASSO estimation: $\arg\min \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$

• Ridge regression estimation: $\arg\min \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$

Regularized multivariate linear regression

Task: Regression

Representation: Y = weighted linear sum of X's

Score Function: Least-squares + Regularization

Search/Optimization: Linear algebra for Ridge / sub-GD for Lasso & Elastic

Models, Parameters: Regression coefficients (regularized weights)

$$\min_\beta J(\beta) = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \Big(\sum_{j=1}^{p} |\beta_j|^q\Big)^{1/q}$$

More: A family of shrinkage estimators

$$\hat{\beta} = \arg\min_\beta \sum_{i=1}^{N} (y_i - x_i^T\beta)^2 \quad \text{subject to} \quad \sum_j |\beta_j|^q \le s$$

• For q ≥ 0, contours of constant value of $\sum_j |\beta_j|^q$ are shown for the case of two inputs.

norms visualized

all p-norms penalize larger weights

q < 2 tends to create sparse solutions (i.e. lots of 0 weights)

q > 2 tends to push for similar weights

Common regularizers

L1: Sum of absolute weights, $\sum_j |\beta_j|$, will penalize small values more

L2: Sum of squared weights, $\sum_j \beta_j^2$, penalizes large values more

Generally, we don’t want huge weights

If weights are large, a small change in a feature can result in a large change in the prediction

Might also prefer weights of 0 for features that aren’t useful
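One way to read "penalizes small values more" versus "penalizes large values more" is to compare the two penalties on weights below and above 1 in magnitude. The sketch below uses hypothetical weight vectors chosen only for illustration.

```python
def l1_penalty(beta):
    """Sum of absolute weights."""
    return sum(abs(b) for b in beta)

def l2_penalty(beta):
    """Sum of squared weights."""
    return sum(b * b for b in beta)

small = [0.1, -0.1]    # weights with magnitude below 1
large = [10.0, -10.0]  # weights with magnitude above 1

print(l1_penalty(small), l2_penalty(small))  # L1 charges more for small weights
print(l1_penalty(large), l2_penalty(large))  # L2 charges more for large weights
```

Since squaring shrinks values below 1 and inflates values above 1, the L2 penalty barely resists small weights, while the L1 penalty keeps a constant pull toward zero.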

Model Selection & Generalization

• Generalization: learn a function / hypothesis from past data in order to “explain”, “predict”, “model” or “control” new data examples

• Underfitting: when the model is too simple, both training and test errors are large

• Overfitting: when the model is too complex, test errors are large although training errors are small.

• After learning knowledge, the model tends to learn “noise”

Issue: Overfitting and underfitting

Under fit: $y = \theta_0 + \theta_1 x$

Looks good: $y = \theta_0 + \theta_1 x + \theta_2 x^2$

Over fit: $y = \sum_{j=0}^{5} \theta_j x^j$

K-fold Cross Validation !!!!

Overfitting: Handled by Regularization

A regularizer is an additional criterion added to the loss function to make sure that we don’t overfit

It’s called a regularizer since it tries to keep the parameters more normal/regular

It is a bias on the model that forces the learning to prefer certain types of weights over others, e.g.,

$$\hat{\beta}^{ridge} = \arg\min_\beta \sum_{i=1}^{n} (y_i - x_i^T\beta)^2 + \lambda \beta^T \beta$$

WHY and How to Select λ?

• 1. Generalization ability → k-fold CV to decide

• 2. Control the bias and variance of the model (details in future lectures)
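Choosing λ by k-fold cross validation can be sketched as follows for one-feature ridge regression. Everything here is an illustrative assumption: the data, the fixed "noise" values, the candidate grid, and the simple fold assignment by index modulo k.

```python
def ridge_1d(x, y, lam):
    """Ridge with one feature and no intercept: beta = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

def kfold_cv_error(x, y, lam, k):
    """Mean squared validation error of ridge_1d over k folds."""
    n = len(x)
    total = 0.0
    for fold in range(k):
        val_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        beta = ridge_1d([x[i] for i in train_idx], [y[i] for i in train_idx], lam)
        total += sum((y[i] - beta * x[i]) ** 2 for i in val_idx)
    return total / n

# Toy data: y = 2x plus small fixed perturbations (stand-ins for noise).
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.4, -0.1, 0.2, -0.3]
y = [2.0 * a + e for a, e in zip(x, noise)]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
cv = {lam: kfold_cv_error(x, y, lam, k=4) for lam in lambdas}
best_lam = min(cv, key=cv.get)   # the candidate with the lowest validation error
print(cv)
print("selected lambda:", best_lam)
```

In practice the folds are usually shuffled, and the model is refit on all training data with the selected λ before final evaluation on a held-out test set.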

Regularization path of a Ridge Regression

[Figure: how each β_j varies when varying λ, with the horizontal axis running from λ → ∞ to λ = 0 (weight decay). An example with 8 features.]

Regularization path of a Lasso Regression

[Figure: how each β_j varies when varying λ, with the horizontal axis running from λ → ∞ to λ = 0. An example with 8 features.]

Choose λ that generalizes well!

Today Recap

Linear Regression Model with Regularizations

• Review: (Ordinary) Least squares: squared loss (Normal Equation)

• Ridge regression: squared loss with L2 regularization

• Lasso regression: squared loss with L1 regularization

• Elastic regression: squared loss with L1 AND L2 regularization

• Influence of Regularization Parameter

Regression (supervised)

Four ways to train / perform optimization for linear regression models

• Normal Equation

• Gradient Descent (GD)

• Stochastic GD

• Newton’s method

Supervised regression models

• Linear regression (LR)

• LR with non-linear basis functions

• Locally weighted LR

• LR with Regularizations

Extra More

• Optimization of regularized regressions: see L6-extra slide

• Relation between λ and s: see L6-extra slide

• Why Elastic Net has a few nice properties: see L6-extra slide

References

• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides

• Prof. Nando de Freitas’s tutorial slides

• Regularization and variable selection via the elastic net, Hui Zou and Trevor Hastie, Stanford University, USA

• ESL book: Elements of Statistical Learning