Page 1

Linear Regression: Model and Algorithms
CSE 446: Machine Learning
Emily Fox, University of Washington
January 9, 2017
©2017 Emily Fox

Linear regression: The model

Page 2

How much is my house worth?
I want to list my house for sale.

$$ ????

Page 3

Data

(x1 = sq.ft., y1 = $)
(x2 = sq.ft., y2 = $)
(x3 = sq.ft., y3 = $)
(x4 = sq.ft., y4 = $)
(x5 = sq.ft., y5 = $)

Input vs. output:
•  y is the quantity of interest
•  assume y can be predicted from x

Model – How we assume the world works

[Figure: price ($) vs. square feet (sq.ft.)]

Regression model:

Page 4

Model – How we assume the world works

[Figure: price ($) vs. square feet (sq.ft.)]

“Essentially, all models are wrong, but some are useful.”
(George Box, 1987)

Simple linear regression model

[Figure: price ($) vs. square feet (sq.ft.)]

yi = w0 + w1 xi + εi
f(x) = w0 + w1 x
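As a quick illustration, the simple model f(x) = w0 + w1 x can be fit by least squares in a few lines of NumPy. The house sizes and prices below are made-up numbers for the sketch, not data from the lecture:

```python
import numpy as np

# Hypothetical (sq.ft., $) pairs
x = np.array([1000., 1500., 2000., 2500., 3000.])
y = np.array([200000., 270000., 350000., 420000., 500000.])

# Feature matrix with a constant column: f(x) = w0 + w1*x
H = np.column_stack([np.ones_like(x), x])
w0, w1 = np.linalg.lstsq(H, y, rcond=None)[0]

# Predict the price of a new house from its size
f = lambda x_new: w0 + w1 * x_new
```

Here `np.linalg.lstsq` returns the least-squares coefficients directly; later slides derive where that solution comes from.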

Page 5

Simple linear regression model

[Figure: price ($) vs. square feet (sq.ft.)]

yi = w0 + w1 xi + εi
parameters w0, w1: regression coefficients

What about a quadratic function?

[Figure: price ($) vs. square feet (sq.ft.)]

f(x) = w0 + w1 x + w2 x^2

Page 6

Even higher order polynomial

[Figure: price ($) vs. square feet (sq.ft.)]

f(x) = w0 + w1 x + w2 x^2 + … + wp x^p

Polynomial regression

Model: yi = w0 + w1 xi + w2 xi^2 + … + wp xi^p + εi

Treat the powers of x as different features:
feature 1 = 1 (constant)   parameter 1 = w0
feature 2 = x              parameter 2 = w1
feature 3 = x^2            parameter 3 = w2
…
feature p+1 = x^p          parameter p+1 = wp
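Building the polynomial feature matrix is mechanical; a minimal NumPy sketch (the helper name `poly_features` is ours, not from the slides):

```python
import numpy as np

def poly_features(x, p):
    # Column j of the result is x^j, j = 0..p, so column 0 is the
    # constant feature and the matrix has shape (N, p+1).
    return np.vander(x, N=p + 1, increasing=True)

x = np.array([1.0, 2.0, 3.0])
H = poly_features(x, p=2)   # columns: [1, x, x^2]
```

After this step, polynomial regression is just linear regression on the columns of `H`.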

Page 7

Generic basis expansion

Model: yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
          = Σ_{j=0}^{D} wj hj(xi) + εi

wj = jth regression coefficient or weight; hj = jth feature

feature 1 = h0(x) … often 1 (constant)
feature 2 = h1(x) … e.g., x
feature 3 = h2(x) … e.g., x^2 or sin(2πx/12)
…
feature D+1 = hD(x) … e.g., x^p

Page 8

Predictions just based on house size

[Figure: price ($) vs. square feet (sq.ft.)]

Only 1 bathroom! Not the same as my 3 bathrooms.

Add more inputs

[Figure: price ($) vs. sq.ft. (x[1]) and #bath (x[2])]

f(x) = w0 + w1 sq.ft. + w2 #bath

Page 9

Many possible inputs
- Square feet
- # bathrooms
- # bedrooms
- Lot size
- Year built
- …

General notation

Output: y (scalar)
Inputs: x = (x[1], x[2], …, x[d]) (d-dim vector)

Notational conventions:
x[j] = jth input (scalar)
hj(x) = jth feature (scalar)
xi = input of ith data point (vector)
xi[j] = jth input of ith data point (scalar)

Page 10

Generic linear regression model

Model: yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
          = Σ_{j=0}^{D} wj hj(xi) + εi

feature 1 = h0(x) … e.g., 1
feature 2 = h1(x) … e.g., x[1] = sq. ft.
feature 3 = h2(x) … e.g., x[2] = #bath, or log(x[7]) x[2] = log(#bed) x #bath
…
feature D+1 = hD(x) … some other function of x[1], …, x[d]

Fitting the linear regression model

Page 11

Step 1: Rewrite the regression model

Rewrite in matrix notation

For observation i:
yi = Σ_{j=0}^{D} wj hj(xi) + εi
   = h(xi)ᵀ w + εi
where h(xi) = [h0(xi), h1(xi), …, hD(xi)]ᵀ is the feature vector and w = [w0, w1, …, wD]ᵀ.

Page 12

Rewrite in matrix notation

For all observations together:
y = Hw + ε
where H is the N × (D+1) matrix whose ith row is h(xi)ᵀ.

Step 2: Compute the cost

Page 13

“Cost” of using a given line

[Figure: price ($) vs. square feet (sq.ft.)]

Residual sum of squares (RSS):
RSS(w0, w1) = Σ_i (yi - [w0 + w1 xi])^2

RSS for multiple regression

[Figure: price ($) vs. x[1] and x[2]]

RSS(w) = Σ_i (yi - h(xi)ᵀ w)^2
       = (y - Hw)ᵀ (y - Hw)

Page 14

Step 3: Take the gradient

Gradient of RSS:
∇RSS(w) = ∇[(y - Hw)ᵀ (y - Hw)]
        = -2 Hᵀ (y - Hw)

Why? By analogy to the 1D case: d/dw (y - hw)^2 = -2h (y - hw).
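One way to gain confidence in this matrix gradient is to compare it to central finite differences on small random data. The check itself is our addition, not part of the lecture:

```python
import numpy as np

def rss(H, y, w):
    r = y - H @ w
    return r @ r

def rss_grad(H, y, w):
    # The matrix gradient from the slide: -2 H^T (y - Hw)
    return -2 * H.T @ (y - H @ w)

# Arbitrary small problem
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 3))
y = rng.normal(size=5)
w = rng.normal(size=3)

# Central finite differences, one coordinate at a time
eps = 1e-6
num = np.array([
    (rss(H, y, w + eps * e) - rss(H, y, w - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```

Because RSS is quadratic in w, the central difference is exact up to roundoff, so the two gradients match closely.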

Page 15

Step 4, Approach 1: Set the gradient = 0

Closed-form solution

∇RSS(w) = -2 Hᵀ (y - Hw) = 0

Solve for w: HᵀH w = Hᵀy (the normal equations)

Page 16

Closed-form solution

ŵ = (HᵀH)⁻¹ Hᵀy

Invertible if: HᵀH has full rank, i.e., the features are linearly independent (requires N ≥ D+1)
Complexity of inverse: O(D³)
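In NumPy one would typically solve the normal equations rather than form the inverse explicitly; a sketch with made-up data (same hypothetical houses as before):

```python
import numpy as np

def fit_closed_form(H, y):
    # Solve H^T H w = H^T y; np.linalg.solve is preferred over
    # explicitly computing (H^T H)^{-1}.
    return np.linalg.solve(H.T @ H, H.T @ y)

# Hypothetical data: columns [1, sq.ft.]
H = np.array([[1., 1000.], [1., 1500.], [1., 2000.],
              [1., 2500.], [1., 3000.]])
y = np.array([200000., 270000., 350000., 420000., 500000.])
w_hat = fit_closed_form(H, y)   # [w0, w1]
```

Solving the linear system costs the same O(D³) as the inverse but is more numerically stable.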

Step 4, Approach 2: Gradient descent

Page 17

Gradient descent

while not converged:
    w(t+1) ← w(t) - η ∇RSS(w(t))

where ∇RSS(w) = -2 Hᵀ (y - Hw)

Interpreting elementwise

Update to jth feature weight:
wj(t+1) ← wj(t) + 2η Σ_i hj(xi) (yi - ŷi(w(t)))

[Figure: price ($) vs. x[1] and x[2]]

Page 18

Summary of gradient descent for multiple regression

init w(1) = 0 (or randomly, or smartly), t = 1
while ||∇RSS(w(t))|| > ε:
    for j = 0, …, D:
        partial[j] = -2 Σ_i hj(xi) (yi - ŷi(w(t)))
        wj(t+1) ← wj(t) - η partial[j]
    t ← t + 1
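A direct translation of this pseudocode into NumPy, vectorizing the inner loop over j; the step size and tolerance below are arbitrary choices for illustration, not values from the lecture:

```python
import numpy as np

def fit_gradient_descent(H, y, eta=1e-3, eps=1e-6, max_iter=100000):
    # Minimize RSS(w) by gradient descent, as in the summary slide.
    w = np.zeros(H.shape[1])            # init w(1) = 0
    for _ in range(max_iter):
        grad = -2 * H.T @ (y - H @ w)   # all partial[j] at once
        if np.linalg.norm(grad) < eps:  # convergence: ||grad RSS|| < eps
            break
        w = w - eta * grad
    return w
```

With a small enough step size η this converges to the same ŵ as the closed form, without ever forming or inverting HᵀH.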

Why min RSS?

Page 19

Assuming Gaussian noise

[Figure: price ($) vs. square feet (sq.ft.)]

Model for εi: εi ~ N(0, σ^2)
Implied distribution on yi: yi ~ N(Σ_j wj hj(xi), σ^2)

Maximum likelihood estimate of params

Maximize the log-likelihood with respect to w:

ln p(D | w, σ) = ln[ (1/(σ√(2π)))^N  Π_{i=1}^{N} exp( -(yi - Σ_j wj hj(xi))^2 / (2σ^2) ) ]
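Expanding the logarithm turns the product into a sum; writing it out (this algebra is standard, not shown on the slide):

```latex
\ln p(\mathcal{D}\mid \mathbf{w},\sigma)
  = N \ln\!\left(\frac{1}{\sigma\sqrt{2\pi}}\right)
  - \frac{1}{2\sigma^{2}} \sum_{i=1}^{N}
    \Bigl(y_i - \sum_{j=0}^{D} w_j h_j(\mathbf{x}_i)\Bigr)^{2}
```

The first term does not depend on w, so maximizing the log-likelihood over w is the same as minimizing Σ_i (yi - Σ_j wj hj(xi))^2 = RSS(w). This is why least squares is the maximum likelihood estimate under Gaussian noise.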

Page 20

Interpreting the fitted function
©2015 Emily Fox & Carlos Guestrin

Interpreting the coefficients – Simple linear regression

[Figure: price ($) vs. square feet (sq.ft.)]

ŷ = ŵ0 + ŵ1 x
ŵ1 = predicted change in $ per 1 sq. ft.

Page 21

Interpreting the coefficients – Two linear features

[Figure: price ($) vs. x[1] and x[2]]

ŷ = ŵ0 + ŵ1 x[1] + ŵ2 x[2]

Interpreting the coefficients – Two linear features

ŷ = ŵ0 + ŵ1 x[1] + ŵ2 x[2]

[Figure: price ($) vs. # bathrooms (x[2])]

ŵ2 = predicted change in $ per 1 bathroom, for fixed # sq.ft.!

Page 22

Interpreting the coefficients – Multiple linear features

ŷ = ŵ0 + ŵ1 x[1] + … + ŵj x[j] + … + ŵd x[d]

[Figure: price ($) vs. x[1] and x[2]]

ŵj = predicted change in ŷ per unit change in x[j], holding all other features fixed

Interpreting the coefficients – Polynomial regression

ŷ = ŵ0 + ŵ1 x + … + ŵj x^j + … + ŵp x^p

[Figure: price ($) vs. square feet (sq.ft.)]

Can’t hold the other features fixed: they are all functions of the same input x!

Page 23

Recap of concepts


What you can do now…
•  Describe polynomial regression
•  Write a regression model using multiple inputs or features thereof
•  Cast both polynomial regression and regression with multiple inputs as regression with multiple features
•  Calculate a goodness-of-fit metric (e.g., RSS)
•  Estimate model parameters of a general multiple regression model to minimize RSS:
   -  In closed form
   -  Using an iterative gradient descent algorithm
•  Interpret the coefficients of a non-featurized multiple regression fit
•  Exploit the estimated model to form predictions

