CSE 446: Machine Learning
Linear Regression: Model and Algorithms
Emily Fox, University of Washington
January 9, 2017
©2017 Emily Fox
Linear regression: The model
How much is my house worth?

I want to list my house for sale.

$$ ????
Data

(x1 = sq.ft., y1 = $)
(x2 = sq.ft., y2 = $)
(x3 = sq.ft., y3 = $)
(x4 = sq.ft., y4 = $)
(x5 = sq.ft., y5 = $)
…

Input (x) vs. Output (y):
• y is the quantity of interest
• assume y can be predicted from x
Model – How we assume the world works

[Figure: scatter plot of price ($, y-axis) vs. square feet (sq.ft., x-axis)]

Regression model:
"Essentially, all models are wrong, but some are useful." (George Box, 1987)
Simple linear regression model

[Figure: price ($) vs. square feet (sq.ft.) with fitted line f(x)]

yi = w0 + w1 xi + εi
f(x) = w0 + w1 x

parameters w0, w1: regression coefficients
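As a concrete illustration, here is a minimal NumPy sketch that fits this simple model by least squares; the house data values are invented for the example.

```python
import numpy as np

# Invented example data: square footage and sale price of five houses
sqft = np.array([1000.0, 1500.0, 1800.0, 2200.0, 3000.0])
price = np.array([250e3, 330e3, 370e3, 450e3, 580e3])

# Least-squares fit of price = w0 + w1 * sqft
# np.polyfit returns highest-degree coefficient first: [w1, w0]
w1, w0 = np.polyfit(sqft, price, deg=1)

print(f"f(x) = {w0:.0f} + {w1:.1f} x")
print("predicted price of a 2640 sq.ft. house:", w0 + w1 * 2640)
```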
What about a quadratic function?

[Figure: price ($) vs. square feet (sq.ft.) with quadratic fit]

f(x) = w0 + w1 x + w2 x^2
Even higher order polynomial

[Figure: price ($) vs. square feet (sq.ft.) with high-order polynomial fit]

f(x) = w0 + w1 x + w2 x^2 + … + wp x^p
Polynomial regression

Model: yi = w0 + w1 xi + w2 xi^2 + … + wp xi^p + εi

Treat the powers of x as different features:
feature 1 = 1 (constant)  ↔  parameter 1 = w0
feature 2 = x             ↔  parameter 2 = w1
feature 3 = x^2           ↔  parameter 3 = w2
…
feature p+1 = x^p         ↔  parameter p+1 = wp
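To make "treat as different features" concrete, here is a small sketch (mine, not from the slides) that builds the polynomial feature matrix; np.vander with increasing=True puts the constant column first.

```python
import numpy as np

def polynomial_features(x, p):
    """Columns [1, x, x^2, ..., x^p] for a 1-D input array x."""
    return np.vander(x, N=p + 1, increasing=True)

x = np.array([1.0, 2.0, 3.0])
print(polynomial_features(x, p=2))
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```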
Generic basis expansion

Model: yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
          = Σ_{j=0}^{D} wj hj(xi) + εi

wj: jth regression coefficient or weight
hj: jth feature

feature 1 = h0(x) … often 1 (constant)
feature 2 = h1(x) … e.g., x
feature 3 = h2(x) … e.g., x^2 or sin(2πx/12)
…
feature D+1 = hD(x) … e.g., x^p
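Here is a small illustrative sketch (my own, with made-up basis functions) of the generic expansion: pass any list of hj functions and stack their outputs into a feature matrix.

```python
import numpy as np

def design_matrix(x, basis):
    """Stack h_j(x) column-by-column: H[i, j] = h_j(x_i)."""
    return np.column_stack([h(x) for h in basis])

# Example basis: constant, linear, and a seasonal term sin(2*pi*x/12)
basis = [
    lambda x: np.ones_like(x),
    lambda x: x,
    lambda x: np.sin(2 * np.pi * x / 12),
]

x = np.linspace(0, 24, 5)
H = design_matrix(x, basis)
print(H.shape)  # (5, 3): one row per observation, one column per feature
```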
Predictions just based on house size

[Figure: price ($) vs. square feet (sq.ft.) with fitted line]

Only 1 bathroom! Not the same as my 3 bathrooms.
Add more inputs

[Figure: 3-D plot of price (y) vs. square feet (x[1]) and # bathrooms (x[2]) with fitted plane]

f(x) = w0 + w1 sq.ft. + w2 #bath
Many possible inputs
- Square feet
- # bathrooms
- # bedrooms
- Lot size
- Year built
- …
General notation

Output: y  (scalar)
Inputs: x = (x[1], x[2], …, x[d])  (d-dim vector)

Notational conventions:
x[j]  = jth input (scalar)
hj(x) = jth feature (scalar)
xi    = input of ith data point (vector)
xi[j] = jth input of ith data point (scalar)
Generic linear regression model

Model: yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
          = Σ_{j=0}^{D} wj hj(xi) + εi

feature 1 = h0(x) … e.g., 1
feature 2 = h1(x) … e.g., x[1] = sq. ft.
feature 3 = h2(x) … e.g., x[2] = #bath, or log(x[7])·x[2] = log(#bed) × #bath
…
feature D+1 = hD(x) … some other function of x[1], …, x[d]
Fitting the linear regression model
Step 1: Rewrite the regression model
Rewrite in matrix notation

For observation i:

yi = Σ_{j=0}^{D} wj hj(xi) + εi
   = h(xi)T w + εi

where w = [w0, w1, …, wD]T and h(xi) = [h0(xi), h1(xi), …, hD(xi)]T.
Rewrite in matrix notation

For all observations together:

y = Hw + ε

where y = [y1, …, yN]T, ε = [ε1, …, εN]T, and H is the N×(D+1) matrix whose ith row is h(xi)T.
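A quick sketch (data simulated, coefficients invented) showing the matrix form in NumPy: with H built column-by-column from the hj, the whole model is one matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5
x = rng.uniform(1000, 3000, size=N)          # invented sq.ft. values

# H: one row per observation; columns h0(x)=1, h1(x)=x, h2(x)=(x/1000)^2
H = np.column_stack([np.ones(N), x, (x / 1000) ** 2])

w_true = np.array([50e3, 100.0, 20e3])        # made-up coefficients
eps = rng.normal(0, 10e3, size=N)             # Gaussian noise

y = H @ w_true + eps                          # y = Hw + ε, all at once
print(y)
```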
Step 2: Compute the cost
"Cost" of using a given line

[Figure: price ($) vs. square feet (sq.ft.) with a line and vertical residuals]

Residual sum of squares (RSS):

RSS(w0, w1) = Σ_{i=1}^{N} (yi - [w0 + w1 xi])^2
RSS for multiple regression

[Figure: 3-D plot of price (y) vs. x[1], x[2] with fitted plane and residuals]

RSS(w) = Σ_{i=1}^{N} (yi - h(xi)T w)^2
       = (y - Hw)T (y - Hw)
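Both forms of RSS should agree; here is a quick self-contained check (my own sketch, with random placeholder arrays rather than house data):

```python
import numpy as np

def rss_loop(w, H, y):
    """RSS as an explicit sum over observations."""
    return sum((y[i] - H[i] @ w) ** 2 for i in range(len(y)))

def rss_matrix(w, H, y):
    """RSS in matrix form: (y - Hw)^T (y - Hw)."""
    r = y - H @ w
    return r @ r

rng = np.random.default_rng(1)
H = rng.normal(size=(6, 3))
y = rng.normal(size=6)
w = rng.normal(size=3)
print(np.isclose(rss_loop(w, H, y), rss_matrix(w, H, y)))  # True
```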
Step 3: Take the gradient
Gradient of RSS

∇RSS(w) = ∇[(y - Hw)T (y - Hw)]
        = -2HT (y - Hw)

Why? By analogy to the 1-D case: d/dw (y - hw)^2 = -2h (y - hw).
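To convince yourself of the -2HT(y - Hw) formula, here is a small numerical check (my own sketch) comparing it against finite differences:

```python
import numpy as np

def rss(w, H, y):
    r = y - H @ w
    return r @ r

def rss_grad(w, H, y):
    """Analytic gradient: -2 H^T (y - Hw)."""
    return -2 * H.T @ (y - H @ w)

rng = np.random.default_rng(2)
H = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

# Central finite-difference gradient, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (rss(w + eps * np.eye(3)[j], H, y) - rss(w - eps * np.eye(3)[j], H, y)) / (2 * eps)
    for j in range(3)
])
print(np.allclose(numeric, rss_grad(w, H, y)))  # True
```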
Step 4, Approach 1: Set the gradient = 0
Closed-form solution

∇RSS(w) = -2HT (y - Hw) = 0

Solve for w:  HT Hw = HT y
Closed-form solution

ŵ = (HT H)^-1 HT y

Invertible if: the columns of H are linearly independent (full column rank; in particular, N ≥ D+1).
Complexity of inverse: O(D^3) to invert the (D+1)×(D+1) matrix HT H, plus O(N D^2) to form it.
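A minimal sketch (mine) of the closed-form fit; in practice one solves the normal equations rather than explicitly inverting HT H, which is cheaper and numerically safer:

```python
import numpy as np

def fit_closed_form(H, y):
    """Solve the normal equations H^T H w = H^T y for w."""
    return np.linalg.solve(H.T @ H, H.T @ y)

rng = np.random.default_rng(3)
H = rng.normal(size=(20, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = H @ w_true + rng.normal(0, 0.01, size=20)

w_hat = fit_closed_form(H, y)
print(np.round(w_hat, 2))  # close to w_true
```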
Step 4, Approach 2: Gradient descent
Gradient descent

while not converged:
  w(t+1) ← w(t) - η ∇RSS(w(t))

where ∇RSS(w) = -2HT (y - Hw).
Interpreting elementwise

Update to jth feature weight:

wj(t+1) ← wj(t) + 2η Σ_{i=1}^{N} hj(xi) (yi - ŷi(w(t)))

[Figure: 3-D plot of price (y) vs. x[1], x[2]]
Summary of gradient descent for multiple regression

init w(1) = 0 (or randomly, or smartly), t = 1
while ||∇RSS(w(t))|| > ε:
  for j = 0, …, D:
    partial[j] = -2 Σ_{i=1}^{N} hj(xi) (yi - ŷi(w(t)))
    wj(t+1) ← wj(t) - η partial[j]
  t ← t + 1
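Here is a runnable version of this pseudocode (my sketch; the step size and tolerance are arbitrary choices), vectorized so the inner j-loop becomes one matrix product:

```python
import numpy as np

def gradient_descent(H, y, eta=1e-3, tol=1e-6, max_iter=100_000):
    """Minimize RSS(w) = ||y - Hw||^2 by gradient descent."""
    w = np.zeros(H.shape[1])                # init w = 0
    for _ in range(max_iter):
        grad = -2 * H.T @ (y - H @ w)       # all partial[j] at once
        if np.linalg.norm(grad) < tol:      # ||grad RSS|| > eps test
            break
        w -= eta * grad
    return w

rng = np.random.default_rng(4)
H = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = H @ w_true + rng.normal(0, 0.01, size=50)
print(np.round(gradient_descent(H, y), 2))  # close to w_true
```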
Why min RSS?
Assuming Gaussian noise

[Figure: price ($) vs. square feet (sq.ft.) with fitted line and Gaussian error distributions around it]

Model for εi:  εi ~ N(0, σ^2)
Implied distribution on yi:  yi | xi ~ N(Σ_{j=0}^{D} wj hj(xi), σ^2)
Maximum likelihood estimate of params

Maximize log-likelihood wrt w:

ln p(D | w, σ) = ln [ (1 / (σ√(2π)))^N Π_{i=1}^{N} exp( -(yi - Σ_j wj hj(xi))^2 / (2σ^2) ) ]
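Expanding the log turns the product into a sum, which makes the connection to RSS explicit; a short derivation (standard, filled in here since the slide leaves the steps as an exercise):

```latex
\ln p(\mathcal{D} \mid w, \sigma)
  = -N \ln\!\left(\sigma\sqrt{2\pi}\right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{N}
      \Bigl(y_i - \sum_{j=0}^{D} w_j h_j(x_i)\Bigr)^{2}
  = \text{const} - \frac{1}{2\sigma^2}\,\mathrm{RSS}(w)
```

So maximizing the likelihood over w is exactly minimizing RSS(w): least squares is the MLE under Gaussian noise.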
Interpreting the fitted function
Interpreting the coefficients – Simple linear regression

[Figure: price ($) vs. square feet (sq.ft.) with fitted line and slope triangle]

ŷ = ŵ0 + ŵ1 x

ŵ1 = predicted change in $ per 1 sq. ft. increase
Interpreting the coefficients – Two linear features

[Figure: 3-D plot of price (y) vs. sq.ft. (x[1]) and # bathrooms (x[2]); slice with x[2] fixed]

ŷ = ŵ0 + ŵ1 x[1] + ŵ2 x[2]
Interpreting the coefficients – Two linear features

[Figure: price ($) vs. # bathrooms (x[2]) for a fixed # sq.ft.]

ŷ = ŵ0 + ŵ1 x[1] + ŵ2 x[2]

ŵ2 = predicted change in $ per 1 additional bathroom, for fixed # sq.ft.!
Interpreting the coefficients – Multiple linear features

ŷ = ŵ0 + ŵ1 x[1] + … + ŵj x[j] + … + ŵd x[d]

[Figure: 3-D plot of price (y) vs. x[1], x[2]]

ŵj = predicted change in ŷ per unit change in x[j], holding all other features fixed.
Interpreting the coefficients – Polynomial regression

ŷ = ŵ0 + ŵ1 x + … + ŵj x^j + … + ŵp x^p

[Figure: price ($) vs. square feet (sq.ft.) with polynomial fit]

Can't hold the other features fixed! (All features are functions of the same input x, so ŵj has no simple per-unit interpretation.)
Recap of concepts
What you can do now…

• Describe polynomial regression
• Write a regression model using multiple inputs or features thereof
• Cast both polynomial regression and regression with multiple inputs as regression with multiple features
• Calculate a goodness-of-fit metric (e.g., RSS)
• Estimate model parameters of a general multiple regression model to minimize RSS:
  - In closed form
  - Using an iterative gradient descent algorithm
• Interpret the coefficients of a non-featurized multiple regression fit
• Exploit the estimated model to form predictions