Regression
CS294 Practical Machine Learning
Romain Thibaux, 09/18/06
Outline
• Ordinary Least Squares regression
  – Derivation from minimizing the sum of squares
  – Probabilistic interpretation
  – Online version (LMS)
• Overfitting and Regularization
• Numerical stability
• L1 Regression
• Kernel Regression, Spline Regression
• Multivariate Adaptive Regression Splines (MARS)
Classification (reminder)

X → Y

X: anything
• continuous (ℝ, ℝᵈ, …)
• discrete ({0,1}, {1,…,k}, …)
• structured (tree, string, …)
• …

Y: discrete
– {0,1} binary
– {1,…,k} multi-class
– tree, etc. structured
Perceptron
Logistic Regression
Support Vector Machine
Decision Tree
Random Forest
Kernel trick
Regression

X → Y

X: anything
• continuous (ℝ, ℝᵈ, …)
• discrete ({0,1}, {1,…,k}, …)
• structured (tree, string, …)
• …

Y: continuous
– ℝ, ℝᵈ
Examples
• Voltage → Temperature
• Processes, memory → Power consumption
• Protein structure → Energy [next week]
• Robot arm controls → Torque at effector
• Location, industry, past losses → Premium
Linear regression
[Figure: Temperature measurements, 3D and 2D scatter plots]
[start Matlab demo lecture2.m]
Given examples (xᵢ, yᵢ), i = 1, …, n
Predict yₙ₊₁ given a new point xₙ₊₁

Linear regression

Prediction: ŷ = w₀ + w₁x₁ (a line; with two inputs, ŷ = w₀ + w₁x₁ + w₂x₂, a plane)

[Figure: fitted line (2D) and fitted plane (3D) through the temperature data]
Ordinary Least Squares (OLS)

[Figure: fitted line with the residuals drawn between each observation and its prediction]

Error, or "residual": εᵢ = yᵢ − ŷ(xᵢ)  (observation minus prediction)

Sum squared error: E(w) = Σᵢ (yᵢ − wᵀxᵢ)²

Minimize the sum squared error: setting ∇E(w) = 0 gives a linear
equation in w, i.e. a linear system to solve.
Alternative derivation

Stack the n examples (each xᵢ ∈ ℝᵈ) into a matrix X (n×d) and a
vector y (n×1), and minimize ‖Xw − y‖². The gradient vanishes at
the normal equations XᵀX w = Xᵀy.
Solve the system (it's better not to invert the matrix).
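A minimal NumPy sketch of this solve (the toy data and variable names are mine, not from the lecture):

import numpy as np

# Toy data: y is roughly linear in x, plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Design matrix with a constant column for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X'X w = X'y; solving beats inverting.
w = np.linalg.solve(X.T @ X, X.T @ y)
# More numerically stable alternative: w, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)  # close to [1.0, 2.0]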
LMS Algorithm (Least Mean Squares)

Online algorithm: update w after seeing each example (xᵢ, yᵢ):

w ← w + α (yᵢ − wᵀxᵢ) xᵢ

where α is the step size (learning rate).
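A sketch of the online update on the same kind of toy data (α is hand-picked, not tuned):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=500)
X = np.column_stack([np.ones_like(x), x])

w = np.zeros(2)
alpha = 0.005  # too large diverges, too small converges slowly
for xi, yi in zip(X, y):
    w += alpha * (yi - w @ xi) * xi  # Widrow-Hoff / LMS update

print(w)  # drifts toward [1.0, 2.0]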
Beyond lines and planes

Everything is the same with features φ(x), e.g. φ(x) = (1, x, x²):
the model ŷ = wᵀφ(x) is still linear in w (see the sketch below).

[Figure: polynomial fit to the 1D data]
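The same normal-equations solve with a polynomial basis (the degree and data are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = np.sin(3 * x) + rng.normal(0, 0.1, size=100)

# Polynomial features phi(x) = (1, x, x^2, ..., x^degree).
degree = 3
Phi = np.vander(x, degree + 1, increasing=True)

# Still linear in w, so the solve is unchanged.
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)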
Geometric interpretation

The prediction Xw is the orthogonal projection of y onto the span
of the columns of X: the residual y − Xw is perpendicular to every
column, which is exactly the normal equations Xᵀ(y − Xw) = 0.

[Matlab demo]

[Figure: y and its projection onto the column space of X]
Ordinary Least Squares [summary]

Given examples (xᵢ, yᵢ), i = 1, …, n, with xᵢ ∈ ℝᵈ.

Let X be the n×d matrix with rows xᵢᵀ, and y = (y₁, …, yₙ)ᵀ.
For example, xᵢ may itself be a feature vector φ(zᵢ) = (1, zᵢ, zᵢ², …).

Minimize ‖Xw − y‖² by solving XᵀX w = Xᵀy.

Predict ŷ = wᵀx for a new point x.
Probabilistic interpretation

Model y = wᵀx + ε with Gaussian noise ε ~ N(0, σ²).

[Figure: fitted line with a Gaussian centered on the prediction at each x]

Likelihood: p(y₁, …, yₙ | x₁, …, xₙ, w) = ∏ᵢ N(yᵢ; wᵀxᵢ, σ²).
Maximizing the likelihood minimizes the sum squared error.
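The one-line derivation behind that equivalence (standard algebra, not copied from the slides):

−log p(y | X, w) = (1/2σ²) Σᵢ (yᵢ − wᵀxᵢ)² + (n/2) log(2πσ²)

For fixed σ, maximizing the likelihood is exactly minimizing Σᵢ (yᵢ − wᵀxᵢ)².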
Assumptions vs. Reality

[Figure: empirical distributions of Voltage and Temperature in the Intel sensor network data; neither matches the Gaussian assumption exactly]
Overfitting
[Figure: degree 15 polynomial fit oscillating wildly between the data points]

[Matlab demo]
Ridge Regression (Regularization)

[Figure: effect of regularization on a degree 19 polynomial fit]

Minimize ‖Xw − y‖² + λ‖w‖² with λ "small",
by solving (XᵀX + λI) w = Xᵀy.
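A sketch of the regularized solve (the λ value is illustrative):

import numpy as np

def ridge_fit(X, y, lam=1e-3):
    # Solve (X'X + lam I) w = X'y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)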
Probabilistic interpretation

Likelihood: p(y | X, w) = ∏ᵢ N(yᵢ; wᵀxᵢ, σ²)
Prior: w ~ N(0, τ²I)
Posterior: p(w | X, y) ∝ p(y | X, w) p(w)
The MAP estimate (maximizing the posterior) is ridge regression with λ = σ²/τ².
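Spelled out (standard algebra, not from the slides):

−log p(w | X, y) = (1/2σ²) Σᵢ (yᵢ − wᵀxᵢ)² + (1/2τ²) ‖w‖² + const

which is the ridge objective scaled by 1/σ², with λ = σ²/τ².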
Numerical Accuracy

The condition number of XᵀX (ratio of its largest to smallest
eigenvalue) controls how much error is amplified when solving the
system: a well-conditioned vs. a nearly singular XᵀX.

We want covariates as perpendicular as possible, and of roughly the same scale:
• Regularization
• Preconditioning
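A small synthetic demonstration of how regularization improves conditioning:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
# Two nearly collinear covariates make X'X nearly singular.
X = np.column_stack([x, x + 1e-4 * rng.normal(size=100)])

A = X.T @ X
print(np.linalg.cond(A))                     # huge condition number
print(np.linalg.cond(A + 1e-3 * np.eye(2)))  # far smaller after regularization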
Errors in Variables (Total Least Squares)

OLS assumes the noise is only in y; total least squares also
allows noise in x, measuring the error perpendicular to the
fitted line rather than vertically.

[Figure: perpendicular vs. vertical residuals]
Sensitivity to outliers

High weight is given to outliers: with squared error the
influence of a point grows linearly with its residual, so a
single bad measurement can drag the fit far away.

[Figure: temperature at noon; one outlier pulls the least-squares line]

Influence function: for squared loss, ψ(r) ∝ r (unbounded).
L1 Regression

Minimize Σᵢ |yᵢ − wᵀxᵢ| instead of the sum of squares. This can
be solved as a linear program (see the sketch below), and the
influence function is bounded, ψ(r) ∝ sign(r), so each outlier
pulls with at most constant force.
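A minimal sketch of the linear-program formulation, assuming SciPy is available (the slack variables t bound the absolute residuals):

import numpy as np
from scipy.optimize import linprog

def l1_fit(X, y):
    # Minimize sum_i |y_i - w.x_i| as an LP over (w, t):
    # minimize sum(t)  subject to  -t <= y - Xw <= t.
    n, d = X.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])  # objective: sum of t
    A_ub = np.block([[X, -np.eye(n)],              #  Xw - t <=  y
                     [-X, -np.eye(n)]])            # -Xw - t <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n  # w free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d]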
Kernel Regression

Predict with a locally weighted average of the training targets
(the Nadaraya-Watson estimator):

ŷ(x) = Σᵢ K(x, xᵢ) yᵢ / Σᵢ K(x, xᵢ),  e.g. K(x, xᵢ) = exp(−(x − xᵢ)²/2σ²)

[Figure: kernel regression fit with σ = 1]
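A sketch with a Gaussian kernel (the bandwidth is illustrative):

import numpy as np

def kernel_regression(x_train, y_train, x_query, sigma=1.0):
    # Nadaraya-Watson: kernel-weighted average of the training targets.
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    K = np.exp(-d2 / (2 * sigma ** 2))
    return (K @ y_train) / K.sum(axis=1)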
Spline Regression: regression on each interval

[Figure: independent fits on each interval, discontinuous at the knots]

Spline Regression: with equality constraints

[Figure: pieces constrained to agree at the knots, giving a continuous fit]

Spline Regression: with L1 cost

[Figure: spline fit with L1 cost, robust to outliers]
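One simple way to get the continuity constraint for free is a hinge basis, each piece adding a max(0, x − knot) term; a sketch, not necessarily the lecture's exact construction:

import numpy as np

def linear_spline_fit(x, y, knots):
    # Fit w0 + w1*x + sum_j c_j * max(0, x - knot_j):
    # piecewise linear and continuous by construction.
    hinges = np.maximum(0.0, x[:, None] - np.asarray(knots)[None, :])
    X = np.column_stack([np.ones_like(x), x, hinges])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ w  # fitted values at the training points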
Heteroscedasticity

[Figure: #requests per minute vs. time (days); the noise variance changes over time]
MARS: Multivariate Adaptive Regression Splines
…on the board…
Further topics
• Generalized Linear Models
• Gaussian process regression
• Local Linear regression
• Feature Selection [next class]