UVA CS 6316/4501 – Fall 2016
Machine Learning
Lecture 5: Non-Linear Regression Models
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
9/12/16
Where are we? → Five major sections of this course

• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
Today → Regression (supervised)

• Four ways to train / perform optimization for linear regression models
  – Normal Equation
  – Gradient Descent (GD)
  – Stochastic GD
  – Newton's method
• Supervised regression models
  – Linear regression (LR)
  – LR with non-linear basis functions
  – Locally weighted LR
  – LR with Regularizations
Today

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Machine Learning in a Nutshell

• Task
• Representation
• Score Function
• Search/Optimization
• Models, Parameters
ML grew out of work in AI: optimize a performance criterion using example data or past experience, aiming to generalize to unseen data.
(1) Multivariate Linear Regression

Task: Regression
Representation: Y = weighted linear sum of X's
Score Function: Sum of squared error
Search/Optimization: Linear algebra / GD / SGD
Models, Parameters: Regression coefficients

$y = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
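To make the pipeline concrete, here is a minimal numpy sketch of the normal-equation fit (not from the lecture; the data, coefficients, and seed are invented for illustration):

    # Minimal sketch: multivariate LR via the normal equation
    # theta* = (X^T X)^{-1} X^T y  (synthetic data, invented coefficients).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                    # 100 samples, 2 features
    y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.normal(size=100)

    Xb = np.hstack([np.ones((100, 1)), X])           # bias column for theta_0
    theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)     # solves (X^T X) theta = X^T y
    print(theta)                                     # ~ [1.0, 2.0, -3.0]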
Today

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
LR with non-linear basis functions

• LR does not mean we can only deal with linear relationships
• We are free to design (non-linear) features (e.g., basis-function derived) under LR:

  $y = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \theta^T \phi(x)$

  where the $\phi_j(x)$ are fixed basis functions (and we also define $\phi_0(x) = 1$).
• E.g., polynomial regression: $\phi(x) := [1, x, x^2]^T$
e.g. (1) polynomial regression

• Introduce basis functions; with the design matrix $\Phi$ (whose i-th row is $\phi(x_i)^T$), the solution is

  $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$

(Figure from Dr. Nando de Freitas's tutorial slides.)
e.g. (1) polynomial regression (continued)

KEY: if the bases are given, the problem of learning the parameters is still linear.
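A hedged sketch of this KEY point: with the polynomial bases fixed, the fit reuses exactly the linear normal equation (synthetic data; the true coefficients are invented):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-2, 2, size=80)
    y = 0.5 - 1.0 * x + 0.8 * x**2 + 0.1 * rng.normal(size=80)   # quadratic truth

    Phi = np.vander(x, 3, increasing=True)            # rows phi(x_i)^T = [1, x, x^2]
    theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # theta* = (Phi^T Phi)^{-1} Phi^T y
    print(theta)                                      # ~ [0.5, -1.0, 0.8]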
Many Possible Basis Functions

• There are many basis functions, e.g.:
  – Polynomial: $\phi_j(x) = x^{j-1}$
  – Radial basis functions: $\phi_j(x) = \exp\left(-\frac{(x-\mu_j)^2}{2s^2}\right)$
  – Sigmoidal: $\phi_j(x) = \sigma\left(\frac{x-\mu_j}{s}\right)$
  – Splines, Fourier, Wavelets, etc.
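For illustration, the three families above as small Python functions (the centre mu_j and scale s arguments are example parameters, not values from the slides):

    import numpy as np

    def poly_basis(x, j):
        # Polynomial: phi_j(x) = x^(j-1)
        return x ** (j - 1)

    def rbf_basis(x, mu_j, s):
        # Gaussian radial basis: phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))
        return np.exp(-(x - mu_j) ** 2 / (2 * s ** 2))

    def sigmoid_basis(x, mu_j, s):
        # Sigmoidal: phi_j(x) = sigma((x - mu_j) / s) with the logistic sigma
        return 1.0 / (1.0 + np.exp(-(x - mu_j) / s))

    x = np.linspace(-3, 3, 7)
    print(poly_basis(x, 3))            # x^2
    print(rbf_basis(x, 0.0, 1.0))      # peaks at the centre mu_j = 0
    print(sigmoid_basis(x, 0.0, 1.0))  # rises from 0 to 1 around mu_j = 0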
Many Possible Basis Functions (figure)
e.g. (2) LR with radial-basis functions

• E.g.: LR with RBF regression:

  $y = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \phi(x)^T \theta$

  $\phi(x) := [1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3),\; K_{\lambda_4}(x, r_4)]^T$

  $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$

  $K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2\lambda^2}\right)$

• RBF = radial-basis function: a function which depends only on the radial distance from a centre point
• Gaussian RBF → as the distance from the centre r increases, the output of the RBF decreases

(Figures: 1D case and 2D case.)
$K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2\lambda^2}\right)$

e.g., evaluating $K_\lambda$ at $x = r,\; r + \lambda,\; r + 2\lambda,\; r + 3\lambda, \ldots$ shows the output decaying rapidly away from the centre: 1, 0.6065307, 0.1353353, …, 0.0001234098.
e.g. another linear regression with 1D RBF basis functions
(assuming 3 predefined centres and width)

$\phi(x) := [1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3)]^T$

$\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$
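A hedged end-to-end sketch of this setup (the three centres, the shared width, and the sine-shaped data are illustrative choices, not from the slides):

    import numpy as np

    def gaussian_kernel(x, r, lam):
        # K_lambda(x, r) = exp(-(x - r)^2 / (2 lambda^2))
        return np.exp(-(x - r) ** 2 / (2 * lam ** 2))

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 10, size=60))
    y = np.sin(x) + 0.1 * rng.normal(size=60)

    centres, lam = [2.0, 5.0, 8.0], 1.0              # 3 predefined centres + width
    Phi = np.column_stack([np.ones_like(x)] +
                          [gaussian_kernel(x, r, lam) for r in centres])
    theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)  # still just the normal equation
    y_hat = Phi @ theta                              # fitted values at the training x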
e.g. a LR with 1D RBFs (3 predefined centres and width)

• 1D RBF (figure)
• After fit: (figure)
e.g. 2D Good and Bad RBFs

• A good 2D RBF (figure)
• Two bad 2D RBFs (figure)
Two main issues:

• Learning the parameter $\theta$: almost the same as LR, just change X to $\phi(x)$; the model is a linear combination of basis functions (that can themselves be non-linear)
• How to choose the model order, e.g., what polynomial degree for polynomial regression
Issue: Overfitting and underfitting

$y = \theta_0 + \theta_1 x$ (underfit)   $y = \theta_0 + \theta_1 x + \theta_2 x^2$ (looks good)   $y = \sum_{j=0}^{5} \theta_j x^j$ (overfit)

How to choose? K-fold Cross Validation! (see the sketch below)

Generalisation: learn a function/hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples.

(Figure panels: Underfit / Looks good / Overfit.)
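A hedged sketch of K-fold cross-validation over the polynomial degree (synthetic data; 5 folds and the candidate degrees are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(-2, 2, size=100)
    y = 0.5 - 1.0 * x + 0.8 * x**2 + 0.2 * rng.normal(size=100)  # quadratic truth

    def cv_mse(x, y, degree, k=5):
        # Average held-out squared error of a degree-`degree` polynomial fit.
        fold = np.arange(len(x)) % k                 # simple round-robin folds
        errs = []
        for f in range(k):
            tr, te = fold != f, fold == f
            Phi_tr = np.vander(x[tr], degree + 1, increasing=True)
            Phi_te = np.vander(x[te], degree + 1, increasing=True)
            theta = np.linalg.lstsq(Phi_tr, y[tr], rcond=None)[0]
            errs.append(np.mean((Phi_te @ theta - y[te]) ** 2))
        return np.mean(errs)

    for d in [1, 2, 5]:              # underfit / looks good / overfit
        print(d, cv_mse(x, y, d))    # degree 2 should give the lowest CV error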
(2) Multivariate Linear Regression with Basis Expansion

Task: Regression
Representation: Y = weighted linear sum of (X basis expansion)
Score Function: SSE
Search/Optimization: Linear algebra
Models, Parameters: Regression coefficients

$y = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \phi(x)^T \theta$
Today

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
Locally weighted regression

• aka locally weighted regression, local linear regression, LOESS, …
• linear_func(x) -> y → to represent only the neighbor region of x_0
• Use an RBF function $K_\lambda(x_i, x_0)$ to pick out / emphasize the neighbor region of x_0
Locally weighted linear regression

Instead of minimizing

$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} (x_i^T \theta - y_i)^2$

we now fit $\theta$ to minimize

$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} w_i (x_i^T \theta - y_i)^2$

where

$w_i = K_\lambda(x_i, x_0) = \exp\left(-\frac{(x_i - x_0)^2}{2\lambda^2}\right)$

and $x_0$ is the query point for which we'd like to know its corresponding y.
Locally weighted linear regression: where do the $w_i$'s come from?

• $x_0$ is the query point for which we'd like to know its corresponding y
• $w_i = K_\lambda(x_i, x_0) = \exp\left(-\frac{(x_i - x_0)^2}{2\lambda^2}\right)$
→ Essentially, we put higher weights on (the errors from) training examples that are close to the query point $x_0$ than on those that are further away, and fit $\theta$ to minimize $J(\theta) = \frac{1}{2} \sum_{i=1}^{n} w_i (x_i^T \theta - y_i)^2$ (see the sketch below).
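A hedged illustration of these weights (the training points, query point x_0 = 1.2, and bandwidth lambda = 0.5 are demo values):

    import numpy as np

    x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    x0, lam = 1.2, 0.5                                  # query point and bandwidth

    w = np.exp(-(x_train - x0) ** 2 / (2 * lam ** 2))   # w_i = K_lambda(x_i, x0)
    print(np.round(w, 4))    # points near x0 = 1.2 get weight near 1,
                             # points far from x0 get weight near 0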
Locally weighted linear regression

• The width of the RBF matters! (figure)
Locally weighted linear regression

• → Separate weighted least squares at each target point $x_0$:

$\min_{\theta_0(x_0),\,\theta_1(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0)\,[y_i - \theta_0(x_0) - \theta_1(x_0)\,x_i]^2$

$\hat{f}(x_0) = \hat{\theta}_0(x_0) + \hat{\theta}_1(x_0)\,x_0$
LEARNING of locally weighted linear regression (figure: a separate weighted least squares fit at each target point $x_0$, giving $\hat{f}(x_0) = \hat{\theta}_0(x_0) + \hat{\theta}_1(x_0)\,x_0$)
Locally weighted linear regression (closed form)

• $b(x)^T = (1, x)$; B: the N×2 regression matrix with i-th row $b(x_i)^T$;
• $W(x_0)$: the N×N diagonal weight matrix, $W(x_0) = \mathrm{diag}\big(K_\lambda(x_i, x_0)\big),\; i = 1, \ldots, N$

LWR: $\hat{f}(x_0) = b(x_0)^T \big(B^T W(x_0) B\big)^{-1} B^T W(x_0)\, y$  (e.g., here for only one feature variable)

versus LR: $f(x_q) = x_q^T \theta^* = x_q^T (X^T X)^{-1} X^T y$
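A hedged implementation of this closed form for one feature variable (synthetic data; the bandwidth is an arbitrary demo choice):

    import numpy as np

    def lwr_predict(x_train, y_train, x0, lam):
        # f_hat(x0) = b(x0)^T (B^T W B)^{-1} B^T W y, with b(x) = (1, x)
        B = np.column_stack([np.ones_like(x_train), x_train])       # N x 2
        W = np.diag(np.exp(-(x_train - x0) ** 2 / (2 * lam ** 2)))  # diag of K_lambda
        theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y_train)     # local coefficients
        return np.array([1.0, x0]) @ theta

    rng = np.random.default_rng(4)
    x = np.sort(rng.uniform(0, 10, size=80))
    y = np.sin(x) + 0.1 * rng.normal(size=80)
    print(lwr_predict(x, y, x0=5.0, lam=0.8))        # ~ sin(5.0)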
More → Local Weighted Polynomial Regression

• Local polynomial fits of any degree d:

$\min_{\alpha(x_0),\,\beta_j(x_0),\; j=1,\ldots,d} \;\sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\Big[y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\, x_i^{\,j}\Big]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \sum_{j=1}^{d} \hat{\beta}_j(x_0)\, x_0^{\,j}$

(Figure: blue = true function, green = estimated.)
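Extending the previous sketch to a local fit of degree d only changes the local basis b(x) to (1, x, …, x^d); again a hedged illustration, not the lecture's code:

    import numpy as np

    def local_poly_predict(x_train, y_train, x0, lam, d):
        # Local degree-d fit: rows of B are (1, x_i, ..., x_i^d)
        B = np.vander(x_train, d + 1, increasing=True)
        w = np.exp(-(x_train - x0) ** 2 / (2 * lam ** 2))
        theta = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * y_train))
        b0 = np.vander(np.array([x0]), d + 1, increasing=True)[0]  # (1, x0, ..., x0^d)
        return b0 @ theta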
Parametric vs. non-parametric

• Locally weighted linear regression is a non-parametric algorithm.
• The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm
  – because it has a fixed, finite number of parameters (the $\theta$), which are fit to the data;
  – once we've fit the $\theta$ and stored them away, we no longer need to keep the training data around to make future predictions.
  – In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around.
• The term "non-parametric" (roughly) refers to the fact that the amount of knowledge we need to keep in order to represent the hypothesis grows linearly with the size of the training set.
(3) Locally Weighted / Kernel Linear Regression

Task: Regression
Representation: Y = weighted linear sum of X's
Score Function: Weighted SSE
Search/Optimization: Linear algebra
Models, Parameters: Local regression coefficients (conditioned on each test point)

$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0) - \beta(x_0)\,x_i]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0$
Today Recap

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
Probabilistic Interpretation of Linear Regression (LATER)

• Let us assume that the target variable and the inputs are related by the equation:

  $y_i = \theta^T x_i + \varepsilon_i$

  where $\varepsilon$ is an error term of unmodeled effects or random noise.

• Now assume that $\varepsilon$ follows a Gaussian $N(0, \sigma)$; then we have:

  $p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

• By the i.i.d. (among samples) assumption:

  $L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

Many more variations of LR follow from this perspective, e.g., binomial/Poisson (LATER).
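As a hedged numeric check of this interpretation, maximizing the Gaussian likelihood $L(\theta)$ for fixed $\sigma$ is the same as minimizing the sum of squared errors; the data and sigma below are arbitrary demo choices:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(size=(50, 1))
    y = 2.0 * x[:, 0] + 0.3 * rng.normal(size=50)
    Xb = np.hstack([np.ones((50, 1)), x])

    def neg_log_likelihood(theta, sigma=0.3):
        # -log L(theta) = n*log(sqrt(2*pi)*sigma) + SSE / (2 sigma^2)
        r = y - Xb @ theta
        return 50 * np.log(np.sqrt(2 * np.pi) * sigma) + np.sum(r ** 2) / (2 * sigma ** 2)

    theta_ls = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # least-squares minimizer
    # Perturbing the least-squares solution can only raise -log L:
    print(neg_log_likelihood(theta_ls) < neg_log_likelihood(theta_ls + 0.1))  # True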
References

• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
• Prof. Nando de Freitas's tutorial slides