UVA CS 6316/4501 – Fall 2016
Machine Learning
Lecture 5: Non-Linear Regression Models
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
9/12/16
Where are we? → Five major sections of this course

• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
Today → Regression (supervised)

• Four ways to train / perform optimization for linear regression models
  – Normal Equation
  – Gradient Descent (GD)
  – Stochastic GD
  – Newton's method
• Supervised regression models
  – Linear regression (LR)
  – LR with non-linear basis functions
  – Locally weighted LR
  – LR with Regularizations
Today

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Machine Learning in a Nutshell

• Task
• Representation
• Score Function
• Search/Optimization
• Models, Parameters
ML grew out of work in AI: optimize a performance criterion using example data or past experience, aiming to generalize to unseen data.
(1) Multivariate Linear Regression

Task: Regression
Representation: Y = weighted linear sum of X's
Score Function: Sum of squared error
Search/Optimization: Linear algebra / GD / SGD
Models, Parameters: Regression coefficients

$y = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
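To make the pipeline concrete, here is a minimal numpy sketch of the normal-equation fit (not from the lecture; the data, coefficients, and seed are invented for illustration):

    # Minimal sketch: multivariate LR via the normal equation
    # theta* = (X^T X)^{-1} X^T y  (synthetic data, invented coefficients).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                    # 100 samples, 2 features
    y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.normal(size=100)

    Xb = np.hstack([np.ones((100, 1)), X])           # bias column for theta_0
    theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)     # solves (X^T X) theta = X^T y
    print(theta)                                     # ~ [1.0, 2.0, -3.0]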
Today

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
LR with non-linear basis functions

• LR does not mean we can only deal with linear relationships
• We are free to design (non-linear) features (e.g., basis-function derived) under LR:

  $y = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \theta^T \phi(x)$

  where the $\phi_j(x)$ are fixed basis functions (and we also define $\phi_0(x) = 1$).
• E.g., polynomial regression: $\phi(x) := [1, x, x^2]^T$
e.g. (1) polynomial regression

• Introduce basis functions; with the design matrix $\Phi$ (whose i-th row is $\phi(x_i)^T$), the solution is

  $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$

(Figure from Dr. Nando de Freitas's tutorial slides.)
e.g. (1) polynomial regression (continued)

KEY: if the bases are given, the problem of learning the parameters is still linear.
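A hedged sketch of this KEY point: with the polynomial bases fixed, the fit reuses exactly the linear normal equation (synthetic data; the true coefficients are invented):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-2, 2, size=80)
    y = 0.5 - 1.0 * x + 0.8 * x**2 + 0.1 * rng.normal(size=80)   # quadratic truth

    Phi = np.vander(x, 3, increasing=True)            # rows phi(x_i)^T = [1, x, x^2]
    theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # theta* = (Phi^T Phi)^{-1} Phi^T y
    print(theta)                                      # ~ [0.5, -1.0, 0.8]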
Many Possible Basis Functions

• There are many basis functions, e.g.:
  – Polynomial: $\phi_j(x) = x^{j-1}$
  – Radial basis functions: $\phi_j(x) = \exp\left(-\frac{(x-\mu_j)^2}{2s^2}\right)$
  – Sigmoidal: $\phi_j(x) = \sigma\left(\frac{x-\mu_j}{s}\right)$
  – Splines, Fourier, Wavelets, etc.
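For illustration, the three families above as small Python functions (the centre mu_j and scale s arguments are example parameters, not values from the slides):

    import numpy as np

    def poly_basis(x, j):
        # Polynomial: phi_j(x) = x^(j-1)
        return x ** (j - 1)

    def rbf_basis(x, mu_j, s):
        # Gaussian radial basis: phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))
        return np.exp(-(x - mu_j) ** 2 / (2 * s ** 2))

    def sigmoid_basis(x, mu_j, s):
        # Sigmoidal: phi_j(x) = sigma((x - mu_j) / s) with the logistic sigma
        return 1.0 / (1.0 + np.exp(-(x - mu_j) / s))

    x = np.linspace(-3, 3, 7)
    print(poly_basis(x, 3))            # x^2
    print(rbf_basis(x, 0.0, 1.0))      # peaks at the centre mu_j = 0
    print(sigmoid_basis(x, 0.0, 1.0))  # rises from 0 to 1 around mu_j = 0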
Many Possible Basis Functions (figure)
e.g. (2) LR with radial-basis functions

• E.g.: LR with RBF regression:

  $y = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \phi(x)^T \theta$

  $\phi(x) := [1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3),\; K_{\lambda_4}(x, r_4)]^T$

  $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$

  $K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2\lambda^2}\right)$

• RBF = radial-basis function: a function which depends only on the radial distance from a centre point
• Gaussian RBF → as the distance from the centre r increases, the output of the RBF decreases

(Figures: 1D case and 2D case.)
$K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2\lambda^2}\right)$

e.g., evaluating $K_\lambda$ at $x = r,\; r + \lambda,\; r + 2\lambda,\; r + 3\lambda, \ldots$ shows the output decaying rapidly away from the centre: 1, 0.6065307, 0.1353353, …, 0.0001234098.
e.g. another linear regression with 1D RBF basis functions
(assuming 3 predefined centres and width)

$\phi(x) := [1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3)]^T$

$\theta^* = (\Phi^T \Phi)^{-1} \Phi^T y$
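A hedged end-to-end sketch of this setup (the three centres, the shared width, and the sine-shaped data are illustrative choices, not from the slides):

    import numpy as np

    def gaussian_kernel(x, r, lam):
        # K_lambda(x, r) = exp(-(x - r)^2 / (2 lambda^2))
        return np.exp(-(x - r) ** 2 / (2 * lam ** 2))

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 10, size=60))
    y = np.sin(x) + 0.1 * rng.normal(size=60)

    centres, lam = [2.0, 5.0, 8.0], 1.0              # 3 predefined centres + width
    Phi = np.column_stack([np.ones_like(x)] +
                          [gaussian_kernel(x, r, lam) for r in centres])
    theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)  # still just the normal equation
    y_hat = Phi @ theta                              # fitted values at the training x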
e.g. a LR with 1D RBFs (3 predefined centres and width)

• 1D RBF (figure)
• After fit: (figure)
e.g. 2D Good and Bad RBFs

• A good 2D RBF (figure)
• Two bad 2D RBFs (figure)
Two main issues:

• Learning the parameter $\theta$: almost the same as LR, just change X to $\phi(x)$; the model is a linear combination of basis functions (that can themselves be non-linear)
• How to choose the model order, e.g., what polynomial degree for polynomial regression
Issue: Overfitting and underfitting

$y = \theta_0 + \theta_1 x$ (underfit)   $y = \theta_0 + \theta_1 x + \theta_2 x^2$ (looks good)   $y = \sum_{j=0}^{5} \theta_j x^j$ (overfit)

How to choose? K-fold Cross Validation! (see the sketch below)

Generalisation: learn a function/hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples.

(Figure panels: Underfit / Looks good / Overfit.)
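A hedged sketch of K-fold cross-validation over the polynomial degree (synthetic data; 5 folds and the candidate degrees are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(-2, 2, size=100)
    y = 0.5 - 1.0 * x + 0.8 * x**2 + 0.2 * rng.normal(size=100)  # quadratic truth

    def cv_mse(x, y, degree, k=5):
        # Average held-out squared error of a degree-`degree` polynomial fit.
        fold = np.arange(len(x)) % k                 # simple round-robin folds
        errs = []
        for f in range(k):
            tr, te = fold != f, fold == f
            Phi_tr = np.vander(x[tr], degree + 1, increasing=True)
            Phi_te = np.vander(x[te], degree + 1, increasing=True)
            theta = np.linalg.lstsq(Phi_tr, y[tr], rcond=None)[0]
            errs.append(np.mean((Phi_te @ theta - y[te]) ** 2))
        return np.mean(errs)

    for d in [1, 2, 5]:              # underfit / looks good / overfit
        print(d, cv_mse(x, y, d))    # degree 2 should give the lowest CV error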
(2) Multivariate Linear Regression with Basis Expansion

Task: Regression
Representation: Y = weighted linear sum of (X basis expansion)
Score Function: SSE
Search/Optimization: Linear algebra
Models, Parameters: Regression coefficients

$y = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \phi(x)^T \theta$
Today

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
Locally weighted regression

• aka locally weighted regression, local linear regression, LOESS, …
• linear_func(x) -> y → to represent only the neighbor region of x_0
• Use an RBF function $K_\lambda(x_i, x_0)$ to pick out / emphasize the neighbor region of x_0
Locally weighted linear regression

Instead of minimizing

$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} (x_i^T \theta - y_i)^2$

we now fit $\theta$ to minimize

$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} w_i (x_i^T \theta - y_i)^2$

where

$w_i = K_\lambda(x_i, x_0) = \exp\left(-\frac{(x_i - x_0)^2}{2\lambda^2}\right)$

and $x_0$ is the query point for which we'd like to know its corresponding y.
Locally weighted linear regression: where do the $w_i$'s come from?

• $x_0$ is the query point for which we'd like to know its corresponding y
• $w_i = K_\lambda(x_i, x_0) = \exp\left(-\frac{(x_i - x_0)^2}{2\lambda^2}\right)$
→ Essentially, we put higher weights on (the errors from) training examples that are close to the query point $x_0$ than on those that are further away, and fit $\theta$ to minimize $J(\theta) = \frac{1}{2} \sum_{i=1}^{n} w_i (x_i^T \theta - y_i)^2$ (see the sketch below).
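A hedged illustration of these weights (the training points, query point x_0 = 1.2, and bandwidth lambda = 0.5 are demo values):

    import numpy as np

    x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    x0, lam = 1.2, 0.5                                  # query point and bandwidth

    w = np.exp(-(x_train - x0) ** 2 / (2 * lam ** 2))   # w_i = K_lambda(x_i, x0)
    print(np.round(w, 4))    # points near x0 = 1.2 get weight near 1,
                             # points far from x0 get weight near 0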
Locally weighted linear regression

• The width of the RBF matters! (figure)
Locally weighted linear regression

• → Separate weighted least squares at each target point $x_0$:

$\min_{\theta_0(x_0),\,\theta_1(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0)\,[y_i - \theta_0(x_0) - \theta_1(x_0)\,x_i]^2$

$\hat{f}(x_0) = \hat{\theta}_0(x_0) + \hat{\theta}_1(x_0)\,x_0$
LEARNING of locally weighted linear regression (figure: a separate weighted least squares fit at each target point $x_0$, giving $\hat{f}(x_0) = \hat{\theta}_0(x_0) + \hat{\theta}_1(x_0)\,x_0$)
Locally weighted linear regression (closed form)

• $b(x)^T = (1, x)$; B: the N×2 regression matrix with i-th row $b(x_i)^T$;
• $W(x_0)$: the N×N diagonal weight matrix, $W(x_0) = \mathrm{diag}\big(K_\lambda(x_i, x_0)\big),\; i = 1, \ldots, N$

LWR: $\hat{f}(x_0) = b(x_0)^T \big(B^T W(x_0) B\big)^{-1} B^T W(x_0)\, y$  (e.g., here for only one feature variable)

versus LR: $f(x_q) = x_q^T \theta^* = x_q^T (X^T X)^{-1} X^T y$
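A hedged implementation of this closed form for one feature variable (synthetic data; the bandwidth is an arbitrary demo choice):

    import numpy as np

    def lwr_predict(x_train, y_train, x0, lam):
        # f_hat(x0) = b(x0)^T (B^T W B)^{-1} B^T W y, with b(x) = (1, x)
        B = np.column_stack([np.ones_like(x_train), x_train])       # N x 2
        W = np.diag(np.exp(-(x_train - x0) ** 2 / (2 * lam ** 2)))  # diag of K_lambda
        theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y_train)     # local coefficients
        return np.array([1.0, x0]) @ theta

    rng = np.random.default_rng(4)
    x = np.sort(rng.uniform(0, 10, size=80))
    y = np.sin(x) + 0.1 * rng.normal(size=80)
    print(lwr_predict(x, y, x0=5.0, lam=0.8))        # ~ sin(5.0)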
More → Local Weighted Polynomial Regression

• Local polynomial fits of any degree d:

$\min_{\alpha(x_0),\,\beta_j(x_0),\; j=1,\ldots,d} \;\sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\Big[y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\, x_i^{\,j}\Big]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \sum_{j=1}^{d} \hat{\beta}_j(x_0)\, x_0^{\,j}$

(Figure: blue = true function, green = estimated.)
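Extending the previous sketch to a local fit of degree d only changes the local basis b(x) to (1, x, …, x^d); again a hedged illustration, not the lecture's code:

    import numpy as np

    def local_poly_predict(x_train, y_train, x0, lam, d):
        # Local degree-d fit: rows of B are (1, x_i, ..., x_i^d)
        B = np.vander(x_train, d + 1, increasing=True)
        w = np.exp(-(x_train - x0) ** 2 / (2 * lam ** 2))
        theta = np.linalg.solve(B.T @ (w[:, None] * B), B.T @ (w * y_train))
        b0 = np.vander(np.array([x0]), d + 1, increasing=True)[0]  # (1, x0, ..., x0^d)
        return b0 @ theta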
Parametric vs. non-parametric

• Locally weighted linear regression is a non-parametric algorithm.
• The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm
  – because it has a fixed, finite number of parameters (the $\theta$), which are fit to the data;
  – once we've fit the $\theta$ and stored them away, we no longer need to keep the training data around to make future predictions.
  – In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around.
• The term "non-parametric" (roughly) refers to the fact that the amount of knowledge we need to keep in order to represent the hypothesis grows linearly with the size of the training set.
(3) Locally Weighted / Kernel Linear Regression

Task: Regression
Representation: Y = weighted linear sum of X's
Score Function: Weighted SSE
Search/Optimization: Linear algebra
Models, Parameters: Local regression coefficients (conditioned on each test point)

$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,[y_i - \alpha(x_0) - \beta(x_0)\,x_i]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0$
Today Recap

• Machine Learning Method in a nutshell
• Regression Models Beyond Linear
  – LR with non-linear basis functions
  – Locally weighted linear regression
  – Regression trees and Multilinear Interpolation (later)
Probabilistic Interpretation of Linear Regression (LATER)

• Let us assume that the target variable and the inputs are related by the equation:

  $y_i = \theta^T x_i + \varepsilon_i$

  where $\varepsilon$ is an error term of unmodeled effects or random noise.

• Now assume that $\varepsilon$ follows a Gaussian $N(0, \sigma)$; then we have:

  $p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

• By the i.i.d. (among samples) assumption:

  $L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

Many more variations of LR follow from this perspective, e.g., binomial/Poisson (LATER).
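As a hedged numeric check of this interpretation, maximizing the Gaussian likelihood $L(\theta)$ for fixed $\sigma$ is the same as minimizing the sum of squared errors; the data and sigma below are arbitrary demo choices:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(size=(50, 1))
    y = 2.0 * x[:, 0] + 0.3 * rng.normal(size=50)
    Xb = np.hstack([np.ones((50, 1)), x])

    def neg_log_likelihood(theta, sigma=0.3):
        # -log L(theta) = n*log(sqrt(2*pi)*sigma) + SSE / (2 sigma^2)
        r = y - Xb @ theta
        return 50 * np.log(np.sqrt(2 * np.pi) * sigma) + np.sum(r ** 2) / (2 * sigma ** 2)

    theta_ls = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # least-squares minimizer
    # Perturbing the least-squares solution can only raise -log L:
    print(neg_log_likelihood(theta_ls) < neg_log_likelihood(theta_ls + 0.1))  # True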
References

• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
• Prof. Nando de Freitas's tutorial slides