Kernel Methods
Dept. Computer Science & Engineering,
Shanghai Jiao Tong University
Outline
• One-Dimensional Kernel Smoothers
• Local Regression
• Local Likelihood
• Kernel Density Estimation
• Naive Bayes
• Radial Basis Functions
• Mixture Models and EM
One-Dimensional Kernel Smoothers
• k-NN: $\hat f(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))$
• The 30-NN curve is bumpy, since $\hat f(x)$ is discontinuous in $x$.
• The average changes in a discrete way as points enter and leave the neighborhood, leading to a discontinuous $\hat f(x)$.
One-Dimensional Kernel Smoothers
• Nadaraya-Watson kernel-weighted average:
  $\hat f(x_0) = \dfrac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}$
• Epanechnikov quadratic kernel (see the sketch below):
  $K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{\lambda}\right), \qquad D(t) = \begin{cases} \tfrac{3}{4}(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$
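The following is a minimal NumPy sketch (not part of the original slides) of the Nadaraya-Watson average with the Epanechnikov kernel; the function names, the toy data, and the bandwidth lam are illustrative choices.

```python
import numpy as np

def epanechnikov(t):
    """Quadratic profile D(t) = 3/4 (1 - t^2) for |t| <= 1, and 0 otherwise."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam=0.2):
    """Kernel-weighted average: sum_i K_lam(x0, x_i) y_i / sum_i K_lam(x0, x_i)."""
    w = epanechnikov((x - x0) / lam)
    if w.sum() == 0:              # no training points fall inside the window
        return np.nan
    return np.sum(w * y) / np.sum(w)

# toy data: noisy sine curve, smoothed on a grid of target points
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=100)
grid = np.linspace(0, 1, 50)
fhat = np.array([nadaraya_watson(x0, x, y) for x0 in grid])
```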
One-Dimensional Kernel Smoothers
• More general kernel:
  $K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{h_\lambda(x_0)}\right)$
  – $h_\lambda(x_0)$: a width function that determines the width of the neighborhood at $x_0$.
  – For the quadratic kernel, $h_\lambda(x_0) = \lambda$ is constant, so the bias is constant.
  – For the k-NN kernel, $h_k(x_0) = |x_0 - x_{[k]}|$, where $x_{[k]}$ is the k-th closest point to $x_0$; here the variance is constant.
  – The Epanechnikov kernel has compact support.
One-Dimensional Kernel Smoothers
• Three popular kernels for local smoothing, each of the form $K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{\lambda}\right)$ (definitions in the sketch below):
• The Epanechnikov kernel and the tri-cube kernel have compact support; the tri-cube kernel additionally has two continuous derivatives at the boundary of its support.
• The Gaussian kernel has infinite support.
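For reference, a small sketch of the three kernels as profile functions D(t) with t = |x - x0| / lam; the Epanechnikov profile is repeated from the earlier sketch so this block stands alone.

```python
import numpy as np

def epanechnikov(t):
    # compact support: 3/4 (1 - t^2) on |t| <= 1
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    # compact support, two continuous derivatives at the boundary of the support
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def gaussian(t):
    # infinite support: standard normal density
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
```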
Local Linear Regression
• Boundary issue:
  – The kernel average is badly biased on the boundaries of the domain, because of the asymmetry of the kernel in that region.
  – Fitting straight lines locally, rather than constants, removes this bias to first order.
Local Linear Regression
• Locally weighted linear regression makes a first-order correction.
• A separate weighted least squares problem is solved at each target point $x_0$:
  $\min_{\alpha(x_0),\,\beta(x_0)}\ \sum_{i=1}^N K_\lambda(x_0, x_i)\,\big[y_i - \alpha(x_0) - \beta(x_0)\, x_i\big]^2$
• The estimate: $\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0$
• With $b(x)^T = (1, x)$, $\mathbf{B}$ the $N \times 2$ regression matrix with $i$-th row $b(x_i)^T$, and $\mathbf{W}(x_0) = \mathrm{diag}\big(K_\lambda(x_0, x_i)\big)$ the $N \times N$ weight matrix, this can be written as (see the sketch below)
  $\hat f(x_0) = b(x_0)^T \big(\mathbf{B}^T \mathbf{W}(x_0)\, \mathbf{B}\big)^{-1} \mathbf{B}^T \mathbf{W}(x_0)\, \mathbf{y} = \sum_{i=1}^N l_i(x_0)\, y_i$
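A minimal sketch of the weighted least squares fit above at a single target point, using Epanechnikov weights; the helper name local_linear and the default bandwidth are my own choices.

```python
import numpy as np

def local_linear(x0, x, y, lam=0.2):
    """Weighted least squares fit of y ~ a + b*x at x0, i.e.
    f_hat(x0) = b(x0)^T (B^T W B)^{-1} B^T W y with Epanechnikov weights.
    Assumes at least two points receive positive weight."""
    t = (x - x0) / lam
    w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)   # kernel weights
    B = np.column_stack([np.ones_like(x), x])               # N x 2 regression matrix
    W = np.diag(w)                                          # N x N weight matrix
    alpha, beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return alpha + beta * x0
```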
Local Linear Regression
• The weights $l_i(x_0)$ combine the weighting kernel $K_\lambda(x_0, \cdot)$ and the least squares operations; they are often called the equivalent kernel.
Local Linear Regression
• Expansion of $E\hat f(x_0)$, using the linearity of local regression and a series expansion of the true function $f$ around $x_0$:
  $E\hat f(x_0) = \sum_{i=1}^N l_i(x_0)\, f(x_i) = f(x_0) \sum_{i=1}^N l_i(x_0) + f'(x_0) \sum_{i=1}^N (x_i - x_0)\, l_i(x_0) + \dfrac{f''(x_0)}{2} \sum_{i=1}^N (x_i - x_0)^2\, l_i(x_0) + R$
  where the remainder $R$ involves third- and higher-order derivatives of $f$.
• For local linear regression, $\sum_{i=1}^N l_i(x_0) = 1$ and $\sum_{i=1}^N (x_i - x_0)\, l_i(x_0) = 0$.
• The bias $E\hat f(x_0) - f(x_0)$ therefore depends only on quadratic and higher-order terms in the expansion of $f$.
Local Polynomial Regression
• Fit local polynomials of any degree d (see the sketch below):
  $\min_{\alpha(x_0),\,\beta_j(x_0),\, j=1,\dots,d}\ \sum_{i=1}^N K_\lambda(x_0, x_i)\,\Big[y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0)\, x_i^j\Big]^2$
• The estimate: $\hat f(x_0) = \hat\alpha(x_0) + \sum_{j=1}^d \hat\beta_j(x_0)\, x_0^j$
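A sketch of the degree-d generalization, assuming the same Epanechnikov weights as before; expanding in powers of (x_i - x0) is an equivalent parameterization in which the intercept is the fitted value at x0.

```python
import numpy as np

def local_poly(x0, x, y, d=2, lam=0.3):
    """Degree-d local polynomial fit at x0 with Epanechnikov weights.
    Fitting in powers of (x_i - x0) makes the intercept the estimate f_hat(x0)."""
    t = (x - x0) / lam
    w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)
    B = np.vander(x - x0, N=d + 1, increasing=True)   # columns: 1, (x-x0), ..., (x-x0)^d
    W = np.diag(w)
    coef = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return coef[0]
```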
Local Polynomial Regression
• The bias now has components only of degree d+1 and higher.
• The reduction in bias comes at the cost of increased variance: $\mathrm{var}\big(\hat f(x_0)\big) = \sigma^2\, \|l(x_0)\|^2$, and $\|l(x_0)\|$ increases with d.
Choosing the Kernel Width
• In the kernel $K_\lambda$, $\lambda$ is a parameter that controls the width of the kernel:
  – For a kernel with compact support, $\lambda$ is the radius of the support region.
  – For the Gaussian kernel, $\lambda$ is the standard deviation.
  – For the k-nearest-neighbor kernel, $\lambda$ is the fraction k/N.
• The window width implies a bias-variance trade-off:
  – A narrow window gives a large variance and a small bias.
  – A wide window gives a small variance and a large bias.
Structured Local Regression
• Structured kernels (see the sketch below)
  – Introduce structure by imposing appropriate restrictions on the matrix $A$:
    $K_{\lambda, A}(x_0, x) = D\!\left(\dfrac{(x - x_0)^T A\, (x - x_0)}{\lambda}\right)$
• Structured regression functions
  – Introduce structure by eliminating some of the higher-order interaction terms:
    $f(X_1, X_2, \dots, X_p) = \alpha + \sum_{j} g_j(X_j) + \sum_{k < l} g_{kl}(X_k, X_l)$
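An illustrative sketch of the structured kernel above; passing the quadratic form directly to an Epanechnikov-type profile is one possible choice of D, and the matrix A here is a toy example that ignores the second coordinate.

```python
import numpy as np

def structured_kernel(x0, x, A, lam=1.0):
    """K(x0, x) = D((x - x0)^T A (x - x0) / lam), here with an Epanechnikov-type
    profile; restricting A (e.g. zeroing entries) downweights whole coordinates."""
    t = (x - x0) @ A @ (x - x0) / lam      # quadratic form in the difference
    return 0.75 * (1 - t**2) if abs(t) <= 1 else 0.0

# toy A that ignores the second coordinate entirely
A = np.diag([1.0, 0.0])
print(structured_kernel(np.array([0.0, 0.0]), np.array([0.3, 5.0]), A))
```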
Local Likelihood & Other Models
• Any parametric model can be made local:
  – Parameter associated with $x_i$: $\theta_i = \theta(x_i) = x_i^T \beta$
  – Log-likelihood: $l(\beta) = \sum_{i=1}^N l(y_i, x_i^T \beta)$
  – Model likelihood local to $x_0$:
    $l(\beta(x_0)) = \sum_{i=1}^N K_\lambda(x_0, x_i)\, l\big(y_i, x_i^T \beta(x_0)\big)$
  – A varying coefficient model $\theta(z)$:
    $l(\theta(z_0)) = \sum_{i=1}^N K_\lambda(z_0, z_i)\, l\big(y_i, \eta(x_i, \theta(z_0))\big), \qquad \text{e.g. } \eta(x, \theta) = x^T \theta$
Local Likelihood & Other Models
• Local Logistic Regression (see the sketch below)
  – The J-class logistic model:
    $\Pr(G = j \mid X = x) = \dfrac{\exp\big(\beta_{j0} + \beta_j^T x\big)}{1 + \sum_{k=1}^{J-1} \exp\big(\beta_{k0} + \beta_k^T x\big)}$
  – The local log-likelihood for this J-class model, with the local regressions centered at $x_0$:
    $\sum_{i=1}^N K_\lambda(x_0, x_i)\, \Big\{ \beta_{g_i 0}(x_0) + \beta_{g_i}(x_0)^T (x_i - x_0) - \log\Big[ 1 + \sum_{k=1}^{J-1} \exp\big( \beta_{k0}(x_0) + \beta_k(x_0)^T (x_i - x_0) \big) \Big] \Big\}$
  – The fitted posterior probabilities:
    $\hat\Pr(G = j \mid X = x_0) = \dfrac{\exp\big(\hat\beta_{j0}(x_0)\big)}{1 + \sum_{k=1}^{J-1} \exp\big(\hat\beta_{k0}(x_0)\big)}$
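A sketch of two-class local logistic regression at a point, assuming scikit-learn is available; the tri-cube weighting, the metric bandwidth, and scikit-learn's default L2 regularization are choices made here, not prescribed by the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_logistic(x0, X, g, lam=1.0):
    """Two-class local logistic regression at x0: a weighted logistic fit on the
    centered features (x_i - x0), so the fitted intercept is beta_0_hat(x0) and the
    local posterior estimate is sigmoid(beta_0_hat(x0))."""
    t = np.linalg.norm(X - x0, axis=1) / lam
    w = np.where(t <= 1, (1 - t**3)**3, 0.0)     # tri-cube kernel weights
    keep = w > 0                                 # drop points outside the window
    clf = LogisticRegression().fit(X[keep] - x0, g[keep], sample_weight=w[keep])
    return 1.0 / (1.0 + np.exp(-clf.intercept_[0]))   # Pr_hat(G = 1 | X = x0)

# toy usage: two noisy classes in the plane
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
g = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
print(local_logistic(np.array([0.5, 0.0]), X, g, lam=2.0))
```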
Kernel Density Estimation
• A natural local estimate:
  $\hat f_X(x_0) = \dfrac{\#\{x_i \in N(x_0)\}}{N \lambda}$
  where $N(x_0)$ is a small metric neighborhood of width $\lambda$ around $x_0$.
• The smooth Parzen estimate:
  $\hat f_X(x_0) = \dfrac{1}{N \lambda} \sum_{i=1}^N K_\lambda(x_0, x_i)$
  – For the Gaussian kernel, $K_\lambda(x_0, x) = \phi\big(|x - x_0| / \lambda\big)$.
  – The estimate becomes (see the sketch below):
    $\hat f_X(x_0) = \dfrac{1}{N\, (2 \lambda^2 \pi)^{p/2}} \sum_{i=1}^N \exp\Big(-\tfrac{1}{2} \big(\|x_i - x_0\| / \lambda\big)^2\Big)$
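A sketch of the Gaussian Parzen estimate above; the sample, the bandwidth, and the function name are illustrative.

```python
import numpy as np

def gaussian_kde(x0, X, lam=0.5):
    """Parzen estimate with a p-dimensional Gaussian kernel:
    f_hat(x0) = sum_i exp(-||x_i - x0||^2 / (2 lam^2)) / (N (2 pi lam^2)^(p/2))."""
    X = np.atleast_2d(X)
    N, p = X.shape
    sq = np.sum((X - x0)**2, axis=1)
    return np.sum(np.exp(-0.5 * sq / lam**2)) / (N * (2 * np.pi * lam**2) ** (p / 2))

# toy usage on one-dimensional, blood-pressure-like data
rng = np.random.default_rng(2)
sample = rng.normal(loc=120.0, scale=15.0, size=(500, 1))
print(gaussian_kde(np.array([120.0]), sample, lam=5.0))
```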
Kernel Density Estimation
• A kernel density estimate for systolic blood pressure. The density estimate at each point is the average contribution from each of the kernels at that point.
Kernel Density Classification
• Bayes' theorem:
  $\hat\Pr(G = j \mid X = x_0) = \dfrac{\hat\pi_j\, \hat f_j(x_0)}{\sum_{k=1}^J \hat\pi_k\, \hat f_k(x_0)}$
  where $\hat f_j$ is a kernel density estimate for class $j$ and $\hat\pi_j$ is the class prior (sample proportion); a sketch follows below.
• The estimate for CHD uses the tri-cube kernel with a k-NN bandwidth.
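A sketch of kernel density classification under these definitions: one Gaussian KDE per class, combined with the sample class proportions as priors (the tri-cube / k-NN variant mentioned above would swap in a different kernel).

```python
import numpy as np

def kde_classify(x0, X, g, lam=0.5):
    """Pr_hat(G = j | X = x0) = pi_hat_j f_hat_j(x0) / sum_k pi_hat_k f_hat_k(x0),
    with one Gaussian kernel density estimate fitted per class."""
    X = np.atleast_2d(X)
    N, p = X.shape
    classes = np.unique(g)
    scores = []
    for j in classes:
        Xj = X[g == j]
        pi_j = len(Xj) / N                                    # class prior estimate
        sq = np.sum((Xj - x0)**2, axis=1)
        f_j = np.sum(np.exp(-0.5 * sq / lam**2)) / (len(Xj) * (2 * np.pi * lam**2) ** (p / 2))
        scores.append(pi_j * f_j)
    scores = np.array(scores)
    return classes, scores / scores.sum()
```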
Kernel Density Classification
• The population class densities and the posterior probabilities
Naïve Bayes
• The naïve Bayes model assumes that, given a class $G = j$, the features $X_k$ are independent:
  $f_j(X) = \prod_{k=1}^p f_{jk}(X_k)$
  – $\hat f_{jk}(X_k)$ is a one-dimensional kernel density estimate (or a Gaussian density estimate) for coordinate $X_k$ in class $j$.
  – If $X_k$ is categorical, use a histogram estimate instead.
• The logit transform then has a generalized additive form (a sketch follows below):
  $\log \dfrac{\Pr(G = l \mid X)}{\Pr(G = J \mid X)} = \log \dfrac{\pi_l\, f_l(X)}{\pi_J\, f_J(X)} = \log \dfrac{\pi_l \prod_{k=1}^p f_{lk}(X_k)}{\pi_J \prod_{k=1}^p f_{Jk}(X_k)} = \log \dfrac{\pi_l}{\pi_J} + \sum_{k=1}^p \log \dfrac{f_{lk}(X_k)}{f_{Jk}(X_k)} = \alpha_l + \sum_{k=1}^p g_{lk}(X_k)$
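A sketch of naive Bayes with univariate Gaussian KDEs per coordinate, following the independence assumption above; the small constant guarding the logarithm is an implementation detail.

```python
import numpy as np

def naive_bayes_kde(x0, X, g, lam=0.5):
    """Naive Bayes posterior at x0: within each class, f_j(x) is approximated by a
    product over coordinates of univariate Gaussian kernel density estimates."""
    N, p = X.shape
    classes = np.unique(g)
    log_post = []
    for j in classes:
        Xj = X[g == j]
        log_fj = np.log(len(Xj) / N)                       # log prior
        for k in range(p):                                 # independence over coordinates
            d = (Xj[:, k] - x0[k]) / lam
            fjk = np.mean(np.exp(-0.5 * d**2)) / (lam * np.sqrt(2 * np.pi))
            log_fj += np.log(fjk + 1e-300)                 # guard against log(0)
        log_post.append(log_fj)
    post = np.exp(np.array(log_post) - np.max(log_post))
    return classes, post / post.sum()
```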
Radial Basis Functions & Kernels
• Radial basis functions treat kernel functions as basis functions, combining the locality of kernel methods with the flexibility of basis expansions:
  $f(x) = \sum_{j=1}^M K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^M D\!\left(\dfrac{\|x - \xi_j\|}{\lambda_j}\right) \beta_j$
  – Each basis element is indexed by a location (prototype) parameter $\xi_j$ and a scale parameter $\lambda_j$.
  – A popular choice for $D$ is the standard Gaussian density function.
Radial Basis Functions & Kernels
• For simplicity, focus on least squares methods for regression, and use the Gaussian kernel.
• The RBF network model is fit by solving
  $\min_{\{\lambda_j,\, \xi_j,\, \beta_j\}_{1}^{M}}\ \sum_{i=1}^N \Big( y_i - \beta_0 - \sum_{j=1}^M \beta_j \exp\Big\{ -\dfrac{(x_i - \xi_j)^T (x_i - \xi_j)}{\lambda_j^2} \Big\} \Big)^2$
• In practice the $\{\lambda_j, \xi_j\}$ are often estimated separately from the $\beta_j$ (see the sketch below).
• An undesirable side effect of this simplification is the creation of holes: regions of $\mathbb{R}^p$ where none of the kernels has appreciable support.
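A sketch of the simplified fitting strategy just described, assuming the prototypes are chosen in an unsupervised way (here simply random training points) and a common width lam, after which the beta coefficients are obtained by ordinary least squares.

```python
import numpy as np

def fit_rbf(X, y, M=10, lam=1.0, seed=None):
    """Fit f(x) = beta_0 + sum_j beta_j exp(-||x - xi_j||^2 / lam^2): the prototypes
    xi_j are chosen unsupervised (random training points here), lam is fixed, and the
    beta are then obtained by ordinary least squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=M, replace=False)]

    def basis(Z):
        sq = np.sum((Z[:, None, :] - centers[None, :, :])**2, axis=2)
        return np.column_stack([np.ones(len(Z)), np.exp(-sq / lam**2)])

    beta, *_ = np.linalg.lstsq(basis(X), y, rcond=None)
    return lambda Z: basis(np.atleast_2d(Z)) @ beta

# toy usage
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
model = fit_rbf(X, y, M=15, lam=1.0, seed=0)
print(model(np.array([[0.0], [1.5]])))
```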
Radial Basis Functions & Kernels
• Gaussian radial basis functions with a fixed width can leave holes; renormalized Gaussian radial basis functions produce basis functions similar in some respects to B-splines.
• Renormalized radial basis functions (sketch below):
  $h_j(x) = \dfrac{D\big(\|x - \xi_j\| / \lambda\big)}{\sum_{k=1}^M D\big(\|x - \xi_k\| / \lambda\big)}$
• The Nadaraya-Watson kernel regression estimator can be viewed as an expansion in renormalized radial basis functions:
  $\hat f(x_0) = \sum_{i=1}^N y_i\, \dfrac{K_\lambda(x_0, x_i)}{\sum_{i'=1}^N K_\lambda(x_0, x_{i'})} = \sum_{i=1}^N y_i\, h_i(x_0)$
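A short sketch of the renormalized basis functions with a Gaussian profile; the choice of profile and the 1/2 factor in the exponent are mine.

```python
import numpy as np

def renormalized_basis(X, centers, lam=1.0):
    """h_j(x) = D(||x - xi_j|| / lam) / sum_k D(||x - xi_k|| / lam): at every x the M
    basis functions sum to one, which avoids holes between the prototypes."""
    sq = np.sum((X[:, None, :] - centers[None, :, :])**2, axis=2)
    D = np.exp(-0.5 * sq / lam**2)             # Gaussian profile
    return D / D.sum(axis=1, keepdims=True)    # rows sum to one
```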
Mixture Models & EM
• Gaussian mixture model:
  $f(x) = \sum_{m=1}^M \alpha_m\, \phi(x;\, \mu_m, \Sigma_m)$
  – The $\alpha_m$ are the mixture proportions, with $\sum_m \alpha_m = 1$.
• EM algorithm for mixtures (two-component case):
  – Given observations $x_1, x_2, \dots, x_N$, the log-likelihood is
    $l(\theta;\, x) = \sum_{i=1}^N \log\big[ (1 - \pi)\, \phi_{\theta_1}(x_i) + \pi\, \phi_{\theta_2}(x_i) \big]$
    which is hard to maximize directly, because of the sum inside the logarithm.
  – Suppose instead we also observe latent binary variables $z_i$, with $z_i = 1$ meaning $x_i \sim \phi_{\theta_2}$ and $z_i = 0$ meaning $x_i \sim \phi_{\theta_1}$. The complete-data log-likelihood is then easy to maximize:
    $l_0(\theta;\, x, z) = \sum_{i=1}^N \big[ (1 - z_i) \log \phi_{\theta_1}(x_i) + z_i \log \phi_{\theta_2}(x_i) \big] + \sum_{i=1}^N \big[ (1 - z_i) \log(1 - \pi) + z_i \log \pi \big]$
Mixture Models & EM
• Given the current parameter estimates $\hat\theta$, compute the responsibilities (E-step):
  $\hat\gamma_i = E\big(z_i \mid \hat\theta, x\big) = \Pr\big(z_i = 1 \mid \hat\theta, x_i\big)$
  and then maximize the expected complete-data log-likelihood $E\big[l_0(\theta;\, x, z) \mid \hat\theta, x\big]$ over $\theta$ (M-step).
• In the two-component Gaussian example:
  $\hat\gamma_i = \dfrac{\hat\pi\, \phi_{\hat\theta_2}(x_i)}{(1 - \hat\pi)\, \phi_{\hat\theta_1}(x_i) + \hat\pi\, \phi_{\hat\theta_2}(x_i)}$
  and the M-step updates are weighted means and variances with weights $\hat\gamma_i$ (or $1 - \hat\gamma_i$), e.g. $\hat\mu_2 = \dfrac{\sum_i \hat\gamma_i\, x_i}{\sum_i \hat\gamma_i}$, together with $\hat\pi = \dfrac{1}{N} \sum_i \hat\gamma_i$; see the sketch below.
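A sketch of the two-component EM iteration described above for univariate data; the initialization and iteration count are arbitrary choices.

```python
import numpy as np

def em_two_gaussians(x, n_iter=50):
    """EM for a two-component univariate Gaussian mixture: the E-step computes the
    responsibilities gamma_i, the M-step re-estimates weighted means, variances and
    the mixing proportion pi."""
    mu1, mu2 = np.percentile(x, [25, 75])      # crude initialization from the data
    s1 = s2 = np.var(x)
    pi = 0.5
    phi = lambda v, mu, s: np.exp(-0.5 * (v - mu)**2 / s) / np.sqrt(2 * np.pi * s)
    for _ in range(n_iter):
        # E-step: gamma_i = Pr(z_i = 1 | theta_hat, x_i)
        g = pi * phi(x, mu2, s2) / ((1 - pi) * phi(x, mu1, s1) + pi * phi(x, mu2, s2))
        # M-step: weighted maximum likelihood updates
        mu1, mu2 = np.sum((1 - g) * x) / np.sum(1 - g), np.sum(g * x) / np.sum(g)
        s1 = np.sum((1 - g) * (x - mu1)**2) / np.sum(1 - g)
        s2 = np.sum(g * (x - mu2)**2) / np.sum(g)
        pi = np.mean(g)
    return mu1, s1, mu2, s2, pi

# toy usage: two overlapping Gaussian clusters
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 1.0, 200)])
print(em_two_gaussians(x))
```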
Mixture Models & EM
• Application of mixtures to the heart disease risk factor study.
Mixture Models & EM
• Mixture model used for classification of the simulated data