Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Date post: 19-Jan-2016
Upload: clarence-hensley
Transcript
Page 1: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Kernel Methods

Dept. Computer Science & Engineering,

Shanghai Jiao Tong University

Page 2: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

23/4/21 Kernel Methods 2

Outline

• One-Dimensional Kernel Smoothers
• Local Regression
• Local Likelihood
• Kernel Density Estimation
• Naive Bayes
• Radial Basis Functions
• Mixture Models and EM

Page 3: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

One-Dimensional Kernel Smoothers

• k-NN average:

$$\hat f(x) = \mathrm{Ave}\left(y_i \mid x_i \in N_k(x)\right)$$

• The 30-NN curve is bumpy, since $\hat f(x)$ is discontinuous in $x$.

• The average changes in a discrete way as points enter and leave the neighborhood, leading to a discontinuous $\hat f(x)$.
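The discontinuity of the k-NN average can be seen on a tiny example. The sketch below is illustrative only: the function name `knn_average` and the toy data are assumptions, not part of the slides.

```python
def knn_average(x0, xs, ys, k):
    """Average the responses of the k training points nearest to x0."""
    # Rank training indices by distance to the target point.
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / k


# Toy data (made up for illustration): y = x^2 on a coarse grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

# Moving x0 across 2.0 swaps the neighbor x=1 for x=3, so the fit jumps.
left = knn_average(1.99, xs, ys, k=2)
right = knn_average(2.01, xs, ys, k=2)
print(left, right)  # 2.5 6.5
```

A tiny move of the target point changes the neighborhood membership, and the estimate jumps from 2.5 to 6.5.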

Page 4: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

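One-Dimensional Kernel Smoothers

• Nadaraya-Watson kernel-weighted average:

$$\hat f(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$$

• Epanechnikov quadratic kernel:

$$K_\lambda(x_0, x) = D\left(\frac{|x - x_0|}{\lambda}\right), \qquad D(t) = \begin{cases} \frac{3}{4}(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$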
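A minimal sketch of the Nadaraya-Watson average with the Epanechnikov kernel, $D(t) = \frac{3}{4}(1 - t^2)$ for $|t| \le 1$ and zero otherwise. The function names and the error handling for an empty window are assumptions made for this illustration.

```python
def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) on |t| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def nadaraya_watson(x0, xs, ys, lam):
    """Kernel-weighted average of ys at the target point x0."""
    weights = [epanechnikov(abs(x - x0) / lam) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # Compact support: no training point falls inside the window.
        raise ValueError("no training point within distance lam of x0")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

Because the weights decay smoothly to zero at the edge of the window, points enter and leave the average gradually and the fitted function is continuous.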

Page 5: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

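One-Dimensional Kernel Smoothers

• More general kernel:

$$K_\lambda(x_0, x) = D\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right)$$

– $h_\lambda(x_0)$: width function that determines the width of the neighborhood at $x_0$.

– For the quadratic kernel, $h_\lambda(x_0) = \lambda$ is constant: the bias stays constant.

– For the k-NN kernel, $h_k(x_0) = |x_0 - x_{[k]}|$, where $x_{[k]}$ is the $k$-th closest $x_i$ to $x_0$ and $\lambda$ is replaced by $k$: the variance stays constant.

– The Epanechnikov kernel has compact support.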

Page 6: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

One-Dimensional Kernel Smoothers

• Three popular kernels for local smoothing, each of the form

$$K_\lambda(x_0, x) = D\left(\frac{|x - x_0|}{\lambda}\right)$$

• The Epanechnikov and tri-cube kernels have compact support, but only the tri-cube has two continuous derivatives at the boundary of its support.

• The Gaussian kernel has infinite support.
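The three kernel shapes can be written down directly. This is a sketch using the standard textbook definitions (tri-cube $D(t) = (1 - |t|^3)^3$ on $|t| \le 1$, Gaussian $D(t) = \phi(t)$); the function names are chosen for illustration.

```python
import math


def epanechnikov(t):
    """Compact support on [-1, 1]; not differentiable at the boundary."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def tricube(t):
    """Compact support on [-1, 1]; flatter on top and smooth at the boundary."""
    return (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0


def gaussian(t):
    """Standard normal density; positive everywhere (infinite support)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
```

Outside $|t| \le 1$ the two compact kernels give exactly zero weight, while the Gaussian still assigns a small positive weight to every training point.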

Page 7: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Local Linear Regression

• Boundary issue

– Locally weighted averages can be badly biased on the boundaries of the domain because of the asymmetry of the kernel in that region.

– Fitting straight lines locally removes this bias to first order.

Page 8: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

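Local Linear Regression

• Locally weighted linear regression makes a first-order correction.

• Solve a separate weighted least squares problem at each target point $x_0$:

$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[y_i - \alpha(x_0) - \beta(x_0)\, x_i\right]^2$$

• The estimate:

$$\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0$$

• $b(x)^T = (1, x)$; $B$: the $N \times 2$ regression matrix with $i$-th row $b(x_i)^T$; $W(x_0) = \mathrm{diag}\left(K_\lambda(x_0, x_i)\right)$, $i = 1, \ldots, N$, an $N \times N$ matrix.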
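The boundary correction can be checked numerically. The sketch below solves the two-parameter weighted least squares problem in closed form and compares it with the plain kernel average at the left boundary. Function names, the Epanechnikov weighting, and the noise-free toy data are assumptions for illustration.

```python
def epanechnikov(t):
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def kernel_average(x0, xs, ys, lam):
    """Nadaraya-Watson estimate, kept here only for comparison."""
    w = [epanechnikov(abs(x - x0) / lam) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def local_linear(x0, xs, ys, lam):
    """Weighted least squares line fitted around x0, evaluated at x0."""
    w = [epanechnikov(abs(x - x0) / lam) for x in xs]
    s_w = sum(w)
    s_x = sum(wi * x for wi, x in zip(w, xs))
    s_y = sum(wi * y for wi, y in zip(w, ys))
    s_xx = sum(wi * x * x for wi, x in zip(w, xs))
    s_xy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Closed-form solution of the 2x2 weighted normal equations.
    beta = (s_w * s_xy - s_x * s_y) / (s_w * s_xx - s_x * s_x)
    alpha = (s_y - beta * s_x) / s_w
    return alpha + beta * x0


# Noise-free line y = 2x + 1 on [0, 1] (made-up data).
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]
print(kernel_average(0.0, xs, ys, 0.35))  # biased upward at the boundary
print(local_linear(0.0, xs, ys, 0.35))    # recovers f(0) = 1 up to rounding
```

At $x_0 = 0$ all neighbors lie to the right, so the kernel average is pulled above the true value, whereas the local line reproduces the linear trend.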

Page 9: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Local Linear Regression

• The weights $l_i(x_0)$ combine the weighting kernel $K_\lambda(x_0, \cdot)$ and the least squares operations; they are referred to as the equivalent kernel.

Page 10: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Local Linear Regression

• The expansion for $E\hat f(x_0)$, using the linearity of local regression and a series expansion of the true function $f$ around $x_0$:

$$E\hat f(x_0) = \sum_{i=1}^{N} l_i(x_0)\, f(x_i) = f(x_0) \sum_{i=1}^{N} l_i(x_0) + f'(x_0) \sum_{i=1}^{N} (x_i - x_0)\, l_i(x_0) + \frac{f''(x_0)}{2} \sum_{i=1}^{N} (x_i - x_0)^2\, l_i(x_0) + R$$

• For local linear regression, $\sum_{i=1}^{N} l_i(x_0) = 1$ and $\sum_{i=1}^{N} (x_i - x_0)\, l_i(x_0) = 0$.

• The bias $E\hat f(x_0) - f(x_0)$ therefore depends only on quadratic and higher-order terms in the expansion of $f$.
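The two moment conditions on the equivalent kernel weights $l_i(x_0)$ can be verified numerically. In this sketch the weights are obtained from the closed-form intercept of the weighted least squares fit in the centered variable $u_i = x_i - x_0$, so that $\hat f(x_0) = \sum_i l_i(x_0)\, y_i$. The function names and the tri-cube weighting are assumptions for illustration.

```python
def tricube(t):
    return (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0


def equivalent_kernel(x0, xs, lam):
    """Weights l_i(x0) such that the local linear fit is sum_i l_i * y_i."""
    u = [x - x0 for x in xs]
    w = [tricube(abs(ui) / lam) for ui in u]
    s0 = sum(w)
    s1 = sum(wi * ui for wi, ui in zip(w, u))
    s2 = sum(wi * ui * ui for wi, ui in zip(w, u))
    det = s0 * s2 - s1 * s1
    # Intercept of the weighted regression of y on u, written as weights on y.
    return [wi * (s2 - s1 * ui) / det for wi, ui in zip(w, u)]


xs = [i / 10 for i in range(11)]
weights = equivalent_kernel(0.0, xs, lam=0.45)
print(sum(weights))                                      # close to 1
print(sum(l * (x - 0.0) for l, x in zip(weights, xs)))   # close to 0
```

At the boundary some of the equivalent weights are negative, which is how the local line compensates for the one-sided neighborhood.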

Page 11: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

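Local Polynomial Regression

• Fit local polynomials of any degree $d$:

$$\min_{\alpha(x_0),\, \beta_j(x_0),\, j = 1, \ldots, d} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\, x_i^j\right]^2$$

$$\hat f(x_0) = \hat\alpha(x_0) + \sum_{j=1}^{d} \hat\beta_j(x_0)\, x_0^j$$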
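A sketch of local polynomial regression of arbitrary degree, solving the weighted normal equations with a small Gaussian elimination routine. The helper names (`solve`, `local_poly`), the tri-cube weighting, and the toy data are assumptions. The code also assumes the window contains more than `degree` distinct points, otherwise the system is singular.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def tricube(t):
    return (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0


def local_poly(x0, xs, ys, lam, degree):
    """Kernel-weighted polynomial fit around x0, evaluated at x0."""
    w = [tricube(abs(x - x0) / lam) for x in xs]
    size = degree + 1
    # Normal equations (B^T W B) coef = B^T W y for the basis 1, x, ..., x^d.
    a = [[sum(wi * x ** (r + c) for wi, x in zip(w, xs)) for c in range(size)]
         for r in range(size)]
    b = [sum(wi * y * x ** r for wi, x, y in zip(w, xs, ys)) for r in range(size)]
    coef = solve(a, b)
    return sum(cj * x0 ** j for j, cj in enumerate(coef))


# Noise-free parabola (made-up data): curvature biases the local linear fit.
xs = [i / 10 for i in range(11)]
ys = [x * x for x in xs]
print(local_poly(0.5, xs, ys, 0.4, degree=1))  # above the true value 0.25
print(local_poly(0.5, xs, ys, 0.4, degree=2))  # close to 0.25
```

On curved data the local linear fit retains a bias driven by the second derivative, and the local quadratic fit removes it.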

Page 12: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Local Polynomial Regression

• The bias has components only of degree $d+1$ and higher.

• The reduction in bias comes at the cost of increased variance:

$$\mathrm{Var}(\hat f(x_0)) = \sigma^2 \lVert l(x_0) \rVert^2, \qquad \lVert l(x_0) \rVert \text{ increases with } d$$

Page 13: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

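Selecting the Width of the Kernel

• In the kernel $K_\lambda$, $\lambda$ is a parameter that controls the kernel width:

– For a kernel with compact support, $\lambda$ is the radius of the support region.
– For the Gaussian kernel, $\lambda$ is the standard deviation.
– For k-nearest neighborhoods, $\lambda$ is the fraction $k/N$.

• The window width leads to a bias-variance tradeoff:

– Narrow window: the variance is large and the bias is small.
– Wide window: the variance is small and the bias is large.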

Page 14: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Structured Local Regression

• Structured kernels:

$$K_{\lambda, A}(x_0, x) = D\left(\frac{(x - x_0)^T A\, (x - x_0)}{\lambda}\right)$$

– Introduce structure by imposing appropriate restrictions on $A$.

• Structured regression functions:

$$f(X_1, X_2, \ldots, X_p) = \alpha + \sum_{j} g_j(X_j) + \sum_{k < l} g_{kl}(X_k, X_l) + \cdots$$

– Introduce structure by eliminating some of the higher-order terms.

Page 15: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

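Local Likelihood & Other Models

• Any parametric model can be made local:

– Parameter associated with $y_i$: $\theta_i = \theta(x_i) = x_i^T \beta$

– Log-likelihood:

$$l(\beta) = \sum_{i=1}^{N} l(y_i, x_i^T \beta)$$

– Model $\theta(X)$ using the likelihood local to $x_0$:

$$l(\beta(x_0)) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, l(y_i, x_i^T \beta(x_0))$$

– A varying coefficient model $\theta(z)$:

$$l(\theta(z_0)) = \sum_{i=1}^{N} K_\lambda(z_0, z_i)\, l(y_i, \eta(x_i, \theta(z_0))), \qquad \text{e.g. } \eta(x, \theta) = x^T \theta$$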

Page 16: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

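Local Likelihood & Other Models

• Logistic regression:

$$\Pr(G = j \mid X = x) = \frac{\exp(\beta_{j0} + \beta_j^T x)}{1 + \sum_{k=1}^{J-1} \exp(\beta_{k0} + \beta_k^T x)}$$

– Local log-likelihood for the $J$-class model:

$$\sum_{i=1}^{N} K_\lambda(x_0, x_i) \left\{ \beta_{g_i 0}(x_0) + \beta_{g_i}(x_0)^T (x_i - x_0) - \log\left[1 + \sum_{k=1}^{J-1} \exp\left(\beta_{k0}(x_0) + \beta_k(x_0)^T (x_i - x_0)\right)\right] \right\}$$

– Center the local regressions at $x_0$, so that the fitted posterior probabilities at $x_0$ are

$$\widehat{\Pr}(G = j \mid X = x_0) = \frac{\exp(\hat\beta_{j0}(x_0))}{1 + \sum_{k=1}^{J-1} \exp(\hat\beta_{k0}(x_0))}$$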

Page 17: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

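Kernel Density Estimation

• A natural local estimate:

$$\hat f_X(x_0) = \frac{\#\{x_i \in N(x_0)\}}{N \lambda}$$

• The smooth Parzen estimate:

$$\hat f_X(x_0) = \frac{1}{N \lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)$$

– For the Gaussian kernel, $K_\lambda(x_0, x) = \phi(|x - x_0| / \lambda)$.

– The estimate becomes

$$\hat f_X(x_0) = \frac{1}{N} \sum_{i=1}^{N} \phi_\lambda(x_0 - x_i) = \frac{1}{N (2 \lambda^2 \pi)^{p/2}} \sum_{i=1}^{N} \exp\left\{-\frac{1}{2} \left(\frac{\lVert x_i - x_0 \rVert}{\lambda}\right)^2\right\}$$

where $\phi_\lambda$ is the Gaussian density with mean zero and standard deviation $\lambda$, and $p$ is the dimension of $x$.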
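A minimal one-dimensional ($p = 1$) sketch of the Gaussian Parzen estimate. The function name `gaussian_kde` and the sample values are assumptions for illustration.

```python
import math


def gaussian_kde(x0, xs, lam):
    """Parzen density estimate at x0 with a Gaussian kernel of std. dev. lam."""
    norm = len(xs) * lam * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - x0) / lam) ** 2) for x in xs) / norm


# Made-up sample; the estimate is an average of Gaussian bumps centered on it.
sample = [0.0, 1.0, 3.0]
grid = [-10.0 + 0.01 * i for i in range(2301)]
area = sum(gaussian_kde(g, sample, lam=0.5) for g in grid) * 0.01
print(area)  # a Riemann sum of the density, close to 1
```

Each observation contributes a bump of total mass $1/N$, so the estimate is itself a proper density.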

Page 18: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Kernel Density Estimation

• A kernel density estimate for systolic blood pressure. The density estimate at each point is the average contribution from each of the kernels at that point.

Page 19: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

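Kernel Density Classification

• Bayes' theorem, with a density estimate $\hat f_j$ and a prior estimate $\hat\pi_j$ for each of the $J$ classes:

$$\widehat{\Pr}(G = j \mid X = x_0) = \frac{\hat\pi_j \hat f_j(x_0)}{\sum_{k=1}^{J} \hat\pi_k \hat f_k(x_0)}$$

• The estimate for CHD uses the tri-cube kernel with k-NN bandwidth.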
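A sketch of classification through Bayes' theorem, plugging a per-class kernel density estimate and the class sample proportion into the posterior. Note the swap: for brevity it uses a fixed-width Gaussian kernel rather than the tri-cube kernel with k-NN bandwidth mentioned for the CHD example. Names and data are made up for illustration.

```python
import math


def gaussian_kde(x0, xs, lam):
    norm = len(xs) * lam * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - x0) / lam) ** 2) for x in xs) / norm


def class_posteriors(x0, samples_by_class, lam):
    """Posterior class probabilities at x0 from per-class density estimates."""
    n_total = sum(len(xs) for xs in samples_by_class.values())
    joint = {}
    for label, xs in samples_by_class.items():
        prior = len(xs) / n_total  # sample proportion used as the class prior
        joint[label] = prior * gaussian_kde(x0, xs, lam)
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}


# Two made-up one-dimensional classes.
samples = {"a": [0.0, 1.0, 2.0], "b": [5.0, 6.0, 7.0, 8.0]}
post = class_posteriors(1.0, samples, lam=1.0)
print(post)
```

The posteriors sum to one by construction, and a point in the middle of class "a" receives almost all of its probability mass from that class.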

Page 20: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.


Kernel Density Classification

• The population class densities and the posterior probabilities

Page 21: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

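Naïve Bayes

• The naïve Bayes model assumes that, given a class $G = j$, the features $X_k$ are independent:

$$f_j(X) = \prod_{k=1}^{p} f_{jk}(X_k)$$

– $\hat f_{jk}(X_k)$ is a kernel density estimate, or a Gaussian, for coordinate $X_k$ in class $j$.

– If $X_k$ is categorical, use a histogram.

• Logit transform:

$$\log \frac{\Pr(G = \ell \mid X)}{\Pr(G = J \mid X)} = \log \frac{\pi_\ell f_\ell(X)}{\pi_J f_J(X)} = \log \frac{\pi_\ell \prod_{k=1}^{p} f_{\ell k}(X_k)}{\pi_J \prod_{k=1}^{p} f_{Jk}(X_k)} = \log \frac{\pi_\ell}{\pi_J} + \sum_{k=1}^{p} \log \frac{f_{\ell k}(X_k)}{f_{Jk}(X_k)} = \alpha_\ell + \sum_{k=1}^{p} g_{\ell k}(X_k)$$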
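A sketch of naïve Bayes using the Gaussian option for each coordinate density. Function names and the toy data are assumptions. The code assumes every feature has nonzero variance within each class.

```python
import math


def fit_gaussian_nb(rows, labels):
    """Per class: the prior and a (mean, variance) pair for every coordinate."""
    model = {}
    for label in set(labels):
        members = [r for r, g in zip(rows, labels) if g == label]
        prior = len(members) / len(rows)
        stats = []
        for k in range(len(rows[0])):
            col = [r[k] for r in members]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def log_joint(model, label, x):
    """log prior + sum over coordinates of log f_jk(x_k) (independence)."""
    prior, stats = model[label]
    total = math.log(prior)
    for value, (mean, var) in zip(x, stats):
        total += -0.5 * math.log(2.0 * math.pi * var)
        total += -0.5 * (value - mean) ** 2 / var
    return total


def predict(model, x):
    return max(model, key=lambda label: log_joint(model, label, x))


# Made-up two-feature data with two well-separated classes.
rows = [(0, 0), (1, 1), (0, 1), (1, 0), (5, 5), (6, 6), (5, 6), (6, 5)]
labels = ["a"] * 4 + ["b"] * 4
model = fit_gaussian_nb(rows, labels)
print(predict(model, (0.5, 0.5)), predict(model, (5.5, 5.5)))
```

Working with log densities turns the product over coordinates into a sum, which mirrors the additive form of the logit.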

Page 22: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

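Radial Basis Function & Kernel

• Radial basis functions combine the locality of kernel methods with the flexibility of basis expansions, by treating the kernel functions as basis functions:

$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^{M} D\left(\frac{\lVert x - \xi_j \rVert}{\lambda_j}\right) \beta_j$$

– Each basis element is indexed by a location or prototype parameter $\xi_j$ and a scale parameter $\lambda_j$.

– For $D$, a popular choice is the standard Gaussian density function.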

Page 23: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

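Radial Basis Function & Kernel

• For simplicity, focus on least squares methods for regression, and use the Gaussian kernel.

• RBF network model:

$$\min_{\{\lambda_j, \xi_j, \beta_j\}_1^M} \sum_{i=1}^{N} \left(y_i - \beta_0 - \sum_{j=1}^{M} \beta_j \exp\left\{-\frac{(x_i - \xi_j)^T (x_i - \xi_j)}{\lambda_j^2}\right\}\right)^2$$

• Estimate the $\{\lambda_j, \xi_j\}$ separately from the $\beta_j$.

• Using a constant width $\lambda_j = \lambda$ can have the undesirable side effect of creating holes: regions of $\mathbb{R}^p$ where none of the kernels has appreciable support.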

Page 24: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Radial Basis Function & Kernel

Gaussian radial basis functions with fixed width can leave holes. Renormalized Gaussian radial basis functions produce basis functions similar in some respects to B-splines.

• Renormalized radial basis functions:

$$h_j(x) = \frac{D(\lVert x - \xi_j \rVert / \lambda)}{\sum_{k=1}^{M} D(\lVert x - \xi_k \rVert / \lambda)}$$

• The expansion in renormalized RBFs:

$$\hat f(x_0) = \sum_{i=1}^{N} y_i\, \frac{K_\lambda(x_0, x_i)}{\sum_{i'=1}^{N} K_\lambda(x_0, x_{i'})} = \sum_{i=1}^{N} y_i\, h_i(x_0)$$
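The effect of renormalization on holes can be sketched in one dimension. The names, centers, and width below are assumptions. The Gaussian bump is left unnormalized because any constant factor cancels in the ratio that defines $h_j(x)$.

```python
import math


def gaussian_bump(x, center, lam):
    return math.exp(-0.5 * ((x - center) / lam) ** 2)


def renormalized_basis(x, centers, lam):
    """h_j(x): each Gaussian bump divided by the sum of all bumps at x."""
    raw = [gaussian_bump(x, c, lam) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


# Widely spaced centers with a narrow fixed width leave a hole near x = 2.
centers = [0.0, 4.0, 8.0]
raw = [gaussian_bump(2.0, c, 0.5) for c in centers]
h = renormalized_basis(2.0, centers, 0.5)
print(max(raw))  # tiny: no raw basis function has appreciable support here
print(h)         # renormalized values sum to 1; the two nearest centers share it
```

After renormalization the basis functions sum to one at every point, so no region is left without support.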

Page 25: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

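Mixture Models & EM

• Gaussian mixture model:

$$f(x) = \sum_{m=1}^{M} \alpha_m\, \phi(x; \mu_m, \Sigma_m)$$

– $\alpha_m$ are the mixture proportions, with $\sum_{m=1}^{M} \alpha_m = 1$.

• EM algorithm for mixtures (two-component example)

– Given $x_1, x_2, \ldots, x_N$, the log-likelihood is awkward to maximize directly (bad):

$$l(x, \theta) = \sum_{i=1}^{N} \log\left[\pi \phi_1(x_i) + (1 - \pi) \phi_2(x_i)\right]$$

– Suppose we observe a latent binary $z$ such that $z_i = 1 \Rightarrow x_i \sim \phi_1$ and $z_i = 0 \Rightarrow x_i \sim \phi_2$. The resulting log-likelihood is easy to maximize (good):

$$L(x, z, \theta) = \sum_{i:\, z_i = 1} \log\left[\pi \phi_1(x_i)\right] + \sum_{i:\, z_i = 0} \log\left[(1 - \pi) \phi_2(x_i)\right]$$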

Page 26: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

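Mixture Models & EM

• Given the current estimate $\hat\theta^{(0)}$, compute

$$\tilde Q(\theta) = E\left(L(x, z, \theta) \mid x, \hat\theta^{(0)}\right)$$

and maximize $\tilde Q(\theta)$ over $\theta$.

• In the example, with $\hat\phi_1$ and $\hat\phi_2$ the component densities at the current estimate:

$$E(z_i \mid x_i, \hat\theta) = \frac{\hat\pi \hat\phi_1(x_i)}{\hat\pi \hat\phi_1(x_i) + (1 - \hat\pi) \hat\phi_2(x_i)} = \hat w_i$$

$$\tilde Q(\theta) = \sum_{i=1}^{N} \left\{ \hat w_i \log\left[\pi \phi_1(x_i)\right] + (1 - \hat w_i) \log\left[(1 - \pi) \phi_2(x_i)\right] \right\}$$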
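A sketch of EM for a two-component one-dimensional Gaussian mixture. The E-step computes the responsibilities $\hat w_i = E(z_i \mid x_i, \hat\theta)$. The M-step uses the weighted means, variances, and mixing proportion, which are the standard maximizers of the expected complete log-likelihood for Gaussian components. Function names, starting values, and data are assumptions for illustration.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, mu1, mu2, var1, var2, pi, n_iter=50):
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each observation.
        w = []
        for x in xs:
            a = pi * normal_pdf(x, mu1, var1)
            b = (1.0 - pi) * normal_pdf(x, mu2, var2)
            w.append(a / (a + b))
        # M-step: responsibility-weighted means, variances and proportion.
        s1 = sum(w)
        s2 = len(xs) - s1
        mu1 = sum(wi * x for wi, x in zip(w, xs)) / s1
        mu2 = sum((1.0 - wi) * x for wi, x in zip(w, xs)) / s2
        var1 = sum(wi * (x - mu1) ** 2 for wi, x in zip(w, xs)) / s1
        var2 = sum((1.0 - wi) * (x - mu2) ** 2 for wi, x in zip(w, xs)) / s2
        pi = s1 / len(xs)
    return mu1, mu2, var1, var2, pi


# Made-up data: two tight clusters around 0 and 5.
xs = [-0.3, -0.1, 0.0, 0.1, 0.3, 4.7, 4.9, 5.0, 5.1, 5.3]
mu1, mu2, var1, var2, pi = em_two_gaussians(xs, 1.0, 4.0, 1.0, 1.0, 0.5)
print(mu1, mu2, pi)
```

This sketch has no safeguards: with poorly separated data or a component collapsing onto a single point, a variance can shrink toward zero, so practical implementations usually add a floor on the variances and a convergence check.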

Page 27: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.


Mixture Models & EM

• Application of mixtures to the heart disease risk factor study.

Page 28: Kernel Methods Dept. Computer Science & Engineering, Shanghai Jiao Tong University.


Mixture Models & EM

• Mixture model used for classification of the simulated data

