Date post: 18-Jan-2018
Kernel Methods Arie Nakhmani
Transcript
Page 1: Kernel Methods Arie Nakhmani. Outline Kernel Smoothers Kernel Density Estimators Kernel Density Classifiers.

Kernel Methods

Arie Nakhmani

Page 2:

Outline
Kernel Smoothers
Kernel Density Estimators
Kernel Density Classifiers

Page 3:

Kernel Smoothers – The Goal
Estimating a function f(X): ℝ^p → ℝ by using noisy observations, when the parametric model for this function is unknown
The resulting function should be smooth
The level of "smoothness" should be set by a single parameter

Page 4:

Example
[Plot: N = 100 noisy sample points, X from 0 to 6, Y from −2 to 2]
What does "smooth enough" mean?

Page 5:

Example
[Plot: the N = 100 sample points together with the underlying sine curve, X from 0 to 6, Y from −2 to 2]
Y = sin(X) + ε, X ~ U[0, 2π], ε ~ N(0, 1/4)
N = 100 sample points

Page 6:

Exponential Smoother
[Plot: the sample points and the exponentially smoothed curve, α = 0.25]
Ŷ(i) = (1 − α)·Ŷ(i − 1) + α·Y_sorted(i), Ŷ(1) = Y_sorted(1), 0 < α ≤ 1
Smaller α: smoother line, but more delayed

Page 7:

Exponential Smoother
Simple
Sequential
Single parameter
Single-value memory
Too rough
Delayed
Ŷ(i) = (1 − α)·Ŷ(i − 1) + α·Y_sorted(i)
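The recursion above can be sketched in NumPy (a minimal illustration; the function name and test data are my own, not from the slides):

```python
import numpy as np

def exponential_smoother(y, alpha):
    """Exponential smoothing of y (assumed sorted by x):
    y_hat[i] = (1 - alpha) * y_hat[i-1] + alpha * y[i], seeded with y[0]."""
    y_hat = np.empty(len(y), dtype=float)
    y_hat[0] = y[0]                      # Y_hat(1) = Y_sorted(1)
    for i in range(1, len(y)):
        y_hat[i] = (1 - alpha) * y_hat[i - 1] + alpha * y[i]
    return y_hat
```

Only the previous smoothed value is needed at each step, which is why the slide lists "single-value memory" as a property.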

Page 8:

Moving Average Smoother
For m = 5:
Ŷ(1) = Y_sorted(1)
Ŷ(2) = (Y_sorted(1) + Y_sorted(2) + Y_sorted(3))/3
Ŷ(3) = (Y_sorted(1) + Y_sorted(2) + Y_sorted(3) + Y_sorted(4) + Y_sorted(5))/5
Ŷ(4) = (Y_sorted(2) + Y_sorted(3) + Y_sorted(4) + Y_sorted(5) + Y_sorted(6))/5
...
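The windowing scheme above, with the window shrinking symmetrically at the end-points, can be sketched as (an illustrative helper, not from the slides):

```python
import numpy as np

def moving_average_smoother(y, m):
    """Centered moving average with odd window size m; near the end-points
    the window shrinks symmetrically, as in the m = 5 example above."""
    k = m // 2
    n = len(y)
    y_hat = np.empty(n)
    for i in range(n):
        r = min(i, n - 1 - i, k)          # largest symmetric half-width that fits
        y_hat[i] = np.mean(y[i - r:i + r + 1])
    return y_hat
```

On exactly linear data the centered average reproduces the input, so any remaining wiggle in the output comes from the noise.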

Page 9:

Moving Average Smoother
[Plot: the sample points and the moving-average smoothed curve, m = 11]
Larger m: smoother, but straighter (flattened) line

Page 10:

Moving Average Smoother
Sequential
Single parameter: the window size m
Memory for m values
Irregularly smooth
What if we have a p-dimensional problem with p > 1?

Page 11:

Nearest Neighbors Smoother
[Plot: the sample points, the neighborhood of x0, and the estimate Ŷ(x), m = 160]
Ŷ(x0) = Average(y_i | x_i ∈ Neighborhood_m(x0))
Larger m: smoother, but more biased line

Page 12:

Nearest Neighbors Smoother
Not sequential
Single parameter: the number of neighbors m
Trivially extended to any number of dimensions
Memory for m values
Depends on the metric definition
Not smooth enough
Biased end-points
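A minimal sketch of the nearest-neighbors average (illustrative names and data, not from the slides):

```python
import numpy as np

def knn_smoother(x, y, x0, m):
    """Estimate Y_hat(x0) as the mean of the y-values of the m sample
    points whose x-values are closest to x0."""
    idx = np.argsort(np.abs(x - x0))[:m]   # indices of the m nearest neighbors
    return np.mean(y[idx])
```

Because the neighborhood is defined purely by distance, the same code works in ℝ^p by replacing the absolute difference with a vector norm, which is the "trivially extended" property above.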

Page 13:

Low Pass Filter
[Plot: the sample points and the low-pass filtered curve, X from 0 to 6, Y from −2 to 2]
2nd order Butterworth: H(z) = 0.0078·(z² + 2z + 1)/(z² − 1.73z + 0.77)
Why do we need kernel smoothers?

Page 14:

Low Pass Filter
[Plot: the same filter applied to a log function, X from 0 to 6, Y from −2 to 2]
The same filter… for a log function

Page 15:

Low Pass Filter
Smooth
Simply extended to any number of dimensions
Effectively 3 parameters: type, order, and bandwidth
Biased end-points
Inappropriate for some functions (depends on bandwidth)

Page 16:

Kernel Average Smoother
[Plot: the sample points, the weighting window centered at x0, and the estimate Ŷ(x0)]
Ŷ(x0) = Average(w_i·y_i | x_i), with weights w_i decreasing in the distance between x_i and x0

Page 17:

Kernel Average Smoother
Nadaraya-Watson kernel-weighted average:
Ŷ(x0) = Σ_{i=1}^N K_λ(x0, x_i)·y_i / Σ_{i=1}^N K_λ(x0, x_i)
with the kernel:
K_λ(x0, x) = D(|x − x0| / h_λ(x0))
h_λ(x0) = |x0 − x_[m]| for the Nearest Neighbor Smoother (x_[m] the m-th closest point to x0)
h_λ(x0) = λ for the Locally Weighted Average
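The Nadaraya-Watson average can be sketched directly from the formula (a minimal illustration with a constant width h_λ(x0) = λ and the Epanechnikov kernel; names and data are my own):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel profile D(t)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x, y, x0, lam):
    """Nadaraya-Watson estimate: weighted average of y with kernel
    weights K_lam(x0, x_i) = D((x_i - x0)/lam)."""
    w = epanechnikov((x - x0) / lam)
    return np.sum(w * y) / np.sum(w)
```

On data that is linear and symmetric about x0 the weights cancel the slope, so the estimate equals the true value at x0.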

Page 18:

Popular Kernels
Epanechnikov kernel: D(t) = (3/4)·(1 − t²) if |t| ≤ 1, 0 otherwise
Tri-cube kernel: D(t) = (1 − |t|³)³ if |t| ≤ 1, 0 otherwise
Gaussian kernel: D(t) = φ(t) = (1/√(2π))·exp(−t²/2)
[Plot: the Epanechnikov, tri-cube, and Gaussian kernels on t ∈ [−3, 3]]
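The three kernel profiles translate directly into code (a minimal sketch; function names are my own):

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 * (1 - t^2) on |t| <= 1, zero outside (compact support)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    """D(t) = (1 - |t|^3)^3 on |t| <= 1; flatter top and smoother edges."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def gaussian(t):
    """D(t) = phi(t), the standard normal density; infinite support."""
    t = np.asarray(t, dtype=float)
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
```

Note the different peak heights (0.75, 1, and about 0.4), which is why the curves in the plot do not coincide even at t = 0.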

Page 19:

Non-Symmetric Kernel
Kernel example:
D(t) = α·(1 − α)^t for t ≥ 0, 0 < α < 1; 0 otherwise
Which kernel is that?
Ŷ_i = α·Y_i + α(1 − α)·Y_{i−1} + α(1 − α)²·Y_{i−2} + … + α(1 − α)^{i−1}·Y_1
Σ_{i≥0} α·(1 − α)^i = 1
[Plot: the one-sided, exponentially decaying kernel on t ∈ [−3, 3]]

Page 20:

Kernel Average Smoother
Single parameter: window width λ
Smooth
Trivially extended to any number of dimensions
Memory-based method – little or no training is required
Depends on the metric definition
Biased end-points

Page 21:

Local Linear Regression
Kernel-weighted average minimizes:
min_{α(x0)} Σ_{i=1}^N K_λ(x0, x_i)·[y_i − α(x0)]², with Ŷ(x0) = α(x0)
Local linear regression minimizes:
min_{α(x0), β(x0)} Σ_{i=1}^N K_λ(x0, x_i)·[y_i − α(x0) − β(x0)·x_i]², with Ŷ(x0) = α(x0) + β(x0)·x0

Page 22:

Local Linear Regression
Solution:
Ŷ(x0) = [1, x0]·(Bᵀ·W(x0)·B)⁻¹·Bᵀ·W(x0)·y
where:
Bᵀ = [1 1 … 1; x_1 x_2 … x_N]   (B is the N×2 regression matrix)
W(x0) = diag(K_λ(x0, x_i))   (N×N weight matrix)
y = [y_1, …, y_N]ᵀ
Other representation:
Ŷ(x0) = Σ_{i=1}^N l_i(x0)·y_i   (l_i(x0) is the equivalent kernel)
Σ_{i=1}^N l_i(x0) = 1
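The closed-form solution above is a weighted least-squares fit at each x0; a minimal sketch using the tri-cube kernel (illustrative names and data, not from the slides):

```python
import numpy as np

def tricube(t):
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def local_linear(x, y, x0, lam):
    """Local linear regression estimate:
    Y_hat(x0) = [1, x0] (B^T W(x0) B)^-1 B^T W(x0) y."""
    W = np.diag(tricube((x - x0) / lam))       # kernel weights on the diagonal
    B = np.column_stack([np.ones_like(x), x])  # N x 2 regression matrix
    beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return np.array([1.0, x0]) @ beta
```

On exactly linear data the weighted fit recovers the line regardless of the weights, which is why local linear regression removes the boundary bias that the plain kernel average suffers from.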

Page 23:

Local Linear Regression
[Plot: the sample points, the local fit around x0, and the estimate Ŷ(x0)]

Page 24:

Equivalent Kernels

Page 25:

Local Polynomial Regression
Why stop at local linear fits? Let's minimize:
min_{β_j(x0), j=0,…,d} Σ_{i=1}^N K_λ(x0, x_i)·[y_i − Σ_{j=0}^d β_j(x0)·x_i^j]²
Ŷ(x0) = Σ_{j=0}^d β_j(x0)·x0^j
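The degree-d fit is the same weighted least-squares machinery with a Vandermonde regression matrix; a minimal sketch (illustrative names and data, not from the slides):

```python
import numpy as np

def tricube(t):
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def local_poly(x, y, x0, lam, d=2):
    """Local polynomial regression of degree d around x0."""
    w = tricube((x - x0) / lam)
    B = np.vander(x, d + 1, increasing=True)   # columns 1, x, x^2, ..., x^d
    W = np.diag(w)
    beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return np.vander([x0], d + 1, increasing=True)[0] @ beta
```

With d = 1 this reduces to local linear regression; with d = 0 it is the Nadaraya-Watson average.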

Page 26:

Local Polynomial Regression

Page 27:

Variance Compromise
Var(Ŷ(x0)) = σ²·Σ_i l_i(x0)² = σ²·‖l(x0)‖², for y_i = f(x_i) + ε_i with Var(ε) = σ², E(ε) = 0
λ = 0.2, tri-cube kernel

Page 28:

Conclusions
Local linear fits can help bias dramatically at the boundaries at a modest cost in variance.
Local linear fits are more reliable for extrapolation.
Local quadratic fits do little at the boundaries for bias, but increase the variance a lot.
Local quadratic fits tend to be most helpful in reducing bias due to curvature in the interior of the domain.
λ controls the tradeoff between bias and variance: larger λ gives lower variance but higher bias.

Page 29:

Local Regression in ℝ^p
Radial kernel:
K_λ(x0, x) = D(‖x − x0‖ / h_λ(x0))
β̂(x0) = argmin_{β(x0)} Σ_{i=1}^N K_λ(x0, x_i)·(y_i − b(x_i)ᵀ·β(x0))²
b(X) = (1, X₁, X₂, …, X₁², X₂², X₁X₂, …)   (polynomial terms up to the chosen degree)
Ŷ(x0) = b(x0)ᵀ·β̂(x0)

Page 30:

Popular Kernels
Epanechnikov kernel: D(t) = (3/4)·(1 − t²) if |t| ≤ 1, 0 otherwise
Tri-cube kernel: D(t) = (1 − |t|³)³ if |t| ≤ 1, 0 otherwise
Gaussian kernel: D(t) = φ(t) = (1/√(2π))·exp(−t²/2)

Page 31:

Example
[3-D plot: sample points over the (X, Y) plane with values Z]
Z = sin(X) + ε, X ~ U[0, 2π], ε ~ N(0, 0.2)

Page 32:

Higher Dimensions
The boundary estimation is problematic
Many sample points are needed to reduce the bias
Local regression is less useful for p > 3
It's impossible to maintain localness (low bias) and sizeable samples (low variance) at the same time

Page 33:

Structured Kernels
Non-radial kernel:
K_{λ,A}(x0, x) = D((x − x0)ᵀ·A·(x − x0) / λ), A ⪰ 0
Coordinates or directions can be downgraded or omitted by imposing restrictions on A.
The covariance can be used to adapt the metric A (related to the Mahalanobis distance).
Projection-pursuit model
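A minimal sketch of such a structured kernel (my own function names; the profile D is assumed to be a Gaussian of the squared distance, which is one common choice, not something the slide specifies):

```python
import numpy as np

def structured_kernel(x, x0, A, lam):
    """Non-radial kernel K(x0, x) = D((x - x0)^T A (x - x0) / lam),
    with D(t) = exp(-t/2) applied to the squared A-distance."""
    d = x - x0
    return np.exp(-(d @ A @ d) / (2 * lam))
```

Setting A to the identity recovers a radial (Gaussian) kernel, while zeroing a diagonal entry of A makes the kernel ignore that coordinate entirely, which is the "downgraded or omitted" case above.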

Page 34:

Structured Regression
Divide X ∈ ℝ^p into a set (X₁, X₂, …, X_q) with q < p and collect the remainder of the variables in a vector Z.
Conditionally linear model:
f(X) = α(Z) + β₁(Z)·X₁ + … + β_q(Z)·X_q
For given Z = z0, fit a model by locally weighted least squares:
min_{α(z0), β(z0)} Σ_{i=1}^N K_λ(z0, z_i)·(y_i − α(z0) − x_{1i}·β₁(z0) − … − x_{qi}·β_q(z0))²

Page 35:

Density Estimation
[Plot: the original distribution, the constant-window estimate, and the sample set; X from −10 to 20]
f̂_X(x0) = #{x_i ∈ Neighborhood(x0)} / (N·λ)
Mixture of two normal distributions, N = 600, λ = 0.3

Page 36:

Kernel Density Estimation
Smooth Parzen estimate:
f̂_X(x0) = (1/N)·Σ_{i=1}^N K_λ(x0, x_i)
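The Parzen estimate with a Gaussian kernel can be sketched as (illustrative names and data, not from the slides):

```python
import numpy as np

def parzen_density(x0, samples, lam):
    """Smooth Parzen estimate with a Gaussian kernel of width lam:
    f_hat(x0) = (1/N) * sum_i phi_lam(x0 - x_i)."""
    z = (x0 - samples) / lam
    phi = np.exp(-z**2 / 2) / (np.sqrt(2 * np.pi) * lam)  # scaled normal density
    return np.mean(phi)
```

Each sample point contributes a small Gaussian bump of width λ, and the estimate is their average, which makes the result smooth wherever the kernel is.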

Page 37:

Comparison
[Plot: the data samples with Nearest Neighbors, Epanechnikov, and Gaussian density estimates; X from −10 to 20]
Mixture of two normal distributions
Usually bandwidth selection is more important than kernel function selection

Page 38:

Kernel Density Estimation
Gaussian kernel density estimation:
f̂_X(x) = (1/N)·Σ_{i=1}^N φ_λ(x − x_i) = (F̂ * φ_λ)(x)   (a low-pass-filtered version of the empirical distribution)
where φ_λ denotes the Gaussian density with mean zero and standard deviation λ.
Generalization to ℝ^p:
f̂_X(x0) = (1 / (N·(2λ²π)^{p/2}))·Σ_{i=1}^N exp(−½·(‖x_i − x0‖/λ)²)

Page 39:

Kernel Density Classification
For a J-class problem:
P̂r(G = j | X = x0) = π̂_j·f̂_j(x0) / Σ_{k=1}^J π̂_k·f̂_k(x0), j = 1, …, J
where f̂_j(x) estimates the class-conditional density Pr(X = x | G = j) and π̂_j are the estimated class priors.
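The Bayes-rule combination of per-class Parzen estimates can be sketched as (illustrative names and data; priors are estimated as class proportions, an assumption of this sketch):

```python
import numpy as np

def gaussian_kde(x0, samples, lam):
    """Parzen density estimate with a Gaussian kernel of width lam."""
    z = (x0 - samples) / lam
    return np.mean(np.exp(-z**2 / 2) / (np.sqrt(2 * np.pi) * lam))

def posterior(x0, class_samples, lam):
    """Posterior class probabilities pi_j * f_j(x0), normalized over classes,
    with priors pi_j taken as the class proportions in the training data."""
    n = sum(len(s) for s in class_samples)
    num = np.array([len(s) / n * gaussian_kde(x0, s, lam)
                    for s in class_samples])
    return num / num.sum()
```

At a point equidistant from two symmetric classes of equal size, the posterior is split evenly, as expected.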

Page 40:

Radial Basis Functions
The function f(x) is represented as an expansion in basis functions:
f(x) = Σ_{j=1}^M β_j·h_j(x)
Radial basis function (RBF) expansion:
f(x) = Σ_{j=1}^M K_{λ_j}(ξ_j, x)·β_j = Σ_{j=1}^M D(‖x − ξ_j‖/λ_j)·β_j
where the sum-of-squares is minimized with respect to all the parameters (for the Gaussian kernel):
min_{{λ_j, ξ_j, β_j}_1^M} Σ_{i=1}^N (y_i − Σ_{j=1}^M β_j·exp(−(x_i − ξ_j)ᵀ(x_i − ξ_j)/λ_j²))²

Page 41:

Radial Basis Functions
When assuming a constant λ_j = λ: the problem of "holes"
The solution – renormalized RBF:
h_j(x) = D(‖x − ξ_j‖/λ) / Σ_{k=1}^M D(‖x − ξ_k‖/λ)
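The renormalization can be sketched as (a minimal one-dimensional illustration with Gaussian bases; names and data are my own):

```python
import numpy as np

def renormalized_rbf(x, centers, lam):
    """Renormalized Gaussian basis functions:
    h_j(x) = D(|x - xi_j|/lam) / sum_k D(|x - xi_k|/lam)."""
    t = (x - centers) / lam
    D = np.exp(-t**2 / 2)          # unnormalized Gaussian bumps
    return D / D.sum()             # forces sum_j h_j(x) = 1 at every x
```

Because the basis functions sum to one everywhere, the fitted surface cannot collapse toward zero in the gaps between centers, which is exactly the "holes" problem the renormalization fixes.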

Page 42:

Additional Applications
Local likelihood
Mixture models for density estimation and classification
Mean-shift

Page 43:

Conclusions
Memory-based methods: the model is the entire training data set
Infeasible for many real-time applications
Provides good smoothing results for arbitrarily sampled functions
Appropriate for interpolation and extrapolation
When the model is known, it is better to use another fitting method

