Antti Sorjamaa
Input and Variable Selection for Local Models
Outline
• Basic Concepts
• Input selection
• Models
  – k-NN, Lazy Learning
• Variable Selection
  – Leave-one-out, Bootstraps
• Results
Selection – The Word of Today
• Inputs
  – Input selection method (Wrapper or Filter)
• Model
• Parameters
  – Validation method
  – Bounds
• Local or Global
• Data sets
  – Learning, validation, test
Selection Principle

• Notations:

  $\mathbf{x} \in \mathbb{R}^d, \quad y \in \mathbb{R}, \quad \hat{y} = g(\mathbf{x}, \theta)$

  with $\theta$ estimated from the training and validation data

• Generalization Error, to minimize:

  $$\mathrm{E}_{gen}(\theta) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \bigl( y_n - g(\mathbf{x}_n, \theta) \bigr)^2$$
Input Selection

• Exhaustive method
  – "Brute force"
  – All 2^d input combinations explored
• Forward or Backward
  – Only d(d-1) input sets evaluated
  – Local minima problems → suboptimal
• Forward-Backward
  – "More optimal" ← more input sets estimated
  – Time consumption unknown beforehand
Input Selection (2)

• Input selection with the Forward-Backward method
  – Initialization

[Table: example run over 5 possible inputs, marking the selected
inputs after the initialization and after each of the actions 1–5,
as one input is added or removed per action]
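The Forward-Backward loop can be sketched as follows. The function name and the `cost` callable are illustrative placeholders (in the slides' setting, `cost` would be the validation error, e.g. the LOO error of a k-NN model, of the given input set):

```python
def forward_backward_select(inputs, cost):
    """Greedy Forward-Backward input selection.

    inputs: list of candidate input indices, e.g. [0, 1, 2, 3, 4]
    cost:   maps a frozenset of selected inputs to a validation error.
    """
    selected = frozenset()
    best = cost(selected)
    while True:
        # Candidate moves: add one unused input or drop one selected input.
        moves = [selected | {i} for i in inputs if i not in selected]
        moves += [selected - {i} for i in selected]
        err, cand = min(((cost(s), s) for s in moves), key=lambda t: t[0])
        if err >= best:          # no move improves: local minimum reached
            return selected, best
        selected, best = cand, err
```

As the slides note, the number of input sets evaluated, and hence the running time, is not known beforehand: it depends on how many add/remove moves improve the validation error before a local minimum is reached.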
k-NN

• Fast and reliable
• Can be used as a part or as a whole approximator
• Can be used with many different methods
• Only the inputs and k need to be determined beforehand
k-NN

[Figure: 1-NN and 3-NN neighbourhoods around a query point "?",
with training points from Class 1 and Class 2]

For classification: the query is assigned the majority class among
its k nearest neighbours,

  $$\hat{P}(\text{class } c \mid \mathbf{x}) = \frac{\#\{\text{neighbours in class } c\}}{k}$$

For regression: the estimate is the average of the neighbours' outputs,

  $$\hat{y} = \frac{1}{k} \sum_{j=1}^{k} y_j$$
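Both rules fit in a few lines; a minimal sketch (the function name and the `task` flag are illustrative, not from the slides):

```python
import math
from collections import Counter

def knn_predict(X, y, x_query, k, task="regression"):
    """k-NN prediction: average of the k nearest outputs (regression)
    or majority vote among the k nearest labels (classification)."""
    # Indices of training points sorted by distance to the query.
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_query))
    neighbors = order[:k]
    if task == "regression":
        return sum(y[i] for i in neighbors) / k
    # Majority class among the k neighbours.
    return Counter(y[i] for i in neighbors).most_common(1)[0][0]
```

As the previous slide says, only the inputs (the columns of `X`) and `k` have to be chosen beforehand; there is no training phase.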
Lazy Learning

• Local, linear model
• Laziness
  – "Do nothing until query"
  – No learning mandatory
• Compared to k-NN (local, constant model)
  – More time consuming than k-NN
  – Almost as diversified
• Locality can be "globalized" incrementally
Local, linear model

[Figure: 1-D data set with a local linear model fitted around a query point]
Local, linear model

[Figure: zoom on the neighbourhood of the query point, showing the local linear fit]
Formula

Model: $y_i = \mathrm{f}(\mathbf{x}_i) + e_i$

Locally weighted least squares around the query point $\mathbf{x}_q$:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\{ (y_i - \mathbf{x}_i' \beta)^2 \, K\bigl( d(\mathbf{x}_i, \mathbf{x}_q), h \bigr) \right\}$$
Formula

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\{ (y_i - \mathbf{x}_i' \beta)^2 \, K\bigl( d(\mathbf{x}_i, \mathbf{x}_q), h \bigr) \right\}$$

Solved by least squares:

$$\hat{\beta} = (X'X)^{-1} X' y = P X' y, \qquad P = (X'X)^{-1}$$

$$\hat{y}_q = \mathbf{x}_q' \hat{\beta}$$

Simplified version: K → k-NN (the kernel is replaced by a uniform
weight on the k nearest neighbours)
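In the simplified (K → k-NN) version this reduces to an ordinary least-squares fit on the k nearest neighbours of the query. A sketch with numpy; the function name and the added intercept column are my assumptions, not from the slides:

```python
import numpy as np

def lazy_predict(X, y, x_q, k):
    """Simplified Lazy Learning (kernel K replaced by k-NN):
    fit beta_hat = (X'X)^{-1} X'y on the k nearest neighbours of the
    query, then return y_hat_q = x_q' beta_hat."""
    X, y, x_q = np.asarray(X, float), np.asarray(y, float), np.asarray(x_q, float)
    # Indices of the k nearest neighbours of the query point.
    idx = np.argsort(np.linalg.norm(X - x_q, axis=1))[:k]
    # Local design matrix with an intercept column.
    Xl = np.hstack([np.ones((k, 1)), X[idx]])
    beta, *_ = np.linalg.lstsq(Xl, y[idx], rcond=None)
    return float(np.concatenate([[1.0], x_q]) @ beta)
```

Nothing is computed before the query arrives: the neighbour search and the fit both happen at prediction time, which is the "laziness" of the method.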
"Do nothing until query"

[Figure: the local linear model is built only in the neighbourhood
of the query point, once the query arrives]
"Do nothing until query"

A new input needs an output approximation:
• Validate optimal inputs
  – Search nearest neighbors
  – Validate optimal neighborhood size
• Build linear model
• Calculate the needed estimate
Example: y(t) = LL(y(t-1), y(t-2))

[Figure: data in the (1st parameter value, 2nd parameter value)
plane; k local or global!]
Lazy Learning

• Locality can be "globalized" incrementally
  – Global k instead of local k
  – Globally selected inputs instead of local
• More globalization → less laziness
• Best amount of globalization should be determined for each case
  – Intensive validation and testing
  – Different attached methods
Leave-One-Out (LOO)

• The data is split into a learning set of N−1 points and a
  validation set of the single remaining point; a model is built
  on the learning set and its error is measured on the held-out point
• The procedure is repeated N times, once per data point:

$$\mathrm{E}_{gen} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
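The procedure in code form; `predict` is a placeholder for any model trained on the learning set (the test below uses a plain mean predictor, an illustrative choice):

```python
def loo_error(X, y, predict):
    """Leave-one-out: hold out each point in turn, train on the
    remaining N-1 points, and average the squared validation errors."""
    N = len(X)
    total = 0.0
    for i in range(N):
        X_learn = X[:i] + X[i+1:]
        y_learn = y[:i] + y[i+1:]
        y_hat = predict(X_learn, y_learn, X[i])   # a model is built N times
        total += (y[i] - y_hat) ** 2
    return total / N
```

Building N models from scratch is the expensive part, which is what the recursive formula on the next slide avoids.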
Recursive formula for LOO

Adding sample $(\mathbf{x}(k+1),\, y(k+1))$ to a model already fitted
on $k$ samples:

$$P(k+1) = P(k) - \frac{P(k)\,\mathbf{x}(k+1)\,\mathbf{x}(k+1)'\,P(k)}{1 + \mathbf{x}(k+1)'\,P(k)\,\mathbf{x}(k+1)}$$

$$\gamma(k+1) = P(k+1)\,\mathbf{x}(k+1)$$

$$e(k+1) = y(k+1) - \mathbf{x}(k+1)'\,\hat{\beta}(k)$$

$$\hat{\beta}(k+1) = \hat{\beta}(k) + \gamma(k+1)\,e(k+1)$$
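One step of the update as numpy code: a sketch of the standard recursive least-squares form that these equations follow, with variable names of my choosing:

```python
import numpy as np

def recursive_update(P, beta, x, y_new):
    """Add sample (x, y_new) to a linear fit without refitting:
      P(k+1)    = P(k) - P(k) x x' P(k) / (1 + x' P(k) x)
      gamma     = P(k+1) x
      e         = y_new - x' beta(k)
      beta(k+1) = beta(k) + gamma * e"""
    Px = P @ x
    P_new = P - np.outer(Px, Px) / (1.0 + x @ Px)
    gamma = P_new @ x
    e = y_new - x @ beta
    return P_new, beta + gamma * e
```

Each step costs only a few matrix-vector products, so growing the neighbourhood from k to k+1 points (to find the optimal k) does not require solving the least-squares problem again.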
Recursive formula for LOO

[Figure: local linear fit $\hat{\beta}(k)$ built on the first k neighbours]
Recursive formula for LOO

[Figure: the next neighbour $\mathbf{x}(k+1)$ is added and the fit is
updated recursively to $\hat{\beta}(k+1)$]
Recursive formula for LOO

[Figure: LOO error e(k) as a function of the neighbourhood size k;
its minimum gives k_optimal]
Bootstrap Resampling

• World ≠ Sample
• Sample = New World
• Resampling the New World (with replacement) gives a New Sample
Bootstrap Resampling

Definition (with $\mathrm{E}_{A,B}$ the error on set A of the model
trained on set B):

$$\mathrm{optimism}(\theta) = \hat{\mathrm{E}}_{gen,sam}(\theta) - \mathrm{E}_{sam,sam}(\theta)$$

$$\mathrm{E}_{gen,sam}(\theta) = \mathrm{E}_{sam,sam}(\theta) + \mathrm{optimism}(\theta)$$

Estimate from B bootstrap resamples $new_b$:

$$\widehat{\mathrm{optimism}}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \left( \mathrm{E}_{sam,new_b}(\theta) - \mathrm{E}_{new_b,new_b}(\theta) \right)$$
Bootstrap 632

Bootstrap:

$$\widehat{\mathrm{optimism}}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \left( \mathrm{E}_{sam,new_b}(\theta) - \mathrm{E}_{new_b,new_b}(\theta) \right)$$

Bootstrap 632:

$$\mathrm{optimism}_{632}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \mathrm{E}_{\overline{new}_b,new_b}(\theta), \qquad \overline{new}_b = sample - new_b$$

$$\mathrm{E}_{gen} = (1 - 0.632)\,\mathrm{E}_{sam,sam} + 0.632\,\mathrm{optimism}_{632}(\theta)$$

+ 0.632 is derived from the probability of a single data point being
  selected into the bootstrap set ($1 - e^{-1} \approx 0.632$)
+ Unbiased and faster to evaluate
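A sketch of the Bootstrap 632 estimate, evaluating each model on its out-of-bag points $\overline{new}_b$; the `fit` and `error` callables are placeholder interfaces of my choosing, not an API from the slides:

```python
import random

def bootstrap632_error(X, y, fit, error, B=50, seed=0):
    """Bootstrap 632 estimate of the generalization error:
      E_gen ~= (1 - 0.632) * E_train + 0.632 * optimism632,
    where optimism632 averages the error on the out-of-bag points
    (sample - new_b) of models trained on the bootstrap sets new_b.
    fit(X, y) -> model and error(model, X, y) -> MSE are placeholders."""
    rng = random.Random(seed)
    N = len(X)
    e_train = error(fit(X, y), X, y)
    oob_errors = []
    for _ in range(B):
        idx = [rng.randrange(N) for _ in range(N)]  # resample with replacement
        chosen = set(idx)
        oob = [i for i in range(N) if i not in chosen]
        if not oob:          # rare: every point was drawn into the resample
            continue
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        oob_errors.append(error(model,
                                [X[i] for i in oob],
                                [y[i] for i in oob]))
    optimism632 = sum(oob_errors) / len(oob_errors)
    return (1 - 0.632) * e_train + 0.632 * optimism632
```

Compared with the plain bootstrap, each resample needs only one model fit and one error evaluation, which is where the "faster to evaluate" claim comes from.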
The Method

• Input selection with brute force
  – All 2^d input combinations explored
• Using k-NN as the approximator
• k selected with Leave-one-out, Bootstrap and Bootstrap 632
  – Best k selected with each method
• Best input combination selected with each method
Results

  Method          Selected Inputs          k    Ê_gen    Test error
  LOO             t − {1, 2, 3, 5, 7, 8}   15   0.9219   1.1650
  Bootstrap       t − {1, 2, 4, 5, 7, 8}    1   0.6054   1.8458
  Bootstrap 632   t − {1, 2, 3, 5, 7, 8}   16   0.9333   1.1625

• Darwin Sea Pressure Data: 1400 values
  – 1000 values for training and 400 for testing
The Method 2

• Input selection
  – For k-NN, all 2^d input possibilities explored
  – For LL, Backward Selection and continuous
• k selected with Leave-one-out
  – Best k selected with each method combination
• Best input combination selected with each method combination
• Testing k-NN-selected inputs with LL
Results 2

• Santa Fe Data: 10 000 values
  – 1000 values for training and 9000 for testing

  Method      k    LOO error   Learning time (min)   Test MSE   40-step MSE
  LL          56     42.32            2.58           42.0746      1765.6
  LL pruned   59     19.42           13.95           20.6037       148.37
  k-NN         3     57.71           33.78           53.5387      1252.1
  k-NN + LL   15     33.57            0.20           31.4548      1770.1
Conclusions

• Leave-one-out is a fast and good method for selecting inputs
• Bootstraps can select a more optimal number of neighbours for k-NN
• Inputs selected with k-NN are not as good for use with LL as the
  ones selected with LL → k-NN is not a good filter for LL
Questions?

Publications:
• A. Sorjamaa, A. Lendasse, and M. Verleysen, "Pruned Lazy Learning Models for Time Series Prediction," ESANN 2005, pp. 509–514.
• A. Sorjamaa, N. Reyhani, and A. Lendasse, "Input and Structure Selection for k-NN Approximator," IWANN 2005, Lecture Notes in Computer Science, vol. 3512, pp. 985–991.
• C. Atkeson, A. Moore, and S. Schaal, "Locally Weighted Learning," AI Review, 11:11–73, April 1997.