Antti Sorjamaa
Input and Variable Selection for Local Models
Outline
• Basic Concepts
• Input selection
• Models
  – k-NN, Lazy Learning
• Variable Selection
  – Leave-one-out, Bootstraps
• Results
Selection – The Word of Today
• Inputs
  – Input selection method (Wrapper or Filter)
• Model
• Parameters
  – Validation method
  – Bounds
• Local or Global
• Data sets
  – Learning, validation, test
Selection Principle

• Notations:

  $\mathbf{x} \in \mathbb{R}^d, \quad y \in \mathbb{R}, \quad \hat{y} = g(\mathbf{x}, \theta)$

  with $\theta$ estimated from the training and validation data

• Generalization Error, to minimize:

  $$\mathrm{E}_{gen}(\theta) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \bigl( y_n - g(\mathbf{x}_n, \theta) \bigr)^2$$
Input Selection

• Exhaustive method
  – "Brute force"
  – All 2^d input combinations explored
• Forward or Backward
  – Only d(d-1) input sets evaluated
  – Local minima problems → suboptimal
• Forward-Backward
  – "More optimal" ← more input sets estimated
  – Time consumption unknown beforehand
Input Selection (2)

• Input selection with the Forward-Backward method
  – Initialization

[Table: example run over 5 possible inputs, marking the selected
inputs after the initialization and after each of the actions 1–5,
as one input is added or removed per action]
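The Forward-Backward loop can be sketched as follows. The function name and the `cost` callable are illustrative placeholders (in the slides' setting, `cost` would be the validation error, e.g. the LOO error of a k-NN model, of the given input set):

```python
def forward_backward_select(inputs, cost):
    """Greedy Forward-Backward input selection.

    inputs: list of candidate input indices, e.g. [0, 1, 2, 3, 4]
    cost:   maps a frozenset of selected inputs to a validation error.
    """
    selected = frozenset()
    best = cost(selected)
    while True:
        # Candidate moves: add one unused input or drop one selected input.
        moves = [selected | {i} for i in inputs if i not in selected]
        moves += [selected - {i} for i in selected]
        err, cand = min(((cost(s), s) for s in moves), key=lambda t: t[0])
        if err >= best:          # no move improves: local minimum reached
            return selected, best
        selected, best = cand, err
```

As the slides note, the number of input sets evaluated, and hence the running time, is not known beforehand: it depends on how many add/remove moves improve the validation error before a local minimum is reached.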
k-NN

• Fast and reliable
• Can be used as a part or as a whole approximator
• Can be used with many different methods
• Only the inputs and k need to be determined beforehand
k-NN

[Figure: 1-NN and 3-NN neighbourhoods around a query point "?",
with training points from Class 1 and Class 2]

For classification: the query is assigned the majority class among
its k nearest neighbours,

  $$\hat{P}(\text{class } c \mid \mathbf{x}) = \frac{\#\{\text{neighbours in class } c\}}{k}$$

For regression: the estimate is the average of the neighbours' outputs,

  $$\hat{y} = \frac{1}{k} \sum_{j=1}^{k} y_j$$
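Both rules fit in a few lines; a minimal sketch (the function name and the `task` flag are illustrative, not from the slides):

```python
import math
from collections import Counter

def knn_predict(X, y, x_query, k, task="regression"):
    """k-NN prediction: average of the k nearest outputs (regression)
    or majority vote among the k nearest labels (classification)."""
    # Indices of training points sorted by distance to the query.
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_query))
    neighbors = order[:k]
    if task == "regression":
        return sum(y[i] for i in neighbors) / k
    # Majority class among the k neighbours.
    return Counter(y[i] for i in neighbors).most_common(1)[0][0]
```

As the previous slide says, only the inputs (the columns of `X`) and `k` have to be chosen beforehand; there is no training phase.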
Lazy Learning

• Local, linear model
• Laziness
  – "Do nothing until query"
  – No learning mandatory
• Compared to k-NN (local, constant model)
  – More time consuming than k-NN
  – Almost as diversified
• Locality can be "globalized" incrementally
Local, linear model

[Figure: 1-D data set with a local linear model fitted around a query point]
Local, linear model

[Figure: zoom on the neighbourhood of the query point, showing the local linear fit]
Formula

Model: $y_i = \mathrm{f}(\mathbf{x}_i) + e_i$

Locally weighted least squares around the query point $\mathbf{x}_q$:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\{ (y_i - \mathbf{x}_i' \beta)^2 \, K\bigl( d(\mathbf{x}_i, \mathbf{x}_q), h \bigr) \right\}$$
Formula

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\{ (y_i - \mathbf{x}_i' \beta)^2 \, K\bigl( d(\mathbf{x}_i, \mathbf{x}_q), h \bigr) \right\}$$

Solved by least squares:

$$\hat{\beta} = (X'X)^{-1} X' y = P X' y, \qquad P = (X'X)^{-1}$$

$$\hat{y}_q = \mathbf{x}_q' \hat{\beta}$$

Simplified version: K → k-NN (the kernel is replaced by a uniform
weight on the k nearest neighbours)
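In the simplified (K → k-NN) version this reduces to an ordinary least-squares fit on the k nearest neighbours of the query. A sketch with numpy; the function name and the added intercept column are my assumptions, not from the slides:

```python
import numpy as np

def lazy_predict(X, y, x_q, k):
    """Simplified Lazy Learning (kernel K replaced by k-NN):
    fit beta_hat = (X'X)^{-1} X'y on the k nearest neighbours of the
    query, then return y_hat_q = x_q' beta_hat."""
    X, y, x_q = np.asarray(X, float), np.asarray(y, float), np.asarray(x_q, float)
    # Indices of the k nearest neighbours of the query point.
    idx = np.argsort(np.linalg.norm(X - x_q, axis=1))[:k]
    # Local design matrix with an intercept column.
    Xl = np.hstack([np.ones((k, 1)), X[idx]])
    beta, *_ = np.linalg.lstsq(Xl, y[idx], rcond=None)
    return float(np.concatenate([[1.0], x_q]) @ beta)
```

Nothing is computed before the query arrives: the neighbour search and the fit both happen at prediction time, which is the "laziness" of the method.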
"Do nothing until query"

[Figure: the local linear model is built only in the neighbourhood
of the query point, once the query arrives]
"Do nothing until query"

A new input needs an output approximation:
• Validate optimal inputs
  – Search nearest neighbors
  – Validate optimal neighborhood size
• Build linear model
• Calculate the needed estimate
Example: y(t) = LL(y(t-1), y(t-2))

[Figure: data in the (1st parameter value, 2nd parameter value)
plane; k local or global!]
Lazy Learning

• Locality can be "globalized" incrementally
  – Global k instead of local k
  – Globally selected inputs instead of local
• More globalization → less laziness
• Best amount of globalization should be determined for each case
  – Intensive validation and testing
  – Different attached methods
Leave-One-Out (LOO)

• The data is split into a learning set of N−1 points and a
  validation set of the single remaining point; a model is built
  on the learning set and its error is measured on the held-out point
• The procedure is repeated N times, once per data point:

$$\mathrm{E}_{gen} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
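The procedure in code form; `predict` is a placeholder for any model trained on the learning set (the test below uses a plain mean predictor, an illustrative choice):

```python
def loo_error(X, y, predict):
    """Leave-one-out: hold out each point in turn, train on the
    remaining N-1 points, and average the squared validation errors."""
    N = len(X)
    total = 0.0
    for i in range(N):
        X_learn = X[:i] + X[i+1:]
        y_learn = y[:i] + y[i+1:]
        y_hat = predict(X_learn, y_learn, X[i])   # a model is built N times
        total += (y[i] - y_hat) ** 2
    return total / N
```

Building N models from scratch is the expensive part, which is what the recursive formula on the next slide avoids.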
Recursive formula for LOO

Adding sample $(\mathbf{x}(k+1),\, y(k+1))$ to a model already fitted
on $k$ samples:

$$P(k+1) = P(k) - \frac{P(k)\,\mathbf{x}(k+1)\,\mathbf{x}(k+1)'\,P(k)}{1 + \mathbf{x}(k+1)'\,P(k)\,\mathbf{x}(k+1)}$$

$$\gamma(k+1) = P(k+1)\,\mathbf{x}(k+1)$$

$$e(k+1) = y(k+1) - \mathbf{x}(k+1)'\,\hat{\beta}(k)$$

$$\hat{\beta}(k+1) = \hat{\beta}(k) + \gamma(k+1)\,e(k+1)$$
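One step of the update as numpy code: a sketch of the standard recursive least-squares form that these equations follow, with variable names of my choosing:

```python
import numpy as np

def recursive_update(P, beta, x, y_new):
    """Add sample (x, y_new) to a linear fit without refitting:
      P(k+1)    = P(k) - P(k) x x' P(k) / (1 + x' P(k) x)
      gamma     = P(k+1) x
      e         = y_new - x' beta(k)
      beta(k+1) = beta(k) + gamma * e"""
    Px = P @ x
    P_new = P - np.outer(Px, Px) / (1.0 + x @ Px)
    gamma = P_new @ x
    e = y_new - x @ beta
    return P_new, beta + gamma * e
```

Each step costs only a few matrix-vector products, so growing the neighbourhood from k to k+1 points (to find the optimal k) does not require solving the least-squares problem again.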
Recursive formula for LOO

[Figure: local linear fit $\hat{\beta}(k)$ built on the first k neighbours]
Recursive formula for LOO

[Figure: the next neighbour $\mathbf{x}(k+1)$ is added and the fit is
updated recursively to $\hat{\beta}(k+1)$]
Recursive formula for LOO

[Figure: LOO error e(k) as a function of the neighbourhood size k;
its minimum gives k_optimal]
Bootstrap Resampling

• World ≠ Sample
• Sample = New World
• Resampling the New World (with replacement) gives a New Sample
Bootstrap Resampling

Definition (with $\mathrm{E}_{A,B}$ the error on set A of the model
trained on set B):

$$\mathrm{optimism}(\theta) = \hat{\mathrm{E}}_{gen,sam}(\theta) - \mathrm{E}_{sam,sam}(\theta)$$

$$\mathrm{E}_{gen,sam}(\theta) = \mathrm{E}_{sam,sam}(\theta) + \mathrm{optimism}(\theta)$$

Estimate from B bootstrap resamples $new_b$:

$$\widehat{\mathrm{optimism}}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \left( \mathrm{E}_{sam,new_b}(\theta) - \mathrm{E}_{new_b,new_b}(\theta) \right)$$
Bootstrap 632

Bootstrap:

$$\widehat{\mathrm{optimism}}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \left( \mathrm{E}_{sam,new_b}(\theta) - \mathrm{E}_{new_b,new_b}(\theta) \right)$$

Bootstrap 632:

$$\mathrm{optimism}_{632}(\theta) = \frac{1}{B} \sum_{b=1}^{B} \mathrm{E}_{\overline{new}_b,new_b}(\theta), \qquad \overline{new}_b = sample - new_b$$

$$\mathrm{E}_{gen} = (1 - 0.632)\,\mathrm{E}_{sam,sam} + 0.632\,\mathrm{optimism}_{632}(\theta)$$

+ 0.632 is derived from the probability of a single data point being
  selected into the bootstrap set ($1 - e^{-1} \approx 0.632$)
+ Unbiased and faster to evaluate
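A sketch of the Bootstrap 632 estimate, evaluating each model on its out-of-bag points $\overline{new}_b$; the `fit` and `error` callables are placeholder interfaces of my choosing, not an API from the slides:

```python
import random

def bootstrap632_error(X, y, fit, error, B=50, seed=0):
    """Bootstrap 632 estimate of the generalization error:
      E_gen ~= (1 - 0.632) * E_train + 0.632 * optimism632,
    where optimism632 averages the error on the out-of-bag points
    (sample - new_b) of models trained on the bootstrap sets new_b.
    fit(X, y) -> model and error(model, X, y) -> MSE are placeholders."""
    rng = random.Random(seed)
    N = len(X)
    e_train = error(fit(X, y), X, y)
    oob_errors = []
    for _ in range(B):
        idx = [rng.randrange(N) for _ in range(N)]  # resample with replacement
        chosen = set(idx)
        oob = [i for i in range(N) if i not in chosen]
        if not oob:          # rare: every point was drawn into the resample
            continue
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        oob_errors.append(error(model,
                                [X[i] for i in oob],
                                [y[i] for i in oob]))
    optimism632 = sum(oob_errors) / len(oob_errors)
    return (1 - 0.632) * e_train + 0.632 * optimism632
```

Compared with the plain bootstrap, each resample needs only one model fit and one error evaluation, which is where the "faster to evaluate" claim comes from.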
The Method

• Input selection with brute force
  – All 2^d input combinations explored
• Using k-NN as the approximator
• k selected with Leave-one-out, Bootstrap and Bootstrap 632
  – Best k selected with each method
• Best input combination selected with each method
Results

  Method          Selected Inputs          k    Ê_gen    Test error
  LOO             t − {1, 2, 3, 5, 7, 8}   15   0.9219   1.1650
  Bootstrap       t − {1, 2, 4, 5, 7, 8}    1   0.6054   1.8458
  Bootstrap 632   t − {1, 2, 3, 5, 7, 8}   16   0.9333   1.1625

• Darwin Sea Pressure Data: 1400 values
  – 1000 values for training and 400 for testing
The Method 2

• Input selection
  – For k-NN, all 2^d input possibilities explored
  – For LL, Backward Selection and continuous
• k selected with Leave-one-out
  – Best k selected with each method combination
• Best input combination selected with each method combination
• Testing k-NN-selected inputs with LL
Results 2

• Santa Fe Data: 10 000 values
  – 1000 values for training and 9000 for testing

  Method      k    LOO error   Learning time (min)   Test MSE   40-step MSE
  LL          56     42.32            2.58           42.0746      1765.6
  LL pruned   59     19.42           13.95           20.6037       148.37
  k-NN         3     57.71           33.78           53.5387      1252.1
  k-NN + LL   15     33.57            0.20           31.4548      1770.1
Conclusions

• Leave-one-out is a fast and good method for selecting inputs
• Bootstraps can select a more optimal number of neighbours for k-NN
• Inputs selected with k-NN are not as good for use with LL as the
  ones selected with LL → k-NN is not a good filter for LL
Questions?

Publications:
• A. Sorjamaa, A. Lendasse, and M. Verleysen, "Pruned Lazy Learning Models for Time Series Prediction," ESANN 2005, pp. 509–514.
• A. Sorjamaa, N. Reyhani, and A. Lendasse, "Input and Structure Selection for k-NN Approximator," IWANN 2005, Lecture Notes in Computer Science, vol. 3512, pp. 985–991.
• C. Atkeson, A. Moore, and S. Schaal, "Locally Weighted Learning," AI Review, 11:11–73, April 1997.