Dynamical Analysis of LVQ type algorithms WSOM 2005
Dynamical analysis of LVQ type learning rules
Michael Biehl, Anarta Ghosh
Rijksuniversiteit Groningen, Mathematics and Computing Science
http://www.cs.rug.nl/~biehl
m.biehl@rug.nl
Barbara Hammer
Clausthal University of Technology, Institute of Computing Science
Learning Vector Quantization (LVQ)
- identification of prototype vectors from labelled example data
- parameterization of distance-based classification schemes
example: basic LVQ scheme [Kohonen], "LVQ 1"
• initialize prototype vectors for different classes
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)
classification: assignment of a vector $\xi$ to the class of the closest prototype $w$
aim: generalization ability, i.e. classification of novel data after learning from examples
often: heuristically motivated variations of competitive learning
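The basic scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the list-based data layout and the toy numbers are assumptions made here, and the step size `eta` is applied directly (without the 1/N scaling used later in the analysis).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner_index(prototypes, xi):
    """Index of the prototype closest to the example xi (the 'winner')."""
    return min(range(len(prototypes)),
               key=lambda k: squared_distance(prototypes[k], xi))


def classify(prototypes, labels, xi):
    """Assign xi to the class of the closest prototype."""
    return labels[winner_index(prototypes, xi)]


def lvq1_step(prototypes, labels, xi, label, eta):
    """One LVQ1 step: move the winner towards xi if its class matches
    the label of the example, away from xi otherwise (in place)."""
    k = winner_index(prototypes, xi)
    sign = 1.0 if labels[k] == label else -1.0
    prototypes[k] = [w + sign * eta * (x - w)
                     for w, x in zip(prototypes[k], xi)]
    return k


# Hypothetical 2-d toy data with one prototype per class.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
labels = ["A", "B"]
lvq1_step(prototypes, labels, xi=[0.2, 0.0], label="A", eta=0.5)
predicted = classify(prototypes, labels, [0.9, 0.8])
```

In the toy call the winner is the first prototype, which carries the correct label and is therefore pulled towards the example.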
LVQ algorithms
- frequently applied in a variety of practical problems
- plausible, intuitive, flexible
- fast, easy to implement
- often based on heuristic arguments, or on cost functions with unclear relation to generalization
- limited theoretical understanding of
  - dynamics and convergence properties
  - achievable generalization ability
here: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation
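Model situation: two clusters of N-dimensional data
random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$P(\xi) = \sum_{\sigma=\pm 1} p_\sigma \, P(\xi|\sigma)$, with
$P(\xi|\sigma) = \frac{1}{(2\pi v_\sigma)^{N/2}} \exp\left[ -\frac{1}{2 v_\sigma} (\xi - \ell B_\sigma)^2 \right]$
orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$
prior weights of classes: $p_+, p_-$ with $p_+ + p_- = 1$
separation $\propto \ell$
independent components $\xi_j$ with mean $\langle \xi_j \rangle_\sigma = \ell (B_\sigma)_j$ and variance $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = v_\sigma$
[Figure: two clusters in $\mathbb{R}^N$, centered at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$)]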
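Drawing examples from this model density is straightforward. The sketch below uses only the Python standard library; the choice of the first two unit vectors as centers, the function name and the parameter values are illustrative assumptions, not taken from the slides.

```python
import random


def sample_example(B_plus, B_minus, p_plus, v_plus, v_minus, ell, rng):
    """Draw (xi, sigma): sigma = +1 with probability p_plus, else -1;
    xi has independent Gaussian components with mean ell * B_sigma
    and variance v_sigma."""
    sigma = 1 if rng.random() < p_plus else -1
    center = B_plus if sigma == 1 else B_minus
    std = (v_plus if sigma == 1 else v_minus) ** 0.5
    xi = [rng.gauss(ell * b, std) for b in center]
    return xi, sigma


# Illustrative choice of orthonormal centers: the first two unit vectors.
N = 50
B_plus = [1.0] + [0.0] * (N - 1)
B_minus = [0.0, 1.0] + [0.0] * (N - 2)

rng = random.Random(0)
data = [sample_example(B_plus, B_minus, 0.8, 4.0, 9.0, 2.0, rng)
        for _ in range(2000)]
frac_plus = sum(1 for _, s in data if s == 1) / len(data)
```

With many samples, `frac_plus` should be close to the prior weight `p_plus`.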
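Dynamics of on-line training
sequence of new, independent random examples $\{\xi^\mu, \sigma^\mu\}$, $\mu = 1, 2, 3, \ldots$, drawn according to $p_{\sigma^\mu} P(\xi^\mu|\sigma^\mu)$
update of two prototype vectors $w_+, w_-$:
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_s^\mu, d_{-s}^\mu, \sigma^\mu, \ldots] \, (\xi^\mu - w_s^{\mu-1})$, with $s = \pm 1$ and $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data
example: LVQ1, original formulation [Kohonen], Winner-Takes-All (WTA) algorithm:
$f_s = \Theta(d_{-s}^\mu - d_s^\mu) \, s \, \sigma^\mu$ ($\Theta$: Heaviside step function)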
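The generic update with a modulation function $f_s$ might be coded as follows. This is a sketch under stated assumptions: prototypes are stored in a dict keyed by $s = \pm 1$, the names `online_step` and `f_lvq1` are hypothetical, and ties ($d_+ = d_-$) are treated as "no winner".

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def f_lvq1(s, d, sigma):
    """LVQ1 modulation: Theta(d_{-s} - d_s) * s * sigma.
    d maps the prototype index s (+1 or -1) to its squared distance."""
    is_winner = 1.0 if d[-s] - d[s] > 0 else 0.0
    return is_winner * s * sigma


def online_step(w, xi, sigma, eta, f):
    """Apply w_s <- w_s + (eta/N) * f_s * (xi - w_s) for s = +1, -1.
    w is a dict {+1: vector, -1: vector}; a new dict is returned.
    All distances are evaluated with the old prototypes."""
    N = len(xi)
    d = {s: squared_distance(xi, w[s]) for s in (+1, -1)}
    new_w = {}
    for s in (+1, -1):
        step = (eta / N) * f(s, d, sigma)
        new_w[s] = [ws + step * (x - ws) for ws, x in zip(w[s], xi)]
    return new_w


# Toy call: the winner w_+ has the wrong label and is pushed away.
w = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
w_new = online_step(w, xi=[2.0, 0.0], sigma=-1, eta=1.0, f=f_lvq1)
```

Passing the modulation function as an argument keeps the step reusable for other prescriptions of the same form.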
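Mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)
- $R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections into the $(B_+, B_-)$-plane
- $Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: lengths and relative position of prototypes ($s, t, \sigma \in \{+1, -1\}$)
the update $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_s^\mu, d_{-s}^\mu, \sigma^\mu, \ldots] \, (\xi^\mu - w_s^{\mu-1})$ yields the recursions
$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \, (y_\sigma^\mu - R_{s\sigma}^{\mu-1})$
$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \, (x_t^\mu - Q_{st}^{\mu-1}) + \eta \, f_t \, (x_s^\mu - Q_{st}^{\mu-1}) + \eta^2 f_s f_t \, (\xi^\mu)^2 / N + O(1/N)$
random vector $\xi^\mu$ enters only through its length and the projections $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$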
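As a small illustration of the reduction from $2N$ prototype components to seven numbers, the following sketch computes $R_{s\sigma}$ and $Q_{st}$ for given prototypes and centers. The dict-based layout and the 4-dimensional toy vectors are assumptions made here for illustration.

```python
def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def order_parameters(w, B):
    """Return (R, Q) with R[(s, sigma)] = w_s . B_sigma and
    Q[(s, t)] = w_s . w_t for s, t, sigma in {+1, -1}.
    Q is symmetric, so only 4 + 3 = 7 numbers are independent."""
    idx = (+1, -1)
    R = {(s, sigma): dot(w[s], B[sigma]) for s in idx for sigma in idx}
    Q = {(s, t): dot(w[s], w[t]) for s in idx for t in idx}
    return R, Q


# Hypothetical 4-dimensional example with orthonormal centers.
B = {+1: [1.0, 0.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0, 0.0]}
w = {+1: [0.5, 0.1, 0.2, 0.0], -1: [0.0, 0.3, 0.0, 0.4]}
R, Q = order_parameters(w, B)
```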
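2. average over the current example
random vector $\xi^\mu$ according to $P(\xi^\mu|\sigma)$, avg. length $\langle (\xi^\mu)^2 \rangle_\sigma = v_\sigma N + \ell^2$
in the thermodynamic limit $N \to \infty$, the projections $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$ are correlated Gaussian random quantities, completely specified in terms of first and second moments
averaged recursions, with $\langle \cdots \rangle = \sum_{\sigma=\pm 1} p_\sigma \langle \cdots \rangle_\sigma$, are closed in $\{R_{s\sigma}^\mu, Q_{st}^\mu\}$
3. self-averaging property
characteristic quantities $\{R_{s\sigma}^\mu, Q_{st}^\mu\}$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages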
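4. continuous learning time
$\alpha = \mu / N$: number of examples (= number of learning steps) per degree of freedom
stochastic recursions → deterministic ODE:
$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \, (y_\sigma^\mu - R_{s\sigma}^{\mu-1})$ → $\frac{dR_{s\sigma}}{d\alpha} = \eta \left( \langle f_s y_\sigma \rangle - \langle f_s \rangle R_{s\sigma} \right)$
integration yields evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
5. learning curve
probability for misclassification of a novel example:
$\epsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$
$= \sum_{\sigma=\pm 1} p_\sigma \, \Phi\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2\ell \, (R_{\sigma\sigma} - R_{-\sigma\sigma})}{2 \sqrt{v_\sigma} \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right)$, with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$
generalization error $\epsilon_g(\alpha)$ after training with $\alpha N$ examples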
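The generalization error can be evaluated directly from the characteristic quantities. The sketch below assumes the closed form $\epsilon_g = \sum_\sigma p_\sigma \Phi(\cdot)$ that follows from the Gaussian statistics of $x_\sigma - x_{-\sigma}$; the function names and the dict layout for $R$, $Q$, $v$ are illustrative assumptions.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(R, Q, p_plus, v, ell):
    """eps_g = sum_sigma p_sigma * Phi(arg_sigma), where, with s = sigma,
    arg_sigma = (Q[s,s] - Q[-s,-s] - 2*ell*(R[s,s] - R[-s,s]))
                / (2 * sqrt(v[s]) * sqrt(Q[+,+] - 2*Q[+,-] + Q[-,-])).
    Assumes w_+ != w_-, so that the denominator is positive."""
    p = {+1: p_plus, -1: 1.0 - p_plus}
    diff2 = Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)]  # |w_+ - w_-|^2
    eps = 0.0
    for s in (+1, -1):
        num = Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])
        eps += p[s] * Phi(num / (2.0 * math.sqrt(v[s]) * math.sqrt(diff2)))
    return eps


# Check case: prototypes placed exactly at the cluster centers, w_s = ell * B_s.
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(R, Q, p_plus=0.5, v={+1: 1.0, -1: 1.0}, ell=ell)
```

For this symmetric check case the decision boundary lies halfway between the centers, so the result reduces to $\Phi(-\ell/\sqrt{2})$.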
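LVQ1: "The winner takes it all"
only the winner $w_s$ ($s = \pm 1$) is updated, according to the class label:
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_s^\mu) \, s \, \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$
initialization: $w_s(0) \approx 0$
theory and simulation ($N = 100$): $p_+ = 0.8$, $v_+ = 4$, $v_- = 9$, $\ell = 2.0$, $\eta = 1.0$, averaged over 100 indep. runs
[Figure: $R_{s\sigma}$ and $Q_{++}$, $Q_{--}$, $Q_{+-}$ vs. $\alpha$]
Trajectories in the $(B_+, B_-)$-plane
[Figure: trajectories of $w_+$ and $w_-$, axes $R_{s+}$ and $R_{s-}$, cluster centers $\ell B_+$ and $\ell B_-$; (•) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; ____ asymptotic position]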
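Learning curve
[Figure: $\epsilon_g$ vs. $\alpha$ for $\eta = 2.0, 1.0, 0.2$; $p_+ = 0.2$, $\ell = 1.0$, $v_+ = v_- = 1.0$]
- suboptimal, non-monotonic behavior for small $\eta$
- stationary state: $\epsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$
achievable generalization error
[Figure: $\epsilon_g$ vs. $p_+$ in two panels, $v_+ = v_- = 1.0$ and $v_+ = 0.25$, $v_- = 0.81$; curves: best linear boundary, LVQ1 (solid)]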
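"LVQ 2.1" [Kohonen]; here: update correct and wrong winner
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$
[Figure: $R_{s\sigma}$, $Q_{st}$ vs. $\alpha$ (0 to 10; vertical axis from -6 to 6); theory and simulation ($N = 100$): $p_+ = 0.8$, $\ell = 1$, $v_+ = v_- = 1$, $\eta = 0.5$, averages over 100 independent runs]
problem: instability of the algorithm due to repulsion of wrong prototypes
- characteristic quantities $R$, $Q$ diverge with $\alpha \to \infty$, while certain combinations of $Q$ and $R$ remain finite
- trivial classification for $\alpha \to \infty$: $\epsilon_g = \min\{p_+, p_-\}$
[Figure: projected trajectories, axes $R_{s+}$ and $R_{s-}$]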
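The instability can be made explicit with a small numerical sketch. Inserting $f_s = s\,\sigma$ into the averaged recursions, and using $\langle y_\tau \rangle_\sigma = \ell\,\delta_{\tau\sigma}$, $\langle x_s \rangle_\sigma = \ell R_{s\sigma}$ and $(\xi^\mu)^2/N \to v_\sigma$, gives linear ODEs; the right-hand sides below are worked out here under these assumptions and are not quoted from the slides. Forward Euler integration, the function names and the step size are likewise illustrative choices.

```python
def lvq21_rhs(R, Q, eta, p_plus, v, ell):
    """Right-hand sides dR/dalpha, dQ/dalpha for f_s = s * sigma."""
    idx = (+1, -1)
    p = {+1: p_plus, -1: 1.0 - p_plus}
    m = p[+1] - p[-1]                       # <sigma>
    v_bar = p[+1] * v[+1] + p[-1] * v[-1]   # sum_sigma p_sigma * v_sigma
    # <sigma x_s> = sum_sigma p_sigma * sigma * ell * R[s, sigma]
    sx = {s: sum(p[sig] * sig * ell * R[(s, sig)] for sig in idx)
          for s in idx}
    dR = {(s, tau): eta * s * (ell * tau * p[tau] - m * R[(s, tau)])
          for s in idx for tau in idx}
    dQ = {(s, t): eta * (s * (sx[t] - m * Q[(s, t)])
                         + t * (sx[s] - m * Q[(s, t)]))
                  + eta ** 2 * s * t * v_bar
          for s in idx for t in idx}
    return dR, dQ


def integrate(alpha_max, d_alpha, eta, p_plus, v, ell):
    """Forward Euler integration from w_s(0) = 0, i.e. R = Q = 0."""
    idx = (+1, -1)
    R = {(s, tau): 0.0 for s in idx for tau in idx}
    Q = {(s, t): 0.0 for s in idx for t in idx}
    for _ in range(int(round(alpha_max / d_alpha))):
        dR, dQ = lvq21_rhs(R, Q, eta, p_plus, v, ell)
        R = {k: R[k] + d_alpha * dR[k] for k in R}
        Q = {k: Q[k] + d_alpha * dQ[k] for k in Q}
    return R, Q


R, Q = integrate(alpha_max=10.0, d_alpha=0.001, eta=0.5, p_plus=0.8,
                 v={+1: 1.0, -1: 1.0}, ell=1.0)
```

For $p_+ > p_-$ the projections of $w_+$ relax to finite values, whereas those of $w_-$ grow like $e^{\eta (p_+ - p_-) \alpha}$, which is the repulsion effect named above.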
suggested strategy: selection of data in a window close to the current decision boundary
→ slows down the repulsion, but the system remains unstable
Early stopping: end training process at minimal $\epsilon_g$ (idealized)
[Figure: $\epsilon_g$ vs. $\alpha$ for $\eta = 2.0, 1.0, 0.5$]
- pronounced minimum in $\epsilon_g(\alpha)$, depends on initialization and cluster geometry
- lowest minimum assumed for $\eta \to 0$
[Figure: $\epsilon_g$ vs. $p_+$; $v_+ = 0.25$, $v_- = 0.81$; curves: LVQ1 (solid), __ early stopping]
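"Learning From Mistakes (LFM)"
LVQ 2.1 update only if the current classification is wrong:
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{\sigma^\mu}^\mu - d_{-\sigma^\mu}^\mu) \, s \, \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$
crisp limit of Soft Robust LVQ [Seo and Obermayer, 2003]
[Figure: projected trajectory, axes $R_{s+}$ and $R_{s-}$, cluster centers $\ell B_+$ and $\ell B_-$; $p_+ = 0.8$, $\ell = 3.0$, $v_+ = 4.0$, $v_- = 9.0$]
Learning curves
[Figure: $\epsilon_g$ vs. $\alpha$ for $\eta = 2.0, 1.0, 0.5$; $p_+ = 0.8$, $\ell = 1.2$, $v_+ = v_- = 1.0$]
- $\eta$-independent asymptotic $\epsilon_g$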
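A minimal sketch of the LFM step, assuming the same dict-based prototype layout as an illustration (the names `f_lfm` and `lfm_step` are hypothetical): both prototypes are updated as in LVQ 2.1, but only when the prototype of the correct class is farther from the example than the other one.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def f_lfm(s, d, sigma):
    """LFM modulation: Theta(d_sigma - d_{-sigma}) * s * sigma.
    Non-zero only if the example is currently misclassified."""
    misclassified = 1.0 if d[sigma] - d[-sigma] > 0 else 0.0
    return misclassified * s * sigma


def lfm_step(w, xi, sigma, eta):
    """Return updated prototypes {+1: vector, -1: vector} after one example."""
    N = len(xi)
    d = {s: squared_distance(xi, w[s]) for s in (+1, -1)}
    new_w = {}
    for s in (+1, -1):
        step = (eta / N) * f_lfm(s, d, sigma)
        new_w[s] = [ws + step * (x - ws) for ws, x in zip(w[s], xi)]
    return new_w


w = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
# Example closest to w_+ and labelled +1: classified correctly, no update.
unchanged = lfm_step(w, xi=[2.0, 0.0], sigma=+1, eta=1.0)
# Same example labelled -1: misclassified, w_+ is repelled and w_- attracted.
changed = lfm_step(w, xi=[2.0, 0.0], sigma=-1, eta=1.0)
```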
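Comparison: achievable generalization ability
[Figure: $\epsilon_g$ vs. $p_+$ in two panels, equal cluster variances ($v_+ = v_- = 1.0$) and unequal variances ($v_+ = 0.25$, $v_- = 0.81$); curves: best linear boundary, LVQ1 (solid), LVQ 2.1 with early stopping (dashed), LFM (dash-dotted)]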
Summary
• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 2.1 + stopping: potentially very good performance
  - LFM: far from optimal generalization behavior
work in progress, outlook
• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules, variational approach, Bayes optimal on-line
Perspectives
• Self-Organizing Maps (SOM)
  - (many) N-dim. prototypes form a (low) d-dimensional grid
  - representation of data in a topology-preserving map
  - neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [e.g. Hammer & Villmann]
  - adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \, (w_{s,i} - \xi_i)^2$, with relevances $\lambda_i$ adapted in training
• applications
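The relevance-weighted distance mentioned for Generalized Relevance LVQ is a one-liner. The sketch below only evaluates the distance for given relevance factors; how the factors are adapted during training is not covered here, and the function name and toy values are assumptions.

```python
def relevance_distance(w, xi, lam):
    """d_lambda(w, xi) = sum_i lam_i * (w_i - xi_i)^2."""
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))


# With uniform relevances the measure is the plain squared distance.
d_uniform = relevance_distance([0.0, 0.0], [1.0, 2.0], [1.0, 1.0])
# Down-weighting the second component reduces its influence.
d_weighted = relevance_distance([0.0, 0.0], [1.0, 2.0], [0.9, 0.1])
```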
Outlook
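2. average over the current example
random vector $\xi^\mu$ according to $P(\xi^\mu|\sigma)$, avg. length $\langle (\xi^\mu)^2 \rangle_\sigma = v_\sigma N + \ell^2$
in the thermodynamic limit $N \to \infty$, $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\tau^\mu = B_\tau \cdot \xi^\mu$ are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$):
$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell R_{s\sigma}$, $\langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma}$
$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = v_\sigma Q_{st}$
$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma R_{s\tau}$
$\langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma \delta_{\rho\tau}$
averaged recursions, with $\langle \cdots \rangle = \sum_{\sigma=\pm 1} p_\sigma \langle \cdots \rangle_\sigma$, are closed in $\{R_{s\sigma}^\mu, Q_{st}^\mu\}$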
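The conditional moments can be collected into a mean vector and a covariance matrix for the four projections $(x_+, x_-, y_+, y_-)$. The sketch below assumes the moment expressions $\langle x_s \rangle_\sigma = \ell R_{s\sigma}$, $\langle y_\tau \rangle_\sigma = \ell\,\delta_{\tau\sigma}$ and covariances $v_\sigma Q_{st}$, $v_\sigma R_{s\tau}$, $v_\sigma \delta_{\rho\tau}$; the function name, ordering of components and toy values are illustrative choices.

```python
def conditional_moments(R, Q, sigma, v, ell):
    """Mean vector and covariance matrix of (x_+, x_-, y_+, y_-)
    for an example from class sigma, given R and Q."""
    idx = (+1, -1)
    mean = [ell * R[(s, sigma)] for s in idx]
    mean += [ell if tau == sigma else 0.0 for tau in idx]
    vs = v[sigma]
    cov = [[0.0] * 4 for _ in range(4)]
    for i, s in enumerate(idx):
        for j, t in enumerate(idx):
            cov[i][j] = vs * Q[(s, t)]                       # x_s with x_t
            cov[i][2 + j] = cov[2 + j][i] = vs * R[(s, t)]   # x_s with y_t
            cov[2 + i][2 + j] = vs if s == t else 0.0        # y_s with y_t
    return mean, cov


# Hypothetical values of the characteristic quantities.
R = {(1, 1): 0.5, (1, -1): 0.1, (-1, 1): 0.0, (-1, -1): 0.3}
Q = {(1, 1): 0.30, (1, -1): 0.03, (-1, 1): 0.03, (-1, -1): 0.25}
mean, cov = conditional_moments(R, Q, sigma=+1, v={+1: 4.0, -1: 9.0}, ell=2.0)
```

Sampling from this four-dimensional Gaussian would be one way to evaluate averages such as $\langle f_s y_\tau \rangle$ numerically when no closed form is at hand.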
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$: maximize the decrease of $\epsilon_g$ with $\alpha$ ($d\epsilon_g / d\alpha$)
- ...
LVQ1: "The winner takes it all"
only the winner $w_s$ ($s = \pm 1$) is updated, according to the class label; initialization $w_s(0) = 0$
theory and simulation ($N = 100$): $p_+ = 0.8$, $v_+ = 4$, $v_- = 9$, $\ell = 2.0$, $\eta = 1.0$, averaged over 100 indep. runs
self-averaging property
[Figure: mean and variance of $R_{++}(\alpha = 10)$ vs. $1/N$]
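high-dimensional data (formally $N \to \infty$)
$\xi^\mu \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.4$, $v_+ = 0.44$, $v_- = 0.44$
[Figure: two scatter plots of the same data; legend: 240 and 160 examples of the two classes]
- projections into the plane of center vectors $B_+, B_-$: $y_\pm^\mu = B_\pm \cdot \xi^\mu$
- projections on two independent random directions $w_1, w_2$: $x_1^\mu = w_1 \cdot \xi^\mu$, $x_2^\mu = w_2 \cdot \xi^\mu$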
Dynamical Analysis of LVQ type algorithms WSOM 2005
Model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σ
σN2
σ
- v 2
1exp
v 2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
separation prop ℓ ℓ
jj Bσσξ
σσσvξξ
22jj
independent components
with variance
ℝN
Dynamical Analysis of LVQ type algorithms WSOM 2005
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Dynamical Analysis of LVQ type algorithms WSOM 2005
Nξffη QxfηQxfη1N
Ryfη1N
RR
μts
1-μst
μst
1-μst
μts
1-μst
μst
1-μsσ
μσs
1-μsσ
μsσ
22
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww
recursions
Mathematical analysis of the learning dynamics
μσ
μσ
μ1-μs
μs ξByx ξw
random vector ξμ enters only through
its length and the projections
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
Dynamical Analysis of LVQ type algorithms WSOM 2005
completely specified in terms of first and second moments
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
2 average over the current example
averaged recursions closed in p σ1σ
σ
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
μsσ
μst R Q
characteristic quantities
- depend on the random sequence of example data
- their variance vanishes with N (here prop N-1)
μsσ
μst R Q
learning dynamics is completely described in terms of averages
3 self-averaging property
Dynamical Analysis of LVQ type algorithms WSOM 2005
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
1-μsσ
μσs
sσ1-μsσ
μσs
1-μsσ
μsσ Ryfη
dα
dRRyfη
1N
RR
probability for misclassification of a novel example
ddpddp gε
QQQv
RR2QQ
QQQv
RR2QQpp
22 2
1
2
1
5 learning curve
generalization error εg(α) after training with α N examples
Dynamical Analysis of LVQ type algorithms WSOM 2005
LVQ1 The winner takes it all
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
Dynamical Analysis of LVQ type algorithms WSOM 2005
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamical Analysis of LVQ type algorithms WSOM 2005
ldquoLVQ 21ldquo [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 v+=v-=1 =05 averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification fuumlr αinfin
εg = min p+p- RS+
RS-
Dynamical Analysis of LVQ type algorithms WSOM 2005
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- lowest minimum assumed for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamical Analysis of LVQ type algorithms WSOM 2005
ldquoLearning From Mistakes (LFM)rdquo
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v=10
Dynamical Analysis of LVQ type algorithms WSOM 2005
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
Dynamical Analysis of LVQ type algorithms WSOM 2005
work in progress outlook
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Summary
bullprototype-based learning
Vector Quantization and Learning Vector Quantization
bulla model scenario two clusters two prototypes
dynamics of online training
bullcomparison of algorithms
LVQ 1 close to optimal asymptotic generalization
LVQ 21 instability trivial (stationary) classification
+ stopping potentially very good performance
LFM far from optimal generalization behavior
Dynamical Analysis of LVQ type algorithms WSOM 2005
Perspectives
bullSelf-Organizing Maps (SOM)
(many) N-dim prototypes form a (low) d-dimensional grid
representation of data in a topology preserving map
neighborhood preserving SOM Neural Gas (distance based)
bullGeneralized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
Dynamical Analysis of LVQ type algorithms WSOM 2005
Outlook
Dynamical Analysis of LVQ type algorithms WSOM 2005
Dynamical Analysis of LVQ type algorithms WSOM 2005
Dynamical Analysis of LVQ type algorithms WSOM 2005
sσσ
N
1jjsσs R x
jw
completely specified in terms of first and second moments (wo indices μ)
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
2 average over the current example
averaged recursions closed in p σ1σ
σ
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
μsσ
μst R Q
Dynamical Analysis of LVQ type algorithms WSOM 2005
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate η(α)
- variational optimization w.r.t. $f_s[\ldots]$
- ...
maximize $-\,\mathrm{d}\varepsilon_g / \mathrm{d}\alpha$
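LVQ1: "The winner takes it all"
only the winner $\mathbf{w}_s$ ($s = \pm 1$) is updated, according to the class label:
  $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_{+s}^\mu) \, s \, \sigma^\mu \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$
initialization $\mathbf{w}_s(0) = 0$
theory and simulation (N = 100): p+ = 0.8, v+ = 4, v- = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs
[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{s\sigma}$ vs. α]
self-averaging property (mean and variances)
[Figure: $R_{++}$ (α = 10) vs. 1/N]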
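high-dimensional data (formally N → ∞)
$\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, N = 200, ℓ = 1, p+ = 0.4, v+ = 0.44, v- = 0.44
[Figure: projections into the plane of center vectors B+, B-: $y_\pm^\mu = \mathbf{B}_\pm \cdot \boldsymbol{\xi}^\mu$]
[Figure: projections on two independent random directions $\mathbf{w}_{1,2}$: $x_{1,2}^\mu = \mathbf{w}_{1,2} \cdot \boldsymbol{\xi}^\mu$]
plot legend: (240), (160)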
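The contrast between the two projections can be reproduced with a short sketch. Assumptions: the legend entries 240 and 160 are read as class sizes (consistent with p+ = 0.4 and 400 examples, but not confirmed by the source), B_± are taken as the first two unit vectors, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, ell, v = 200, 1.0, 0.44
n = {+1: 160, -1: 240}                  # assumed class sizes (p_+ = 0.4 of 400)
B = {+1: np.eye(N)[0], -1: np.eye(N)[1]}

def sample(sigma, size):
    """Draw examples from cluster sigma: mean ell * B_sigma, variance v per component."""
    return ell * B[sigma] + np.sqrt(v) * rng.normal(size=(size, N))

data = {s: sample(s, n[s]) for s in (+1, -1)}

# Projections into the plane of the center vectors: the clusters separate
y = {s: np.stack([data[s] @ B[+1], data[s] @ B[-1]], axis=1) for s in (+1, -1)}

# Projections on two independent random unit directions: the clusters overlap
w1, w2 = (u / np.linalg.norm(u) for u in rng.normal(size=(2, N)))
x = {s: np.stack([data[s] @ w1, data[s] @ w2], axis=1) for s in (+1, -1)}

# Distance between the two class means in each 2-d projection
gap_y = np.linalg.norm(y[+1].mean(axis=0) - y[-1].mean(axis=0))  # near ell * sqrt(2)
gap_x = np.linalg.norm(x[+1].mean(axis=0) - x[-1].mean(axis=0))  # small
```

In the (B+, B-) plane the class means are separated by roughly ℓ√2, while a random direction has an overlap of order 1/√N with the centers, so the structure is essentially invisible there.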
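Dynamics of on-line training
sequence of new, independent random examples $\{\boldsymbol{\xi}^\mu, \sigma^\mu\}$, μ = 1, 2, 3, ...
drawn according to $P(\boldsymbol{\xi}^\mu) = \sum_{\sigma = \pm 1} p_\sigma \, P(\boldsymbol{\xi}^\mu \mid \sigma)$

update of two prototype vectors w+, w-:
  $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_+^\mu, d_-^\mu, \sigma^\mu, \ldots] \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$,   with $d_s^\mu = (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})^2$, $s, \sigma^\mu = \pm 1$
- η: learning rate, step size
- $f_s$: competition, direction of update, etc.
- $(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data

example: LVQ1, original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm:
  $f_s[\ldots] = \Theta(d_{-s}^\mu - d_{+s}^\mu) \, s \, \sigma^\mu$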
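The generic update with a pluggable modulation function f_s can be sketched as follows. The names `online_step` and `lvq1_modulation` and the two-dimensional example are illustrative assumptions, not part of the original talk.

```python
import numpy as np

def lvq1_modulation(d, s, sigma):
    """LVQ1: f_s = Theta(d_{-s} - d_{+s}) * s * sigma, nonzero only for the winner."""
    return float(d[-s] > d[s]) * s * sigma

def online_step(w, xi, sigma, eta, f):
    """One step w_s <- w_s + (eta/N) * f_s * (xi - w_s) for both prototypes s = +1, -1."""
    N = xi.size
    # Squared distances are evaluated with the prototypes before the update
    d = {s: float(np.sum((xi - w[s]) ** 2)) for s in (+1, -1)}
    return {s: w[s] + (eta / N) * f(d, s, sigma) * (xi - w[s]) for s in (+1, -1)}

# Hypothetical two-dimensional example: w_+ is the winner for xi
w = {+1: np.array([1.0, 0.0]), -1: np.array([0.0, 1.0])}
xi = np.array([2.0, 0.0])

same_class = online_step(w, xi, +1, 1.0, lvq1_modulation)   # winner attracted
other_class = online_step(w, xi, -1, 1.0, lvq1_modulation)  # winner repelled
```

Passing the modulation function as an argument mirrors the structure of the update rule: other LVQ variants differ only in f_s, so they can reuse the same step function.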
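Mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)
  $R_{s\sigma}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{B}_\sigma$   projections into the (B+, B-)-plane
  $Q_{st}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{w}_t^\mu$   length and relative position of prototypes
  ($s, t, \sigma \in \{+1, -1\}$)

in the update
  $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_s^\mu, d_{-s}^\mu, \sigma^\mu] \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$
the random vector $\boldsymbol{\xi}^\mu$ enters only through its length and the projections
  $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$,   $y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu$

recursions:
  $\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \, (y_\sigma^\mu - R_{s\sigma}^{\mu-1})$
  $\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \, (x_t^\mu - Q_{st}^{\mu-1}) + \eta \, f_t \, (x_s^\mu - Q_{st}^{\mu-1}) + \eta^2 f_s f_t \, \frac{(\boldsymbol{\xi}^\mu)^2}{N} + O(1/N)$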
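That the characteristic quantities obey closed recursions can be verified directly for a single step. This is a sketch with arbitrary illustrative values (dimension, seed, modulation values f_±); unlike the recursion for Q with only the leading ξ²/N term, the code keeps the complete second-order term so that the comparison is exact.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eta = 30, 0.7                       # illustrative values
B = np.eye(N)[:2]                      # rows: B_+, B_- (orthonormal)
w = rng.normal(size=(2, N))            # rows: w_+, w_-
xi = rng.normal(size=N)                # one example
f = np.array([1.0, -1.0])              # some modulation values f_+, f_-

# Direct update in N dimensions
w_new = w + (eta / N) * f[:, None] * (xi - w)

# The same step expressed through projections only
R, Q = w @ B.T, w @ w.T                # R[s, sigma], Q[s, t]
x, y = w @ xi, B @ xi                  # x_s, y_sigma
R_rec = R + (eta / N) * f[:, None] * (y[None, :] - R)
Q_rec = (
    Q
    + (eta / N) * (f[:, None] * (x[None, :] - Q) + f[None, :] * (x[:, None] - Q))
    + (eta / N) ** 2 * np.outer(f, f) * (xi @ xi - x[:, None] - x[None, :] + Q)
)

R_direct = w_new @ B.T
Q_direct = w_new @ w_new.T
```

Since ξ² is of order N while x_s and Q_st are of order 1, the term η² f_s f_t ξ²/N dominates the second-order contribution for large N, which is the approximation used in the analysis.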
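2. average over the current example
random vector $\boldsymbol{\xi}^\mu$ drawn according to $P(\boldsymbol{\xi}^\mu \mid \sigma)$, avg. length $\langle (\boldsymbol{\xi}^\mu)^2 \rangle_\sigma \approx v_\sigma N$ for large N
in the thermodynamic limit N → ∞ the projections
  $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$,   $y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu$
are correlated Gaussian random quantities,
completely specified in terms of first and second moments
averaged recursions closed in $\{R_{s\sigma}^\mu, Q_{st}^\mu\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$

3. self-averaging property
characteristic quantities $\{R_{s\sigma}^\mu, Q_{st}^\mu\}$
- depend on the random sequence of example data
- their variance vanishes with N → ∞ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages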
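4. continuous learning time
  $\alpha = \mu / N$: # of examples (# of learning steps) per degree of freedom
stochastic recursions → deterministic ODE:
  $\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \, (y_\sigma^\mu - R_{s\sigma}^{\mu-1})$   →   $\frac{\mathrm{d}R_{s\sigma}}{\mathrm{d}\alpha} = \eta \left( \langle f_s \, y_\sigma \rangle - \langle f_s \rangle \, R_{s\sigma} \right)$
integration yields evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

5. learning curve
probability for misclassification of a novel example:
  $\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$
  $\varepsilon_g = p_+ \, \Phi\!\left[ \frac{Q_{++} - Q_{--} - 2\ell \, (R_{++} - R_{-+})}{2 \sqrt{v_+} \, \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right] + p_- \, \Phi\!\left[ \frac{Q_{--} - Q_{++} - 2\ell \, (R_{--} - R_{+-})}{2 \sqrt{v_-} \, \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right]$
  with $\Phi(z) = \int_{-\infty}^{z} \frac{\mathrm{d}t}{\sqrt{2\pi}} \, e^{-t^2/2}$
generalization error $\varepsilon_g(\alpha)$ after training with αN examples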
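The misclassification probability depends on the prototypes only through R and Q, which a small function makes explicit. This is a sketch: the closed form below follows from the Gaussian statistics of the model (two isotropic clusters with centers ℓ B_± and variances v_±), and the dictionary layout keyed by ±1 and the function names are illustrative assumptions.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, p_plus, v, ell):
    """eps_g from R[(s, sigma)], Q[(s, t)] and v[sigma], all keyed by +1 / -1."""
    p = {+1: p_plus, -1: 1.0 - p_plus}
    # |w_+ - w_-| expressed through the characteristic quantities
    norm = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for s in (+1, -1):
        num = Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])
        eps += p[s] * phi(num / (2.0 * math.sqrt(v[s]) * norm))
    return eps

# Example: prototypes placed exactly at the cluster centers, w_s = ell * B_s,
# with ell = 1, equal variances and equal priors
ell = 1.0
R = {(1, 1): ell, (-1, -1): ell, (1, -1): 0.0, (-1, 1): 0.0}
Q = {(1, 1): ell**2, (-1, -1): ell**2, (1, -1): 0.0, (-1, 1): 0.0}
eps_centers = generalization_error(R, Q, 0.5, {+1: 1.0, -1: 1.0}, ell)
```

For this symmetric configuration both arguments of Φ reduce to -ℓ/√2, so the result equals Φ(-1/√2), roughly 0.24.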
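LVQ1: "The winner takes it all"
only the winner $\mathbf{w}_s$ ($s = \pm 1$) is updated, according to the class label:
  $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_{+s}^\mu) \, s \, \sigma^\mu \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$
initialization $\mathbf{w}_s(0) \approx 0$
theory and simulation (N = 100): p+ = 0.8, v+ = 4, v- = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs
[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{s\sigma}$ vs. α]
Trajectories in the (B+, B-)-plane
[Figure: trajectories of w+ and w- in the plane spanned by ℓB+ and ℓB-, axes $R_{s+}$ and $R_{s-}$; (•) mark α = 20, 40, ..., 140; also shown: optimal decision boundary, asymptotic position]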
Learning curve
[Figure: ε_g vs. α for η = 2.0, 1.0, 0.2; p+ = 0.2, ℓ = 1.0, v+ = v- = 1.0]
- suboptimal, non-monotonic behavior for small η
- stationary state: ε_g(α → ∞) grows linearly with η
  [Figure: ε_g vs. η]
- well-defined asymptotics: η → 0, α → ∞, (η α) → ∞

achievable generalization error
[Figure: ε_g vs. p+; panels v+ = v- = 1.0 and v+ = 0.25, v- = 0.81; curves: best linear boundary, LVQ1]
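"LVQ 2.1" [Kohonen]; here: update correct and wrong winner
  $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$
[Figure: $R_{s+}$, $R_{s-}$ vs. α (α from 0 to 10, vertical axis from -6 to 6); theory and simulation (N = 100), p+ = 0.8, ℓ = 1, v+ = v- = 1, η = 0.5, averages over 100 independent runs]
problem: instability of the algorithm due to repulsion of wrong prototypes
for α → ∞: characteristic quantities R, Q diverge, while certain combinations remain finite [expressions not recoverable from the source]
trivial classification for α → ∞: $\varepsilon_g = \min\{p_+, p_-\}$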
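The repulsion that drives the instability can be seen in a single LVQ 2.1 step. A minimal sketch, assuming the update w_s ← w_s + (η/N) s σ (ξ - w_s) for both prototypes; the function name `lvq21_step` and the two-dimensional example are illustrative.

```python
import numpy as np

def lvq21_step(w, xi, sigma, eta):
    """w_s <- w_s + (eta/N) * s * sigma * (xi - w_s) for both s = +1 and s = -1."""
    N = xi.size
    return {s: w[s] + (eta / N) * s * sigma * (xi - w[s]) for s in (+1, -1)}

# Hypothetical two-dimensional example with class label sigma = +1
w = {+1: np.array([1.0, 0.0]), -1: np.array([0.0, 1.0])}
xi, sigma = np.array([2.0, 0.0]), +1
w_new = lvq21_step(w, xi, sigma, eta=1.0)

# Squared distances to the example before and after the step
dist_before = {s: float(np.sum((xi - w[s]) ** 2)) for s in w}
dist_after = {s: float(np.sum((xi - w_new[s]) ** 2)) for s in w}
```

Because ξ - w_s is rescaled by (1 - η/N) for the correct prototype and by (1 + η/N) for the wrong one, each step multiplies the wrong prototype's squared distance by (1 + η/N)², and nothing in the rule bounds this growth.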
suggested strategy: selection of data in a window close to the current decision boundary
- slows down the repulsion, system remains unstable
Early stopping: end training process at minimal ε_g (idealized)
[Figure: ε_g vs. α for η = 2.0, 1.0, 0.5]
- pronounced minimum in ε_g(α), depends on initialization and cluster geometry
- lowest minimum assumed for η → 0
[Figure: ε_g vs. p+ for v+ = 0.25, v- = 0.81; curves: LVQ1, early stopping]
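Idealized early stopping amounts to locating the minimum of a learning curve ε_g(α). The sketch below uses a synthetic toy curve, invented purely for illustration and unrelated to the curves of the talk; in the analysis the curve would come from integrating the ODE for R and Q. The function name `early_stopping_point` is an assumption.

```python
import numpy as np

def early_stopping_point(alpha, eps_g):
    """Return (alpha, eps_g) at the minimum of a sampled learning curve."""
    i = int(np.argmin(eps_g))
    return float(alpha[i]), float(eps_g[i])

# Synthetic, purely illustrative curve: decreases, reaches a minimum at
# alpha = 2 and then rises again towards 0.5
alpha = np.linspace(0.0, 10.0, 101)
eps_toy = 0.5 - 0.3 * alpha * np.exp(-alpha / 2.0)

alpha_stop, eps_min = early_stopping_point(alpha, eps_toy)
```

The procedure is idealized because it presupposes knowledge of ε_g(α); in practice the minimum would have to be estimated, for instance from held-out data.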
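"Learning From Mistakes (LFM)"
  $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{\sigma^\mu}^\mu - d_{-\sigma^\mu}^\mu) \, s \, \sigma^\mu \, (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$
LVQ 2.1 update only if the current classification is wrong
crisp limit of Soft Robust LVQ [Seo and Obermayer 2003]
[Figure: projected trajectory in the plane spanned by ℓB+ and ℓB-, axes $R_{s+}$ and $R_{s-}$]
Learning curves
[Figure: ε_g vs. α]
parameters given on the slide: p+ = 0.8, ℓ = 3.0, v+ = 4.0, v- = 9.0, η = 2.0, 1.0, 0.5; p+ = 0.8, ℓ = 1.2, v+ = v- = 1.0
- η-independent asymptotic ε_g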
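The LFM rule differs from LVQ 2.1 only in its modulation function, which vanishes whenever the example is classified correctly. A minimal sketch; the function name `lfm_modulation` and the distance values are illustrative assumptions.

```python
def lfm_modulation(d, s, sigma):
    """LFM: f_s = Theta(d_sigma - d_{-sigma}) * s * sigma.

    d maps prototype labels +1 / -1 to squared distances. The step function
    is 1 only if the prototype of the correct class sigma is farther away
    than the other one, i.e. if the example is currently misclassified.
    """
    return float(d[sigma] > d[-sigma]) * s * sigma

# Hypothetical distances: the example is closer to w_+ than to w_-
d = {+1: 1.0, -1: 5.0}

# Label +1: classification is correct, no prototype moves
f_correct = [lfm_modulation(d, s, +1) for s in (+1, -1)]

# Label -1: classification is wrong, w_+ is repelled and w_- attracted
f_wrong = [lfm_modulation(d, s, -1) for s in (+1, -1)]
```

On a misclassified example the values coincide with the LVQ 2.1 modulation s σ, which is the sense in which LFM performs the LVQ 2.1 update only when the current classification is wrong.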
Comparison: achievable generalization ability
[Figure: ε_g vs. p+; panels: equal cluster variances (v+ = v- = 1.0), unequal variances (v+ = 0.25, v- = 0.81); curves: best linear boundary, LVQ1 (solid), LVQ 2.1 with early stopping (dashed), LFM (dash-dotted)]
work in progress, outlook
• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules, variational approach / Bayes optimal on-line
Summary
• prototype-based learning
  Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes
  dynamics of online training
• comparison of algorithms
  LVQ 1: close to optimal asymptotic generalization
  LVQ 2.1: instability, trivial (stationary) classification
  + stopping: potentially very good performance
  LFM: far from optimal generalization behavior
Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [e.g. Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(\mathbf{w}_S, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i \left( w_{S,i} - \xi_i \right)^2$ (training)
• applications
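A short sketch of a relevance-weighted squared distance of the kind named under Generalized Relevance LVQ. Only the distance is shown; how the relevance factors are adapted in training is not covered. The function name and the example vectors are hypothetical.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam[i] * (w[i] - xi[i]) ** 2."""
    return sum(l * (wi - xj) ** 2 for l, wi, xj in zip(lam, w, xi))

xi = [1.0, 0.0]
w_a = [0.0, 0.0]
w_b = [1.0, 2.0]

# With equal relevances w_a is the closer prototype ...
euclid = (relevance_distance(w_a, xi, [1.0, 1.0]), relevance_distance(w_b, xi, [1.0, 1.0]))
# ... but if the second dimension is switched off, w_b becomes the winner.
weighted = (relevance_distance(w_a, xi, [1.0, 0.0]), relevance_distance(w_b, xi, [1.0, 0.0]))
print(euclid, weighted)
```

The example shows why an adaptive metric matters for a distance-based classifier: changing the relevances can change which prototype wins.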
Outlook
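in the thermodynamic limit N → ∞ the projections
$x_S^\mu = \mathbf{w}_S^{\mu-1} \cdot \boldsymbol{\xi}^\mu$, $y_\tau^\mu = \mathbf{B}_\tau \cdot \boldsymbol{\xi}^\mu$
are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices μ):
$\langle x_S \rangle_\sigma = \sum_{j=1}^{N} (\mathbf{w}_S)_j \langle \xi_j \rangle_\sigma = \ell R_{S\sigma}$, $\langle y_\tau \rangle_\sigma = \ell\, \delta_{\tau\sigma}$
$\langle x_S x_T \rangle_\sigma - \langle x_S \rangle_\sigma \langle x_T \rangle_\sigma = v_\sigma Q_{ST}$
$\langle x_S y_\tau \rangle_\sigma - \langle x_S \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma R_{S\tau}$
$\langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = v_\sigma \delta_{\tau\rho}$
2. average over the current example
random vector $\boldsymbol{\xi}^\mu$ according to $P(\boldsymbol{\xi}^\mu | \sigma)$, avg. length $\langle \boldsymbol{\xi}^2 \rangle_\sigma = v_\sigma N + \ell^2$
averaged recursions closed in $\{ R_{S\sigma}^\mu, Q_{ST}^\mu \}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$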
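A short worked derivation of the conditional moments, given as a sketch. It assumes the cluster model of the talk, written as $\boldsymbol{\xi} = \ell \mathbf{B}_\sigma + \sqrt{v_\sigma}\, \mathbf{z}$ with $\mathbf{z}$ a vector of independent standard normal components, and the order parameters $R_{S\sigma} = \mathbf{w}_S \cdot \mathbf{B}_\sigma$, $Q_{ST} = \mathbf{w}_S \cdot \mathbf{w}_T$. The auxiliary vector $\mathbf{z}$ is notation introduced here, not taken from the slides.

```latex
\begin{align*}
x_S &= \mathbf{w}_S \cdot \boldsymbol{\xi}
     = \ell\, R_{S\sigma} + \sqrt{v_\sigma}\; \mathbf{w}_S \cdot \mathbf{z},
&
y_\tau &= \mathbf{B}_\tau \cdot \boldsymbol{\xi}
     = \ell\, \delta_{\tau\sigma} + \sqrt{v_\sigma}\; \mathbf{B}_\tau \cdot \mathbf{z}, \\
\langle x_S \rangle_\sigma &= \ell\, R_{S\sigma},
&
\langle y_\tau \rangle_\sigma &= \ell\, \delta_{\tau\sigma}, \\
\langle x_S x_T \rangle_\sigma - \langle x_S \rangle_\sigma \langle x_T \rangle_\sigma
  &= v_\sigma \sum_{j,k} (\mathbf{w}_S)_j (\mathbf{w}_T)_k \langle z_j z_k \rangle
   = v_\sigma\, \mathbf{w}_S \cdot \mathbf{w}_T = v_\sigma Q_{ST}, \\
\langle x_S y_\tau \rangle_\sigma - \langle x_S \rangle_\sigma \langle y_\tau \rangle_\sigma
  &= v_\sigma\, \mathbf{w}_S \cdot \mathbf{B}_\tau = v_\sigma R_{S\tau}, \\
\langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma
  &= v_\sigma\, \mathbf{B}_\tau \cdot \mathbf{B}_\rho = v_\sigma \delta_{\tau\rho}.
\end{align*}
```

Each line uses $\langle z_j \rangle = 0$ and $\langle z_j z_k \rangle = \delta_{jk}$, together with the orthonormality of the center vectors.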
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate η(α)
- variational optimization w.r.t. f_S[...]
- ...
maximize $-\, d\varepsilon_g / d\alpha$
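LVQ1: The winner takes it all
only the winner w_S (S = ±1) is updated, according to the class label:
$\mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^\mu - d_{S}^\mu \right) S \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right)$
initialization w_S(0) = 0
[Figure: Q++, Q--, Q+- and R_{Sσ} versus α; theory and simulation (N = 100), p+ = 0.8, v+ = 4, v- = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs]
[Figure: self-averaging property: mean and variances of R++ (α = 10) versus 1/N]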
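The LVQ1 prescription (only the winner is updated, towards the example if the labels agree and away from it otherwise) can be sketched as follows. The function names and the dictionary layout `w[S]` are assumptions for illustration; ties in distance are resolved in favor of S = +1 here, which the slide does not specify.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lvq1_step(w, xi, sigma, eta):
    """One LVQ1 step; returns the label S of the winning prototype."""
    n = len(xi)
    winner = +1 if sq_dist(xi, w[+1]) <= sq_dist(xi, w[-1]) else -1
    sign = winner * sigma  # +1: same class, attract; -1: different class, repel
    w[winner] = [wj + (eta / n) * sign * (xj - wj) for wj, xj in zip(w[winner], xi)]
    return winner

w_same = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
s_same = lvq1_step(w_same, [2.0, 0.0], +1, eta=1.0)  # winner +1 moves closer

w_diff = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
s_diff = lvq1_step(w_diff, [2.0, 0.0], -1, eta=1.0)  # winner +1 moves away
print(s_same, w_same, s_diff, w_diff)
```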
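high-dimensional data (formally N → ∞)
$\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, N = 200, ℓ = 1, p+ = 0.4, v+ = 0.44, v- = 0.44
[Figure: two scatter plots of the same data; cluster sizes (240) and (160)]
projections into the plane of the center vectors B+, B-: $y_\pm = \mathbf{B}_\pm \cdot \boldsymbol{\xi}^\mu$
projections on two independent random directions w1, w2: $x_1 = \mathbf{w}_1 \cdot \boldsymbol{\xi}^\mu$, $x_2 = \mathbf{w}_2 \cdot \boldsymbol{\xi}^\mu$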
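A minimal sketch of the experiment behind this slide, using only the Python standard library: draw labelled examples from the two-cluster model and compare the class means after projecting onto a center vector and onto a random direction. Taking B+ and B- as the first two coordinate axes, the sample size of 400 and the seed are assumptions made for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_mixture(n_examples, n_dim, ell, p_plus, v, rng):
    """Labelled examples; B+ and B- are the first and second unit vectors."""
    data = []
    for _ in range(n_examples):
        sigma = +1 if rng.random() < p_plus else -1
        center = 0 if sigma == +1 else 1  # non-zero entry of B_sigma
        xi = [math.sqrt(v[sigma]) * rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        xi[center] += ell
        data.append((xi, sigma))
    return data

def class_means(data, direction):
    """Mean projection onto `direction`, separately for each class."""
    means = {}
    for sigma in (+1, -1):
        proj = [dot(direction, xi) for xi, s in data if s == sigma]
        means[sigma] = sum(proj) / len(proj)
    return means

rng = random.Random(0)
N = 200
data = sample_mixture(400, N, ell=1.0, p_plus=0.4, v={+1: 0.44, -1: 0.44}, rng=rng)

b_plus = [1.0] + [0.0] * (N - 1)
g = [rng.gauss(0.0, 1.0) for _ in range(N)]
norm = math.sqrt(dot(g, g))
w_random = [x / norm for x in g]  # random unit vector

means_b = class_means(data, b_plus)    # classes separate along B+
means_w = class_means(data, w_random)  # hardly any separation along a random direction
print(means_b, means_w)
```

Along B+ the class means differ by roughly ℓ, while along a random direction the difference is only of order ℓ/√N plus sampling noise, which is why the cluster structure is hidden in most projections.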