Unsupervised Learning Using ANNs
PR, ANN, & ML
If the correct I/O association is not provided
and only a set of samples is presented,
what does an ANN do with the samples?
Network topology
Layers and connections
Learning rules used
familiarity, principal component analysis, feature mapping, etc.
Learning paradigm
Competitive vs. cooperative
Update
Batch (off-line) update vs. interactive (on-line) update
Again, the Recurring Theme of
finding to which class(es) a sample belongs:
Belongs to all classes
Belongs to only one class
Belongs to a small group of classes
How a sample affects class statistics
Global weighted update
Competitive update
Collaborative update
Issues
Network topology
Do multiple layers help?
Training mechanism?
Separation of functionalities?
But lateral connections are often important
Update rules
Firing of neurons is instantaneous upon
receiving inputs
Cf. k-means, which is a batch method
Learning Rules
Even though we can use the same networks,
we have to be careful about the learning
rules
Rules that require backpropagation of error
(knowing the correct I/O association) are
not applicable
E.g., use Hebbian rules instead:
reward correlated pre- and post-synaptic firing
Learning Paradigm
Global learning
Nice guy, democratic approach
Cooperative
Try to maintain some kind of local structure
with a radial basis attention function
Competitive learning
Playground bully approach
Mine only
Not only is it mine: stay as far away as possible
Simplest case
One linear unit with Hebb's learning rule
[Figure: inputs x1, x2, x3, x4 with weights w1, w2, w3, w4 feeding a single output y]
y = w^T x = Σ_i w_i x_i
Hebb's rule: Δw_i = η y x_i   (i.e., Δw = η y x)
weights are adjusted to be "similar" to inputs
more frequent input patterns dominate
pattern "familiarity" is learned
Similarity measure: y = w · x
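As a minimal sketch of this rule (the function name, pattern, and learning rate η = 0.1 are all invented for illustration), repeatedly presenting one pattern drives the weight vector toward that pattern while its norm keeps growing:

```python
import numpy as np

# Hebb's rule for a single linear unit: y = w.x, then dw = eta * y * x
def hebb_step(w, x, eta=0.1):
    y = w @ x                  # unit output: inner-product similarity
    return w + eta * y * x     # reward correlated pre/post activity

w = np.array([0.5, 0.1, -0.2, 0.3])
x = np.array([1.0, 0.5, -0.5, 0.0])   # one frequently repeated pattern
for _ in range(20):
    w = hebb_step(w, x)

# w now points almost exactly along x, but its norm has grown
cos = (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))
```

The growing norm is exactly the instability discussed on the next slide: nothing in plain Hebbian learning bounds the weights.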
Simplest case
At equilibrium (after many patterns have been observed
and the weight vector no longer changes significantly):
⟨Δw_i⟩ = η ⟨y x_i⟩ = η Σ_j ⟨x_i x_j⟩ w_j = 0,   where C_ij = ⟨x_i x_j⟩
i.e., the averaged dynamics are d⟨w⟩/dt ∝ C w, so equilibrium requires C w = 0
C w = 0 implies that w is an eigenvector of the correlation matrix C
with zero eigenvalue
but this cannot be stable!
Simplest case
Train a network with the same patterns over
and over again:
w' = w + η C w = (1 + ηC) w
w'' = w' + η C w' = (1 + ηC)² w
...
w^(n) = (1 + ηC)^n w = Σ_i a_i (1 + η λ_i)^n u_i,   expanding w = Σ_i a_i u_i in the eigenvectors u_i of C
the weights go to infinity, dominated by the
eigenvector u_1 with the largest eigenvalue λ_1
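A tiny numeric illustration of this divergence (the matrix and step size are made up for the example): iterating w ← (I + ηC)w is power iteration, so the direction settles on the largest eigenvector while the norm blows up:

```python
import numpy as np

# correlation matrix with a known eigenbasis: eigenvalues 2 and 0.5
C = np.array([[2.0, 0.0],
              [0.0, 0.5]])
w = np.array([1.0, 1.0])      # equal mix of both eigenvectors
eta = 0.1
for _ in range(100):
    w = w + eta * C @ w       # averaged Hebb step: w' = (I + eta*C) w

direction = w / np.linalg.norm(w)   # unit vector: aligns with [1, 0]
```

Both eigen-components grow, but the one with the largest eigenvalue grows fastest, so the direction converges even though the length diverges.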
Oja's learning rule
Δw_i = η y (x_i − y w_i)
similar learning effect as Hebb's rule
if the weights already conform to the pattern, don't learn
without divergence of the weight vector
the weight vector converges to the maximal eigenvector
can be generalized to locate the other eigenvectors
(principal component analysis)
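A minimal sketch of Oja's rule on synthetic 2-D data with a known dominant direction (the data generation, seed, and learning rate are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.array([1.0, 1.0]) / np.sqrt(2)   # dominant direction (std 3)
v = np.array([1.0, -1.0]) / np.sqrt(2)  # minor direction   (std 1)
X = rng.normal(size=(5000, 1)) * 3 * u + rng.normal(size=(5000, 1)) * v

w = rng.normal(size=2)
eta = 0.01
for x in X:
    y = w @ x
    w = w + eta * y * (x - y * w)       # Oja's rule: Hebb term minus decay
```

Unlike plain Hebbian learning, the weight norm stays near 1 while the direction converges (up to sign) to the maximal eigenvector of the data correlation matrix.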
Unsupervised Competitive Learning
Clustering or categorizing data
Only one output active (winner-take-all)
Lateral inhibition
Each output neuron y_i stands for one class
[Figure: outputs y1, y2, y3 with mutual lateral inhibition, fully connected to inputs x1–x4]
Simple competitive learning
one-layer network
decision rule: the (most) similar unit learns
  ||w_{i*} − x|| ≤ ||w_i − x||   (for all i)
update rule: move the winner closer to the input pattern
  Δw_{i*j} = η (x_j − w_{i*j}),   Δw_{ij} = 0 for i ≠ i*
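A small winner-take-all sketch (function name, learning rate, and toy data are invented for illustration): for each sample, only the closest prototype is updated, so the prototypes drift onto the two cluster centers:

```python
import numpy as np

def competitive_train(X, k=2, eta=0.1, epochs=20, seed=0):
    """Winner-take-all: only the closest prototype moves toward each sample."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X:
            i = np.argmin(np.linalg.norm(W - x, axis=1))  # decision rule
            W[i] += eta * (x - W[i])                      # update winner only
    return W

# two well-separated clusters of three points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
W = competitive_train(X, k=2)
```

Each prototype ends up near the mean of one cluster; compare k-means, which does the same assignment but updates in batch after a full pass.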
Competitive Learning Example
[Figure: input data; initial placement of prototypes; final placement]
More Examples
Vector Quantization
A compression technique to represent input vectors
with a smaller number of "code" (representative, prototype) vectors
Standard decision rule + learning rule
Learning Vector Quantization
standard decision rule + update rule:
  Δw_{i*j} = +η (x_j − w_{i*j})   if x is in the correct class
  Δw_{i*j} = −η (x_j − w_{i*j})   if x is not in the correct class
Playground bully: push others away
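A minimal sketch of one LVQ step (prototypes, labels, and learning rate are made up for the example): the winner is pulled toward a sample of its own class and pushed away from a sample of another class:

```python
import numpy as np

def lvq_step(W, labels, x, cls, eta=0.1):
    """Move the winner toward x if its label matches cls, else push it away."""
    i = np.argmin(np.linalg.norm(W - x, axis=1))   # standard decision rule
    sign = 1.0 if labels[i] == cls else -1.0       # pull or push
    W[i] += sign * eta * (x - W[i])
    return W

W = np.array([[0.0, 0.0], [1.0, 1.0]])             # one prototype per class
labels = [0, 1]
W = lvq_step(W, labels, np.array([0.2, 0.0]), cls=0)  # correct class: pull
W = lvq_step(W, labels, np.array([0.3, 0.2]), cls=1)  # wrong winner: push
```

The first step moves prototype 0 to [0.02, 0.0]; the second finds prototype 0 as the winner again but with the wrong label, so it is repelled.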
Feature Mapping
A topology preserving map
Similar inputs map to outputs which are close by
[Figure: output map driven by inputs x1, x2]
Kohonen Map (Self-Organizing Map)
Preserve neighborhood relations
Decision rule:
  ||w_{i*} − x|| ≤ ||w_i − x||   (for all i)
Update rule:
  Δw_{ij} = η Λ(i, i*) (x_j − w_{ij}),   Λ(i, i*) = exp(−||r_i − r_{i*}||² / 2σ²)
initially, the neighborhood is large
gradually the neighborhood narrows
the learning rate η and neighborhood size σ both drop through the iterations
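A compact 1-D SOM sketch (the schedules for η and σ are assumptions chosen for the example): the winner and its Gaussian neighborhood move toward each sample, with both the learning rate and the neighborhood width shrinking over iterations:

```python
import numpy as np

def som_train(X, n_units=10, epochs=30, seed=0):
    """1-D SOM: winner plus Gaussian neighborhood move toward each sample."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0, 1, size=(n_units, X.shape[1]))
    pos = np.arange(n_units)                 # unit positions r_i on the grid
    for t in range(epochs):
        eta = 0.5 * (1 - t / epochs)                    # learning rate drops
        sigma = max(n_units / 2 * (1 - t / epochs), 0.5)  # neighborhood narrows
        for x in X:
            i_star = np.argmin(np.linalg.norm(W - x, axis=1))  # decision rule
            h = np.exp(-(pos - i_star) ** 2 / (2 * sigma ** 2))
            W += eta * h[:, None] * (x - W)             # update rule
    return W

X = np.linspace(0, 1, 50)[:, None]   # samples spread along a line
W = som_train(X)
```

After training, the units' weights are ordered along the grid, i.e., neighborhood relations in the data are preserved in the map.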
SOM Example
More Examples
[Figure: three data clusters in 3D; their projection from 3D to 2D; the SOM result]
Traveling salesman problem
Mapping from a plane to a 1-D ring
Modified Kohonen algorithm
Standard decision rule + update rule:
  Δw_i = η Λ(i, x) (x − w_i) + κ (w_{i+1} − 2 w_i + w_{i−1})
  Λ(i, x) = exp(−||x − w_i||² / 2σ²) / Σ_j exp(−||x − w_j||² / 2σ²)
1st term: pulls a weight toward a particular city x
2nd term: minimizes the inter-city (tour) distance
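A rough sketch of this ring update (all constants are assumed, σ is kept fixed for simplicity rather than annealed, and the city set is random): each presentation of a city pulls nearby ring nodes toward it while the elastic term keeps the ring short and smooth:

```python
import numpy as np

def ring_update(W, city, eta=0.2, sigma=0.1, kappa=0.05):
    """One update: normalized Gaussian pull toward the city plus the
    elastic term w[i+1] - 2 w[i] + w[i-1] that shortens the tour."""
    d2 = np.sum((W - city) ** 2, axis=1)
    lam = np.exp(-d2 / (2 * sigma ** 2))
    lam = lam / lam.sum()                          # normalized attraction
    elastic = np.roll(W, -1, axis=0) - 2 * W + np.roll(W, 1, axis=0)
    return W + eta * lam[:, None] * (city - W) + kappa * elastic

rng = np.random.default_rng(0)
cities = rng.uniform(size=(8, 2))
theta = np.linspace(0, 2 * np.pi, 24, endpoint=False)
W = 0.5 + 0.1 * np.c_[np.cos(theta), np.sin(theta)]   # small initial ring

def mean_city_gap(W):
    return np.mean([np.min(np.linalg.norm(W - c, axis=1)) for c in cities])

gap_before = mean_city_gap(W)
for _ in range(300):
    for c in cities:
        W = ring_update(W, c)
gap_after = mean_city_gap(W)
```

The ring stretches out of its initial circle and settles with nodes near the cities; reading the cities off in ring order gives a candidate tour.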
Hybrid Learning Schemes
Improved speed
Satisfactory performance
Unsupervised layer: clustering (divide input space in a Voronoi tessellation)
Supervised layer: key-value lookup
[Figure: input x, unsupervised layer producing hidden y, supervised layer producing output z]
Example 1:
input to hidden layer: competitive learning
  ||w_{i*} − x|| ≤ ||w_i − x||   (for all i)
hidden to output layer: general delta rule
  Δw_{ij} = η (z_i − O_i) y_j   (target z_i, actual output O_i, hidden activation y_j)
Example 2 (radial basis function):
input to hidden layer:
  g_j(x) = exp(−(x − u_j)² / 2σ_j²) / Σ_i exp(−(x − u_i)² / 2σ_i²)
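A small sketch of the two-stage radial-basis scheme (the centers are hard-coded here in place of the unsupervised clustering stage, and all names and constants are illustrative): normalized Gaussian hidden units followed by delta-rule training of the output weights:

```python
import numpy as np

def rbf_features(X, centers, sigma=1.0):
    """Normalized Gaussian activations of the hidden layer."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    G = np.exp(-d2 / (2 * sigma ** 2))
    return G / G.sum(axis=1, keepdims=True)

# unsupervised stage would place these by clustering; hard-coded for brevity
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
X = np.array([[0.1, 0.0], [0.0, 0.2], [3.9, 4.0], [4.1, 4.2]])
z = np.array([0.0, 0.0, 1.0, 1.0])            # target outputs

# supervised stage: delta rule on the output weights
w = np.zeros(2)
for _ in range(200):
    G = rbf_features(X, centers)
    y = G @ w
    w += 0.5 * G.T @ (z - y)                  # delta-rule step

pred = rbf_features(X, centers) @ w
```

Because the hidden layer has already tessellated the input space, the supervised layer reduces to a cheap linear lookup, which is why such hybrids train much faster than end-to-end backpropagation.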