Unsupervised Learning (EECE 592, UBC, November 2006)
Source: courses.ece.ubc.ca/592/PDFfiles/Unsupervised_Learning.pdf

1

Unsupervised Learning

Learning without a teacher

2


Introduction

Learning without a “teacher”

Clustering

– Input data is grouped or clustered

Supervised learning employs a teacher that provides a target output for a given input pattern. The teacher is used to generate a feedback signal which is applied to correct the classification. In unsupervised learning algorithms there exists no such teacher. Instead, the intrinsic nature of such approaches is to perform some sort of clustering of the input data. Many forms of clustering exist.

3


A Simple Clustering Algorithm

Example: k-Means clustering
– k clusters are formed
– A centroid is calculated for each cluster
– Sample data is added to the cluster with the closest centroid
– The centroid is re-computed as the average of the samples in the cluster

k-Means Clustering. One of the simplest clustering algorithms is k-Means, so called because it generates k clusters. The algorithm categorizes input vectors into k clusters based upon squared distances; k is an arbitrary number.

Initially, k samples are randomly chosen from the input data as initial values of the k centroids. The goal is then to add the remaining samples and form clusters. Each input pattern is added to the “nearest” cluster, i.e. the one with the nearest “centroid”, with distance computed in the Euclidean sense. For each cluster, the “centroid” is calculated as the average or mean of the points in that cluster, so the centroid is re-computed after each addition. Once no more points remain to be added, the existing samples within their clusters are re-checked to see whether they are still in the correct cluster and moved if necessary. When no more changes take place the algorithm ends.
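Below is a minimal sketch of the algorithm just described, assuming NumPy and a 2-D data array; the function name, parameters and the empty-cluster guard are illustrative choices, not part of the lecture.

```python
import numpy as np

def k_means(data, k, max_iters=100, seed=0):
    """Cluster the rows of `data` into k clusters using squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    # k samples randomly chosen from the input data serve as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assign every sample to the cluster with the nearest centroid.
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Re-compute each centroid as the mean of the samples in its cluster.
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:           # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Stop when no centroid moves, i.e. no sample changes cluster.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```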

4


Topology Preserving Maps

Basic Idea
– A grid of cells
– Each cell recognizes a different pattern
– Cells close to each other in the grid recognize similar patterns

Proposed by Teuvo Kohonen
– Coined the term self-organising feature maps (SOFM)

Consistent with structures seen in the brain.

Kohonen put forward the idea of topology preserving maps. The general approach here is for similar input patterns to be mapped to cells which are physically close to each other in a map or grid of such cells. This notion is interesting as it is consistent with structures seen in the mammalian brain, where visual, audio and tactile inputs are mapped into a number of sheets or folded planes of neurons. A good example of this is the striate or visual cortex.

5


Topology Preserving Maps in the Brain

Hubel & Wiesel (circa 1960s)
– Discovered that the visual cortex of mammals is highly organized, i.e. it exhibits preservation of topology.

[Figure: a layer of line-detector neurons from the striate cortex, with the orientation preference of each neuron indicated]

Hubel and Wiesel studied the visual cortex of mammals. Their findings indicated a highly organized structure. The visual cortex was found to have neurons that responded when the visual stimuli consisted of straight lines or edges. Different neurons in this region were found to respond to different edge orientations. However, of note was the finding that neurons that responded to very similar orientations were physically very close together. The visual cortex is also often referred to as the striate (meaning striped) cortex; these striations are a direct result of the organized nature of the neurons in the visual cortex. Some information on this can be found at http://www.eri.harvard.edu/faculty/peli/lab/slehar/webstuff/pcave/hubel.html (link no longer works, but this researcher's website is still active).

6


Kohonen's SOFM Algorithm

Kohonen formulated an algorithm that produces topology preserving maps (SOFM).

Consider a grid of neurons u_{h,i}, where h,i represent the location of neuron u in the grid.

The goal is to find the winning neuron for a given input pattern.

[Figure: the SOFM, a grid of neurons u_{h,i}, all connected to the same set of inputs]

Kohonen's approach is a classic example of unsupervised learning. It is able to represent the data in a meaningful way without the need for any target value. The approach is defined in terms of a physical grid of neurons where the location of each neuron is fixed within the grid. What is not fixed is the input pattern each neuron exhibits a preference for. The algorithm is based on the ability to find a “winning” neuron.

7


What is a winning neuron?

Winning neuron:

– Technically, one that generates the greatest (weighted sum) response to a given stimulus.

– Practically, one whose weights most closely match the input pattern.

In the algorithm for topology preserving maps, the winning neuron is defined as the one whose weights most closely match those of the input pattern. Traditionally, in neural net terms this would mean the neuron with the greatest weighted sum (linear activation function). However, this only works if the input vectors are all normalized to equal length (viz. the weighted sum then equals the scalar product). Practically, however, computation of the winning neuron is based upon computing the Euclidean distance between the pattern and the weights of each neuron. The neuron with the smallest distance (i.e. closest to the presented pattern) wins.
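As a small illustrative sketch (function and variable names are assumptions, not from the slides), the practical winner search is just a Euclidean distance computation over the grid of weight vectors:

```python
import numpy as np

def winning_neuron(weights, pattern):
    """weights: (rows, cols, n_inputs) grid of weight vectors; pattern: (n_inputs,).
    Returns the (h, i) grid index of the neuron whose weights are closest to the pattern."""
    dists = np.linalg.norm(weights - pattern, axis=2)       # Euclidean distance per grid cell
    return np.unravel_index(dists.argmin(), dists.shape)    # location of the smallest distance
```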

8


SOFM Algorithm cont

Winning or closest neuron located.

Winning cell and its neighbours moved closer to the current input pattern.

The idea is that the winning neuron and its neighbours are then moved slightly closer to the input vector. Since the winning neuron has a fixed location within a grid of neurons, a neighbourhood of neurons can be defined for it.

9


Neighbourhood

Nature exhibits a high degree of lateral connectivity.

Typically, the influence of a neuron follows a Mexican hat function.

[Figure: the “Mexican hat” function]

Biological systems exhibit a large degree of lateral connectivity, and the strength of excitation or inhibition generated by competing cells is often found to influence neighbouring cells according to a difference-of-Gaussians or “Mexican hat” distribution. Lateral connectivity would be quite complex to model as well as being highly computationally intensive. Instead, Kohonen achieves the same effect by considering a neighbourhood of cells.
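A brief illustration of a difference-of-Gaussians ("Mexican hat") influence profile; the widths and the inhibitory scaling below are arbitrary values chosen for the example, not figures from the lecture:

```python
import numpy as np

def mexican_hat(d, sigma_excite=1.0, sigma_inhibit=3.0, inhibit_scale=0.5):
    """Influence on a cell at lateral distance d from the winner:
    a narrow excitatory Gaussian minus a wider, weaker inhibitory one."""
    excite = np.exp(-d ** 2 / (2.0 * sigma_excite ** 2))
    inhibit = inhibit_scale * np.exp(-d ** 2 / (2.0 * sigma_inhibit ** 2))
    return excite - inhibit

# Nearby cells are excited, cells a little further away are inhibited,
# and very distant cells are almost unaffected.
print(mexican_hat(np.array([0.0, 1.0, 3.0, 8.0])))
```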

10


Neighbourhood cont.

Kohonen defined a much simpler neighbourhood function.

Only the winning cell and its neighbours are updated.

[Figure: a grid showing the winning cell and the neighbours of the winning cell]

11


Neighbourhood cont.

Activity pattern may look like: [figure omitted]

An update step will move the weights of the winning cell and its neighbours closer to the input pattern.

Only the weights of the winning cell u_{h,i} and those in its neighbourhood are updated. Weights of cells not in the neighbourhood remain intact. Weights of cells in the neighbourhood are moved closer to the input vector.

12


Algorithm cont.

How to stop the algorithm?
– The update increment is gradually diminished each iteration.
– The number of iterations is known and fixed.

Theoretically, the Mexican-hat distribution dictates that the weights of all neighbouring cells are updated by an amount which decreases gradually with the distance from the closest cell. Practically, it is often easier to select a neighbourhood based on the next one or two direct neighbouring cells and to gradually diminish the increment made to the weights as the number of iterations reaches a pre-defined number.

13


Algorithm cont.

Assume T iterations.
– Then let the step size α(t), at the t-th iteration, decrease as t approaches T.
– Then the weight update for the winning neuron and its neighbours is

  w_{h,i}(t+1) = w_{h,i}(t) + α(t) [ E_k − w_{h,i}(t) ]

E_k is the k-th input pattern. Note that ideally, an iteration is defined as an epoch, i.e. one presentation of the entire training set.
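A sketch of one update step under this schedule: α(t) decays linearly to zero as t approaches T, and only the winner and its direct grid neighbours are moved toward E_k. The linear decay and the square radius-1 neighbourhood are assumptions for illustration (that neighbourhood choice happens to match the worked example on the next slide).

```python
import numpy as np

def sofm_update(weights, pattern, t, T, alpha0=1.0, radius=1):
    """One SOFM step. weights: (rows, cols, n_inputs); pattern: (n_inputs,). Returns new weights."""
    weights = weights.astype(float)          # work on a float copy
    alpha = alpha0 * (1.0 - t / T)           # step size alpha(t) shrinks as t approaches T
    # Locate the winning neuron (closest weights in the Euclidean sense).
    dists = np.linalg.norm(weights - pattern, axis=2)
    win_h, win_i = np.unravel_index(dists.argmin(), dists.shape)
    rows, cols, _ = weights.shape
    # Move the winner and its neighbours (clipped at the grid edge) toward the pattern:
    # w <- w + alpha(t) * (E_k - w).  All other weights are left unchanged.
    for h in range(max(0, win_h - radius), min(rows, win_h + radius + 1)):
        for i in range(max(0, win_i - radius), min(cols, win_i + radius + 1)):
            weights[h, i] += alpha * (pattern - weights[h, i])
    return weights
```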

14


Example of SOFM Training

Consider a 3x3 grid of neurons as follows
Each neuron has two inputs
Weights have current values as shown:

        i=1        i=2        i=3
h=1   (-3, +2)   (+5, +4)   (+7, -6)
h=2   (-5, -5)   (+1,  0)   (-1, +3)
h=3   (+4, -2)   (-3, -3)   (+5, -2)

15


Example of SOFM Training

Next training sample is (6,-5)

        i=1        i=2        i=3
h=1   (-3, +2)   (+5, +4)   (+7, -6)
h=2   (-5, -5)   (+1,  0)   (-1, +3)
h=3   (+4, -2)   (-3, -3)   (+5, -2)

Winner: the cell at (h,i) = (1,3), whose weights (+7, -6) are closest to (6, -5). Its neighbours within the grid are (1,2), (2,2) and (2,3).

If we assume that at t = 501, α(t) = 0.5, then the updates are computed as follows:

W1,2 = ( 5, 4) + 0.5[(6,-5) - ( 5, 4)] = (5.5, -0.5)

W1,3 = ( 7,-6) + 0.5[(6,-5) - ( 7, -6)] = (6.5, -5.5)

W2,2 = ( 1, 0) + 0.5[(6,-5) - ( 1, 0)] = (3.5, -2.5)

W2,3 = (-1, 3) + 0.5[(6,-5) - (-1, 3)] = (2.5, -1)
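The slide's numbers can be checked with a short, self-contained snippet (the square radius-1 neighbourhood, clipped at the grid edge, selects exactly the four cells updated above):

```python
import numpy as np

weights = np.array([[[-3.,  2.], [ 5.,  4.], [ 7., -6.]],
                    [[-5., -5.], [ 1.,  0.], [-1.,  3.]],
                    [[ 4., -2.], [-3., -3.], [ 5., -2.]]])   # indexed as [h-1, i-1]
pattern = np.array([6., -5.])
alpha = 0.5                                                  # the slide's alpha(t) at t = 501

# Winner: smallest Euclidean distance to (6, -5) -> (h, i) = (1, 3), weights (7, -6).
h, i = np.unravel_index(np.linalg.norm(weights - pattern, axis=2).argmin(), (3, 3))

# Update the winner and its neighbours within the grid.
for hh in range(max(0, h - 1), min(3, h + 2)):
    for ii in range(max(0, i - 1), min(3, i + 2)):
        weights[hh, ii] += alpha * (pattern - weights[hh, ii])

print(weights[0, 1], weights[0, 2], weights[1, 1], weights[1, 2])
# [ 5.5 -0.5] [ 6.5 -5.5] [ 3.5 -2.5] [ 2.5 -1. ]  -- matching W1,2, W1,3, W2,2, W2,3 above
```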

16


Example of SOFM Training

After the update, the winning neuron and its neighbours have moved closer to (6, -5)

All other weights remain unchanged

        i=1         i=2            i=3
h=1   (-3, +2)   (+5.5, -0.5)   (+6.5, -5.5)
h=2   (-5, -5)   (+3.5, -2.5)   (+2.5, -1)
h=3   (+4, -2)   (-3, -3)       (+5, -2)

17


2D Examples of SOFM

Two-dimensional inputs mapped to a two-dimensional grid

The four figures show stages of learning of a Kohonen map with a 10 x 10 grid of cells with two inputs each. The inputs represent 2-D coordinates.

18


Visualization of 2D SOFM Training

Input examples are random coordinates taken from a square distribution area.

Blue dots are the neurons
The plotted location of each neuron is its current preferred pattern

The net develops so that each cell will match one of the inputs, and such that cells matching coordinates (1,1) are close to those matching (1,2), (2,1) and (2,2).
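A compact sketch of the kind of run shown in these figures: a 10 x 10 grid trained on random points from the unit square, with a decaying step size and a shrinking square neighbourhood. All schedule values here are assumptions chosen so that the grid unfolds, not parameters from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
rows = cols = 10
weights = rng.uniform(0.4, 0.6, size=(rows, cols, 2))    # start bunched near the centre
data = rng.uniform(0.0, 1.0, size=(20000, 2))             # samples from a square distribution
T = len(data)

for t, pattern in enumerate(data):
    alpha = 0.5 * (1.0 - t / T)                            # decaying step size
    radius = int(round(5 * (1.0 - t / T))) or 1            # neighbourhood shrinks down to 1
    d = np.linalg.norm(weights - pattern, axis=2)
    win_h, win_i = np.unravel_index(d.argmin(), d.shape)
    for h in range(max(0, win_h - radius), min(rows, win_h + radius + 1)):
        for i in range(max(0, win_i - radius), min(cols, win_i + radius + 1)):
            weights[h, i] += alpha * (pattern - weights[h, i])

# Plotting each weight vector as a point (and linking grid neighbours) now shows the
# grid spread out over the square, as in the figures.
print(weights[0, 0], weights[9, 9])   # opposite grid corners end up near opposite
                                      # corners of the square (up to rotation/reflection)
```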

19


1D Feature Maps

Space filling curves
– Linear topology
– Two-dimensional inputs mapped to a one-dimensional map of cells.

In this case the inputs are coordinates taken from locations within a triangle. In fact, the feature maps can accommodate any arbitrary shape.

20


2D Maps - Examples cont.

From Hertz, Krogh & Palmer, "Introduction to the Theory of Neural Computation". 1993

In these examples, 2D SOFMs are presented with coordinates from a circle, a triangle and an irregular shape. It can be seen that the SOFM will arrange itself to reflect the topology characteristic of the input distribution. Note that in all these cases, the density of the distributions is uniform. Should the distribution density vary, the resulting SOFM would of course preserve this too, placing more cells in the denser regions.

21


Pattern Recognition using the SOFM

A SOFM is trained with a collection of patterns.

The location of these patterns in the topology is interrogated.

The SOFM can be used to categorise new patterns.

The diagram represents a Kohonen net trained with a simple set of shapes. This problem is an example of mapping high dimensional inputs onto a two dimensional map. The input representation used to describe each pattern in this case has 7 dimensions; these factors measure size, symmetry and closure.

22


Kohonen’s Phonetic Typewriter

Kohonen demonstrated how SOFMs can be used for speech recognition.
– A SOFM is trained to map a 15-dimensional input to a 2D map.
– The trained map is then used to classify phonemes.
– Developed for Finnish, a language which is written as it is said.

Kohonen used this approach to develop what is known as the “Phonetic Typewriter”. A grid of cells was developed from spoken inputs from the Finnish language. After training, the net was examined and each cell was labelled with the phoneme that it had learnt to match. This net was used to perform a form of speech recognition. In a demonstration, Kohonen set up the map so that it would highlight, in real time, each labelled cell that matched the input. This too is a projection of a high-dimensional space (in this case 15 dimensions) onto a two-dimensional map, and was found useful for visualizing similarities and structures in the original input space.

Kohonen, T. (1988). The “Neural” Phonetic Typewriter. Computer, 21(3), pp. 11-22.

23


Constraint Satisfaction

The Travelling Salesman Problem (TSP)
– A modified 1D Kohonen net can be applied.
– The “map” is modified to act as an elastic ring.

By modifying a Kohonen net consisting of a one-dimensional line of cells (a space filling curve) it is possible to devise an alternative method for solving the TSP. Each cell in the line has two inputs which represent the coordinates of a city. The aim is to modify the Kohonen approach so that the line acts as an elastic ring; edges of the ring are gradually pulled toward each city. Durbin and Willshaw claim that their approach should find the shortest tour. The four figures show the gradual development of the tour. In general, there are more cells in the map than there are cities.
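As a hedged sketch of the elastic-ring idea, the snippet below runs a 1-D Kohonen ring whose cells are pulled toward randomly presented cities; it is a simplified SOFM-on-a-ring variant, not Durbin and Willshaw's exact elastic-net algorithm, and every parameter value is an illustrative assumption.

```python
import numpy as np

def ring_tsp(cities, n_nodes_per_city=3, n_iters=20000, seed=0):
    """cities: (n, 2) array of coordinates. Returns a visiting order of the cities."""
    rng = np.random.default_rng(seed)
    n_nodes = n_nodes_per_city * len(cities)        # more cells in the ring than cities
    # Start the ring as a small circle around the centroid of the cities.
    angles = np.linspace(0.0, 2.0 * np.pi, n_nodes, endpoint=False)
    ring = cities.mean(axis=0) + 0.1 * np.column_stack([np.cos(angles), np.sin(angles)])
    for t in range(n_iters):
        frac = 1.0 - t / n_iters
        alpha = 0.8 * frac                          # decaying step size
        radius = max(1, int(n_nodes * 0.2 * frac))  # shrinking neighbourhood along the ring
        city = cities[rng.integers(len(cities))]    # present one city at a time
        winner = np.linalg.norm(ring - city, axis=1).argmin()
        for offset in range(-radius, radius + 1):
            j = (winner + offset) % n_nodes         # ring topology: indices wrap around
            pull = np.exp(-offset ** 2 / (2.0 * (0.5 * radius) ** 2))
            ring[j] += alpha * pull * (city - ring[j])
    # Read a tour off the ring: visit cities in the order of their nearest ring cell.
    return np.argsort([np.linalg.norm(ring - c, axis=1).argmin() for c in cities])

cities = np.random.default_rng(1).uniform(0, 1, size=(15, 2))
print(ring_tsp(cities))                             # indices of the cities in tour order
```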

24


Feature Extraction

Principle of Information Preservation - Linsker.

– Self-organisation and Hebbian learning used to develop feature detectors.

Linsker proposed a model of self-organization in the visual system based upon a version of Hebbian learning in feed-forward layered networks. The network (shown above) consisted of a number of layers labelled A, B, C, etc. Without going into details, the net was trained layer by layer, A then B, etc. Linsker's learning rule tried to maximize the output variance and was based upon his principle of preserving maximum information. Interestingly enough, the inputs to A were random noise.

25


Feature Extraction cont.

Results

Layer C neurons became centre-surround detectors.

And layer G neurons became orientation-selective.

Centre-surround cells and orientation-selective cells have been readily observed in the mammalian visual cortex. It is remarkable then that Linsker's comparatively simple model is capable of developing a similar structure. This does not imply that biological cells actually develop in the same way. However, it does illustrate very well that simple mechanisms based upon Hebbian learning could produce such structures without either visual input or genetic programming.

26


Other Unsupervised Learning Mechanisms

Adaptive Resonance Theory (ART)
– Carpenter and Grossberg

Vector Quantization
– Kohonen

Neural Nets for Principal Component Analysis
– Oja

27


Practical Application of SOFM

WEBSOM (c. 2004)

– Uses the SOFM to organize text documents
– Documents placed onto a 2D map
– Related documents appear close to each other

See http://websom.hut.fi/websom/

WEBSOM is an interesting practical application of the self-organizing feature map. The basic idea is simple: WEBSOM organizes a set of documents so that related documents, based on their content, appear closer together on a 2D map. The motivation is to help in the exploration and search of documents on the web. The project is led by the 'inventor' of the SOFM, Teuvo Kohonen (http://www.cis.hut.fi/teuvo/). A demo of WEBSOM and links to many articles can be found at http://websom.hut.fi/websom/. A closely related application of WEBSOM is data mining; an excellent article on this topic is available at http://websom.hut.fi/websom/doc/publications.html#lagus04infosci

28


WEBSOM cont.

Application to usenet search
– See http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html

Diagram shows the 1st level map of comp.ai.neural-nets

Lighter colours represent areas of greater density

A demo of WEBSOM is available at http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html. This site shows an application of WEBSOM for searching articles in the usenet news group comp.ai.neural-nets. The diagram above shows the 1st level map generated by WEBSOM. The map is clickable and allows the user to narrow in on documents related to any of the topics appearing at the 1st level.

29


WEBSOM cont.

For example, clicking on “phoneme” generates this next map

This is only a demo; WEBSOM has been used to map over 7 million patent abstracts! However, the project license prohibits this map from being opened up to the public.

