
Unsupervised Learning

Javier Bejar cbea

LSI - FIB

Term 2012/2013

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 1 / 65

Outline

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 2 / 65

Introduction

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 3 / 65

Introduction

Unsupervised Learning

Learning can usually be done in a supervised or unsupervised way

There is a strong bias in the machine learning community towards supervised learning

But many concepts are learned in an unsupervised way

The discovery of new concepts is always unsupervised

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 4 / 65

Introduction

Unsupervised Learning

Goals:

Summarization: To obtain a representation that describes an unlabeled dataset
Understanding: To discover concepts inside the data

These tasks are difficult because the discovery process is biased by context

Different answers can be valid depending on the discovery goal or the domain
There are few criteria to validate the results

Knowledge representation: Unstructured (partitions/clusters) or relational (hierarchies)

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 5 / 65

Algorithms for unsupervised learning

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 6 / 65

Algorithms for unsupervised learning

Unsupervised Learning

Learning by the discovery of predefined structures

For example: probability distributions/models using parametric or non-parametric estimation

It is assumed that the data is embedded in an N-dimensional space that has a similarity/dissimilarity function defined

Bias:

Examples are more related to the nearest examples than to the farthest ones
Look for compact groups that are maximally separated from each other

Related areas: statistics, machine learning, graph theory, fuzzy theory, physics

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 7 / 65

Algorithms for unsupervised learning

Algorithms for unsupervised learning

Two main strategies:

Hierarchical algorithms

Examples are usually organized as a binary tree
Usually there is no explicit division into groups

Partitional algorithms

Only a partition of the dataset is obtained

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 8 / 65

Hierarchical algorithms

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 9 / 65

Hierarchical algorithms

Hierarchical algorithms

Based on graph theory

The examples form a fully connected graph
Similarity defines the length of the edges
The clustering is decided using connectivity criteria

Based on matrix algebra

A distance matrix is calculated from the examples
The clustering is computed using the distance matrix
The distance matrix is updated after each step (different updating criteria)

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 10 / 65

Hierarchical algorithms

Hierarchical algorithms

Graphs

Single Linkage, Complete Linkage, MST
Divisive, Agglomerative

Matrices

Johnson algorithm
Different update criteria (single-link, complete-link, centroid, minimum variance)

Computational cost

$O(n_{inst}^3 \times n_{dims})$

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 11 / 65

Hierarchical algorithms

Agglomerative Graph Algorithm

Algorithm: Agglomerative graph algorithm

Compute the distance/similarity matrix
repeat
    Find the pair of examples with the highest similarity
    Add an edge to the graph corresponding to this pair
    if the agglomeration criterion holds then
        Merge the clusters the pair belongs to
    end
until only one cluster remains

Single linkage = The new edge is between two disconnected subgraphs

Complete linkage = The new edge creates a clique with all the nodes of both subgraphs

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 12 / 65

Hierarchical algorithms

Hierarchical algorithms - Graphs

Distance matrix of the example:

        2    3    4    5
  1     6    8    2    7
  2          1    5    3
  3              10    9
  4                    4

[Figure: step-by-step construction of the Single Link and Complete Link clusterings on the graph of examples 1-5]

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 13 / 65

Hierarchical algorithms

Agglomerative Johnson algorithm

Algorithm: Agglomerative Johnson algorithm

Compute the distance/similarity matrix
repeat
    Find the pair of groups/examples with the highest similarity
    Merge the pair of groups/examples
    Delete the rows and columns corresponding to the pair of groups/examples
    Add a new row and column with the new distances to the new group
until the matrix has one element

Single linkage = New distance is the distance between the nearest examples

Complete linkage = New distance is the distance between the farthest examples

Average linkage = New distance is the distance between centroids
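To make these update rules concrete, here is a minimal Python sketch (assuming numpy and a precomputed symmetric distance matrix) of the agglomerative Johnson procedure; the function name, the data layout, and the use of a size-weighted average in place of the centroid rule are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def johnson_agglomerative(D, linkage="single"):
    """D: symmetric (n x n) distance matrix. Returns the merge history."""
    clusters = [[i] for i in range(len(D))]      # start from singleton clusters
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)                  # ignore self-distances
    history = []
    while len(clusters) > 1:
        # find the pair of clusters at the smallest distance
        i, j = np.unravel_index(np.argmin(D), D.shape)
        i, j = min(i, j), max(i, j)
        history.append((clusters[i], clusters[j], D[i, j]))
        # distance of the merged cluster to every other cluster
        if linkage == "single":
            new_row = np.minimum(D[i], D[j])
        elif linkage == "complete":
            new_row = np.maximum(D[i], D[j])
        else:  # size-weighted average as a stand-in for the centroid rule
            ni, nj = len(clusters[i]), len(clusters[j])
            new_row = (ni * D[i] + nj * D[j]) / (ni + nj)
        D[i], D[:, i] = new_row, new_row         # overwrite row/column i
        D = np.delete(np.delete(D, j, axis=0), j, axis=1)
        D[i, i] = np.inf
        clusters[i] = clusters[i] + clusters[j]  # merge the member lists
        del clusters[j]
    return history
```

Feeding it the 5-example distance matrix of the next slide returns the full merge history for any of the three update rules.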

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 14 / 65

Hierarchical algorithms

Hierarchical algorithms - Matrices

Initial distance matrix:

         2     3     4     5
  1      6     8     2     7
  2            1     5     3
  3                 10     9
  4                        4

After merging {2,3}:

         2,3   4     5
  1      7     2     7
  2,3          7.5   6
  4                  4

After merging {1,4}:

         1,4   5
  2,3    7.25  6
  1,4          5.5

After merging {1,4,5}:

         1,4,5
  2,3    6.725

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 15 / 65

Hierarchical algorithms

Hierarchical algorithms - Example

[Figure: an example dataset (x1, x2) and the dendrograms obtained with Single Link, Complete Link, Median, Centroid and Ward linkage]

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 16 / 65

Hierarchical algorithms

Hierarchical algorithms - Shortcomings

A partition of the data is not given; it has to be decided a posteriori

Some undesirable and strange behaviours can appear (chaining, inversions), distorting the results

The dendrogram is not a practical representation for large amounts of data

Its computational cost is high

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 17 / 65

Concept Formation

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 18 / 65

Concept Formation

Other hierarchical algorithms - Concept Formation

Learning has an incremental nature (experience is acquired from continuous observation, not all at once)

Concepts are learned with their relationships (polythetic hierarchies of concepts)

Search in the space of hierarchies

An objective function measures the utility of the learned structure

The updating of the structure is performed by a set of conceptual operators

The result depends on the order of the examples

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 19 / 65

Concept Formation

Concept Formation - COBWEB (Fisher, 1989)

Based on ideas from cognitive psychology

Learning is incremental
Concepts are organized in a hierarchy
Concepts are organized around a prototype and described probabilistically
The hierarchical concept representation is modified via cognitive operators

Builds the hierarchy top-down

Four conceptual operators

Uses a heuristic measure to find the basic level (category utility)

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 20 / 65

Concept Formation

COBWEB - Category utility

Category utility is defined for a set of categories

It biases the search towards categories with high intra-similarity and low inter-similarity

It is maximized by the categories in the basic level (the preferred level for prediction)

These classes maximize the predictability of their attributes

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 21 / 65

Concept Formation

COBWEB - Category utility

Intra-class similarity: $P(A_i = V_{ij} \mid C_k)$

Maximize → most of the examples in the class share this value for this attribute

Inter-class similarity: $P(C_k \mid A_i = V_{ij})$

Maximize → fewer examples from other classes share this value for this attribute

Maximize the trade-off between the two measures for a given set of categories:

$$\sum_{k=1}^{K} \sum_{i=1}^{I} \sum_{j=1}^{J} P(A_i = V_{ij})\, P(A_i = V_{ij} \mid C_k)\, P(C_k \mid A_i = V_{ij})$$

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 22 / 65

Concept Formation

COBWEB - Category utility

Using Bayes theorem:

$$\sum_{k=1}^{K} P(C_k) \sum_{i=1}^{I} \sum_{j=1}^{J} P(A_i = V_{ij} \mid C_k)^2$$

$\sum_{i=1}^{I}\sum_{j=1}^{J} P(A_i = V_{ij} \mid C_k)^2$ represents the expected number of attribute values that can be correctly predicted for a class

We look for a partition that increases this number of attributes compared to the baseline (no partition):

$$\sum_{i=1}^{I} \sum_{j=1}^{J} P(A_i = V_{ij})^2$$

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 23 / 65

Concept Formation

COBWEB - Category utility

Category utility for qualitative attributes, for a set of $K$ categories $\{C_1, \ldots, C_K\}$:

$$CU = \frac{\sum_{k=1}^{K} P(C_k) \sum_{i=1}^{I}\sum_{j=1}^{J} P(A_i = V_{ij} \mid C_k)^2 - \sum_{i=1}^{I}\sum_{j=1}^{J} P(A_i = V_{ij})^2}{K}$$

Category utility for quantitative attributes (Gaussian distributions):

$$CU = \frac{\sum_{k=1}^{K} P(C_k) \sum_{i=1}^{I} \frac{1}{\sigma_{ik}} - \sum_{i=1}^{I} \frac{1}{\sigma_{ip}}}{K}$$
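As a small worked example of the qualitative formula, the sketch below computes the category utility of a partition; the data layout (each cluster as a list of attribute-value tuples) and the function name are assumptions made here for illustration, not part of COBWEB's original description.

```python
from collections import Counter

def category_utility(partition):
    """partition: list of clusters; each cluster is a list of examples,
    each example a tuple of qualitative attribute values."""
    examples = [x for cluster in partition for x in cluster]
    n, n_attrs, K = len(examples), len(examples[0]), len(partition)

    def sum_sq_probs(rows):
        # sum over attributes i and values j of P(A_i = V_ij)^2 within `rows`
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(x[i] for x in rows)
            total += sum((c / len(rows)) ** 2 for c in counts.values())
        return total

    baseline = sum_sq_probs(examples)                      # no-partition term
    gain = sum((len(c) / n) * sum_sq_probs(c) for c in partition)
    return (gain - baseline) / K

# Example: two clusters of (shape, color) observations
print(category_utility([
    [("square", "black"), ("square", "black")],
    [("circle", "white"), ("triangle", "white")],
]))
```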

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 24 / 65

Concept Formation

Probabilistic hierarchy

[Figure: a probabilistic concept hierarchy. Each node stores its class probability P(C) and the conditional probabilities P(V|C) of the attribute values, for the attributes Shape (Triangle, Square, Circle) and Color (Black, White); the root has P(C0) = 1.0.]

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 25 / 65

Concept Formation

Algorithm

Incremental insertion of each example into the hierarchy

Look for the path from the root that puts the example in a leaf

Decide at each level how to modify the hierarchy (which operator to apply) to maximize CU, and descend the tree recursively

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 26 / 65

Concept Formation

Operators

Incorporate: Put the example inside an existing class

New class: Create a new class at this level

Merge: Two concepts are merged and the example is incorporated into the new class

Divide: A concept is substituted by its children

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 27 / 65

Concept Formation

Split - Merge

[Figure: the Merge and Split operators applied around the node that receives the example Oi]

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 28 / 65

Concept Formation

COBWEB

Procedure: Depth-first limited search COBWEB (x: Example, H: Hierarchy)

Update the father with the new example
if we are in a leaf then
    Create a new level with this example
else
    Compute the CU of incorporating the example into each class
    Save the two best CUs
    Compute the CU of merging the two best classes
    Compute the CU of splitting the best class
    Compute the CU of creating a new class with the example
    Recursive call with the best choice
end

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 29 / 65

Partitional algorithms

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 30 / 65

Partitional algorithms

Partitional algorithms

Finding the optimal partition of N objects into K groups is NP-hard

Model/prototype based algorithms (K-means, Gaussian Mixture Models, Fuzzy C-means, Leader algorithm, ...)

Density based algorithms

Grid based algorithms

Graph theory based algorithms (spectral clustering)

Unsupervised neural networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 31 / 65

Partitional algorithms Model/Prototype Based Clustering

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 32 / 65

Partitional algorithms Model/Prototype Based Clustering

K-means

We assume that the shape of the clusters is hyperspherical

An iterative algorithm assigns each example to one of K groups (K is a parameter)

Hill-climbing search

Optimization criterion (squared error: minimize the distance of each example to the centroid of its class)

$$\text{Distortion} = \sum_{k=1}^{K} \sum_{i \in C_k} \| x_i - \mu_k \|^2$$

The algorithm converges to a local minimum

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 33 / 65

Partitional algorithms Model/Prototype Based Clustering

K-means

Algorithm: K-means (X: Examples, k: integer)

Generate k prototypes from the first k examples
Assign the remaining n-k examples to their nearest prototype
SumD = sum of squared distances between examples and prototypes
repeat
    Recalculate the prototypes
    Reassign the examples to their nearest prototype
    SumI = SumD
    SumD = sum of squared distances between examples and prototypes
until SumI - SumD < ε
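A minimal numpy sketch of this loop is shown below; the random choice of the initial prototypes (instead of the first k examples) and the convergence threshold are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, eps=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initialize the prototypes with k randomly chosen examples
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    prev = np.inf
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # assign each example to its nearest prototype
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        distortion = (dists[np.arange(len(X)), labels] ** 2).sum()
        if prev - distortion < eps:          # SumI - SumD < epsilon
            break
        prev = distortion
        # recompute each prototype as the mean of its assigned examples
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Example: two well separated blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, labels = kmeans(X, k=2)
```

Because the search is hill climbing, different initializations can converge to different local minima of the distortion.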

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 34 / 65

Partitional algorithms Model/Prototype Based Clustering

K-means

[Figure: successive K-means iterations on a two-dimensional dataset; the labels 1 and 2 show how the examples are reassigned between the two clusters as the prototypes are updated]

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 35 / 65

Partitional algorithms Model/Prototype Based Clustering

K-means - practical problems

The algorithm is sensitive to the initialization (running the algorithm from several random initializations can be a good idea)

Finding the value of k is not an easy problem (experimentation with different values is needed)

You can obtain a solution even if the classes are not hyperspherical (some classes could be split)

There is no guarantee about the quality of the solution

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 36 / 65

Partitional algorithms Model/Prototype Based Clustering

Mixture Decomposition - EM algorithm

We assume that the data are drawn from a mixture of probability distribution functions (usually Gaussian); we look for the parameters of the distributions that best explain the data

The model of the data is:

$$P(x \mid \theta) = \sum_{i=1}^{K} w_i\, P(x \mid \theta_i, w_i)$$

where $K$ is the number of clusters and $\sum_{i=1}^{K} w_i = 1$

The membership of an example is a probability distribution

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 37 / 65

Partitional algorithms Model/Prototype Based Clustering

Mixture Decomposition - EM algorithm

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 38 / 65

Partitional algorithms Model/Prototype Based Clustering

Mixture Decomposition - EM algorithm

The goal is to estimate the parameters of the distribution that describes each class (e.g. means and standard deviations)

The algorithm maximizes the likelihood of the distributions with respect to the dataset

It performs two steps iteratively:

Expectation: We calculate a function that assigns a degree of membership of every instance to each of the K probability distributions
Maximization: We re-estimate the parameters of the distributions to maximize the memberships

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 39 / 65

Partitional algorithms Model/Prototype Based Clustering

EM Algorithm (K Gaussian)

For the Gaussian case:

$$P(x \mid \vec{\mu}, \Sigma) = \sum_{i=1}^{K} P(w_i)\, P(x \mid \vec{\mu}_i, \Sigma_i, w_i)$$

where $\vec{\mu}$ are the mean vectors and $\Sigma$ the covariance matrices

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 40 / 65

Partitional algorithms Model/Prototype Based Clustering

EM Algorithm (K Gaussian)

The computations depend on the assumptions that we make about the attributes (independent or not, same σ, ...)

If the attributes are independent: $\mu_i$ and $\sigma_i$ have to be computed for each class (O(k) parameters) (model: hyperspheres or ellipsoids parallel to the coordinate axes)

If the attributes are not independent: $\mu_i$, $\sigma_i$ and $\sigma_{ij}$ have to be computed for each class (O(k²) parameters) (model: hyperellipsoids not parallel to the coordinate axes)

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 41 / 65

Partitional algorithms Model/Prototype Based Clustering

EM Algorithm (K Gaussian)

For the case of A independent attributes:

$$P(x \mid \vec{\mu}_i, \Sigma_i, w_i) = \prod_{j=1}^{A} P(x \mid \mu_{ij}, \sigma_{ij}, w_i)$$

The model to fit is:

$$P(x \mid \vec{\mu}, \vec{\sigma}) = \sum_{i=1}^{K} P(w_i) \prod_{j=1}^{A} P(x \mid \mu_{ij}, \sigma_{ij}, w_i)$$

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 42 / 65

Partitional algorithms Model/Prototype Based Clustering

EM Algorithm (K Gaussian)

The update of the parameters in the maximization step is:

$$\mu_i = \frac{\sum_{k=1}^{N} P(w_i \mid x_k, \vec{\mu}, \vec{\sigma})\, x_k}{\sum_{k=1}^{N} P(w_i \mid x_k, \vec{\mu}, \vec{\sigma})}$$

$$\sigma_i = \frac{\sum_{k=1}^{N} P(w_i \mid x_k, \vec{\mu}, \vec{\sigma})\, (x_k - \mu_i)^2}{\sum_{k=1}^{N} P(w_i \mid x_k, \vec{\mu}, \vec{\sigma})}$$

$$P(w_i) = \frac{1}{N} \sum_{k=1}^{N} P(w_i \mid x_k, \vec{\mu}, \vec{\sigma})$$

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 43 / 65

Partitional algorithms Model/Prototype Based Clustering

EM Algorithm (K Gaussian)

A set of K initial distributions $N(\mu_i, \sigma_i)$ is generated; $\mu_i$ and $\sigma_i$ are vectors corresponding to the mean and the variance of each attribute

We repeat until convergence:
1 Expectation: Compute the membership of each instance to each probability distribution. Usually we use the log likelihood function of the distribution
  Each instance will have a weight depending on the probability assigned by the previous step, $w_{x_j,i} = \log(P(x_j \mid N(\mu_i, \sigma_i)))$ (MLE)
2 Maximization: Recompute the parameters using the weights from the previous step and obtain the new $\mu_i$ and $\sigma_i$ for each distribution
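The numpy sketch below implements these two steps for Gaussians with independent attributes (diagonal covariances); the initialization, the log-space normalization of the responsibilities, and the small variance floor are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def em_gaussian_mixture(X, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    N, A = X.shape
    mu = X[rng.choice(N, K, replace=False)].astype(float)   # (K, A) means
    sigma = np.ones((K, A)) * np.maximum(X.std(axis=0), 1e-3)
    pw = np.full(K, 1.0 / K)                                 # weights P(w_i)
    for _ in range(n_iter):
        # Expectation: log P(x_k | N(mu_i, sigma_i)) for every example/class pair
        log_p = -0.5 * (((X[:, None, :] - mu[None]) / sigma[None]) ** 2
                        + np.log(2 * np.pi * sigma[None] ** 2)).sum(axis=2)
        log_p += np.log(pw)[None, :]
        # responsibilities P(w_i | x_k), normalized per example in log-space
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # Maximization: weighted re-estimation of the parameters
        Nk = resp.sum(axis=0)
        mu = (resp.T @ X) / Nk[:, None]
        var = (resp.T @ X ** 2) / Nk[:, None] - mu ** 2
        sigma = np.sqrt(np.maximum(var, 1e-8))               # variance floor
        pw = Nk / N
    return pw, mu, sigma, resp
```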

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 44 / 65

Partitional algorithms Model/Prototype Based Clustering

EM algorithm - Comments

K-means is a particular case of this algorithm

The main advantage is that we obtain the membership as a probability (soft assignments)

Using different probability distributions we can find different kinds of structures

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 45 / 65

Partitional algorithms Model/Prototype Based Clustering

Incremental algorithms: Neighbourhood relationship

What the algorithms up to this point have in common is that they are not incremental

Incrementality allows updating a model with new data without starting from scratch

These algorithms use the neighbourhood relationship defined by a similarity/distance function

This neighbourhood determines which instances belong to the same group

Examples: Nearest Neighbour, Mutual Neighbour

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 46 / 65

Partitional algorithms Model/Prototype Based Clustering

Nearest Neighbour/Leader Algorithm

Algorithm: Leader Algorithm (X: Examples, D: double)

Generate a prototype with the first example
while there are examples do
    e = current example
    d = distance from e to the nearest prototype
    if d ≤ D then
        Introduce the example in the class
        Recompute the prototype
    else
        Create a new prototype with this example
    end
end
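A direct Python transcription of this algorithm might look like the sketch below; the running-mean prototype update is one plausible reading of "recompute the prototype".

```python
import numpy as np

def leader_clustering(X, D):
    prototypes = [X[0].astype(float)]   # the first example starts the first class
    counts, labels = [1], [0]
    for x in X[1:]:
        dists = [np.linalg.norm(x - p) for p in prototypes]
        j = int(np.argmin(dists))
        if dists[j] <= D:
            # introduce the example in the class and recompute its prototype
            counts[j] += 1
            prototypes[j] += (x - prototypes[j]) / counts[j]
            labels.append(j)
        else:
            # create a new prototype with this example
            prototypes.append(x.astype(float))
            counts.append(1)
            labels.append(len(prototypes) - 1)
    return np.array(prototypes), np.array(labels)
```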

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 47 / 65

Partitional algorithms Model/Prototype Based Clustering

Nearest Neighbour

[Figure: incremental formation of clusters 1, 2 and 3 by the leader algorithm as new examples arrive]

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 48 / 65

Partitional algorithms Model/Prototype Based Clustering

Fuzzy Clustering

Fuzzy clustering relaxes the hard partition constraint of K-means

Each instance has a degree of membership to each partition

A new optimization function is introduced:

$$L = \sum_{k=1}^{K} \sum_{i=1}^{N} \delta(C_k, x_i)^b\, \| x_i - \mu_k \|^2$$

where $\sum_{k=1}^{K} \delta(C_k, x_i) = 1$ and $b$ is a blending factor

This is an advantage over other algorithms when clusters overlap

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 49 / 65

Partitional algorithms Model/Prototype Based Clustering

Fuzzy Clustering - Fuzzy C-means

Fuzzy C-means is the best-known fuzzy clustering algorithm; it is the fuzzy version of K-means

The algorithm performs the optimization of the objective function in a similar way

The cluster centers are updated as:

$$\mu_j = \frac{\sum_{i=1}^{N} \delta(C_j, x_i)^b\, x_i}{\sum_{i=1}^{N} \delta(C_j, x_i)^b}$$

And the memberships are updated as:

$$\delta(C_j, x_i) = \frac{(1/d_{ij})^{1/(b-1)}}{\sum_{k=1}^{K} (1/d_{ik})^{1/(b-1)}}, \qquad d_{ij} = \| x_i - \mu_j \|^2$$
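A compact numpy sketch of one Fuzzy C-means iteration (center update followed by membership update) is given below; the epsilon guard against zero distances and the default b = 2 are illustrative assumptions.

```python
import numpy as np

def fuzzy_cmeans_step(X, delta, b=2.0, eps=1e-12):
    """X: (N, p) data; delta: (N, K) memberships whose rows sum to 1."""
    w = delta ** b                                        # weighted memberships
    mu = (w.T @ X) / w.sum(axis=0)[:, None]               # (K, p) new centers
    d = ((X[:, None, :] - mu[None]) ** 2).sum(axis=2)     # d_ij = ||x_i - mu_j||^2
    inv = (1.0 / np.maximum(d, eps)) ** (1.0 / (b - 1.0))
    delta = inv / inv.sum(axis=1, keepdims=True)          # new memberships
    return mu, delta
```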

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 50 / 65

Partitional algorithms Model/Prototype Based Clustering

Fuzzy Clustering

Other membership and distance functions can be used

Different functions have specific purposes, such as detecting specific shapes in the data (lines, rectangles, ...)

This algorithm is broadly used in image recognition

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 51 / 65

Partitional algorithms Density/Grid Based Clustering

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 52 / 65

Partitional algorithms Density/Grid Based Clustering

Density estimation

The number of groups is not decided beforehand

We are looking for regions with a high density of examples

We are not limited to a predefined set of shapes (non-parametric model)

Different approaches:

Space partitioning (multidimensional grid)
Multidimensional histograms (we look for high-density regions with fewer dimensions)

Usually it is more suited to datasets with low dimensionality (e.g. geographical data)
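As one concrete reading of the space-partitioning approach, the sketch below bins the examples into a regular grid, keeps the cells with enough points, and joins adjacent dense cells into clusters; the grid resolution and the density threshold are parameters of this sketch, not values given in the slides.

```python
import numpy as np
from scipy.ndimage import label

def grid_density_clustering(X, n_bins=20, min_points=5):
    # bin each dimension into a regular grid over the data range
    edges = [np.linspace(X[:, d].min(), X[:, d].max(), n_bins + 1)
             for d in range(X.shape[1])]
    counts, _ = np.histogramdd(X, bins=edges)
    dense = counts >= min_points                  # high-density cells
    # connected components of adjacent dense cells form the clusters
    cell_labels, n_clusters = label(dense)
    # give each example the label of its cell (0 marks sparse/noise cells)
    idx = tuple(np.clip(np.digitize(X[:, d], edges[d]) - 1, 0, n_bins - 1)
                for d in range(X.shape[1]))
    return cell_labels[idx], n_clusters
```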

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 53 / 65

Partitional algorithms Density/Grid Based Clustering

Density estimation - Space partitioning

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 54 / 65

Partitional algorithms Density/Grid Based Clustering

Density estimation - Multidimensional grids

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 55 / 65

Partitional algorithms Graph Based Clustering

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 56 / 65

Partitional algorithms Graph Based Clustering

Based on graph theory

We create different kinds of graphs from the dataset (MST, Voronoi, Delaunay, ...)

We apply consistency criteria to the edges of the graph in order to delete edges

The result is a set of unconnected components

Two advantages: we do not need to know the number of classes, and we do not look for a specific model (any shape is possible)
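One concrete instance of this strategy is sketched below: build the minimum spanning tree of the complete distance graph with scipy and delete its longest edges as "inconsistent"; the number of edges to remove is a parameter of the sketch, since the slides do not fix a particular consistency criterion.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_clustering(X, n_remove):
    D = squareform(pdist(X))                       # complete distance graph
    mst = minimum_spanning_tree(D).toarray()       # n-1 edges, upper triangular
    # delete the n_remove longest ("inconsistent") edges of the tree
    edges = np.argwhere(mst > 0)
    weights = mst[edges[:, 0], edges[:, 1]]
    for i, j in edges[np.argsort(weights)[::-1][:n_remove]]:
        mst[i, j] = 0.0
    # the remaining connected components are the clusters
    n_clusters, labels = connected_components(mst, directed=False)
    return n_clusters, labels
```

Removing r edges from the spanning tree of a connected graph leaves r + 1 connected components, so the number of clusters follows directly from the consistency criterion.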

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 57 / 65

Partitional algorithms Graph Based Clustering

Based on graph theory

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 58 / 65

Partitional algorithms Graph Based Clustering

Spectral Clustering

Spectral graph theory defines properties that hold for the eigenvalues and eigenvectors of the adjacency matrix or Laplacian matrix of a graph

Spectral clustering uses spectral properties of the distance matrix

The distance matrix represents a graph that connects the examples:

Complete graph
Neighbourhood graph (different definitions)

From the diagonalization of this matrix some clustering algorithms can be defined

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 59 / 65

Partitional algorithms Graph Based Clustering

Spectral Clustering

We start with the similarity matrix ($W$) of a dataset (complete or not)

This matrix represents the similarity graph of the instances

The degree of a node is defined as:

$$d_i = \sum_{j=1}^{n} w_{ij}$$

We define the degree matrix $D$ as the matrix with the values $d_1, d_2, \ldots, d_n$ on its diagonal

We can define different Laplace matrices:

Unnormalized: $L = D - W$
Normalized: $L_{sym} = D^{-1/2} L D^{-1/2}$ or also $L_{rw} = D^{-1} L$

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 60 / 65

Partitional algorithms Graph Based Clustering

Spectral Clustering

We can cluster a dataset following these steps:
1 Compute the Laplace matrix from the similarity matrix
2 Compute the first K eigenvectors of the Laplace matrix
3 Use the eigenvectors as new data points
4 Apply K-means as the clustering algorithm

We are embedding the dataset in a space with fewer dimensions, using the neighbourhood relations among the data
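A self-contained numpy sketch of this recipe (unnormalized Laplacian, Gaussian similarity on the complete graph, and a tiny inline K-means on the eigenvector rows) is shown below; the kernel width sigma and the initialization are illustrative assumptions.

```python
import numpy as np

def spectral_clustering(X, K, sigma=1.0, seed=0):
    # similarity matrix W of the complete graph (Gaussian kernel)
    sq_dists = ((X[:, None, :] - X[None]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))                     # degree matrix
    L = D - W                                      # unnormalized Laplacian
    # the eigenvectors of the K smallest eigenvalues are the new data points
    _, eigvecs = np.linalg.eigh(L)
    emb = eigvecs[:, :K]                           # (n, K) spectral embedding
    # cluster the embedded points with a tiny inline K-means
    rng = np.random.default_rng(seed)
    centers = emb[rng.choice(len(emb), K, replace=False)]
    for _ in range(100):
        labels = np.linalg.norm(emb[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([emb[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(K)])
    return labels
```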

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 61 / 65

Partitional algorithms Unsupervised Neural Networks

1 Introduction

2 Algorithms for unsupervised learning

3 Hierarchical algorithms

4 Concept Formation

5 Partitional algorithms
  Model/Prototype Based Clustering
  Density/Grid Based Clustering
  Graph Based Clustering
  Unsupervised Neural Networks

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 62 / 65

Partitional algorithms Unsupervised Neural Networks

Unsupervised Neural Networks

Self-organizing maps are an unsupervised neural network method

Can be seen as an on-line constrained version of K-means

The data is transformed to fit in a 1-d or 2-d mesh

The nodes of this mesh are the prototypes

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 63 / 65

Partitional algorithms Unsupervised Neural Networks

Self-Organizing Maps

To build the map we have to decide the size and shape of the mesh (rectangular/hexagonal)

Each node of the mesh is a multidimensional prototype of p features

Algorithm: Self-Organizing Maps algorithm

Initial prototypes are distributed regularly on the mesh
for a predefined number of iterations do
    foreach example xi do
        Find the nearest prototype (mj)
        Determine the neighborhood M of mj
        foreach prototype mk ∈ M do
            mk = mk + α(xi − mk)
        end
    end
end
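A minimal numpy sketch of this loop on a rectangular mesh is given below; the random initialization of the prototypes and the linear decay of both the learning rate and the neighbourhood radius are illustrative choices (the slides initialize the prototypes regularly and only state that both quantities decrease over the iterations).

```python
import numpy as np

def som(X, rows, cols, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # mesh coordinates of every node and one p-dimensional prototype per node
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    protos = rng.uniform(X.min(axis=0), X.max(axis=0), size=(rows * cols, p))
    for t in range(n_iter):
        alpha = 1.0 - t / n_iter                          # learning rate 1 -> 0
        radius = (max(rows, cols) / 2.0) * (1.0 - t / n_iter)
        for x in X:
            j = np.linalg.norm(protos - x, axis=1).argmin()   # nearest prototype
            # neighbourhood: nodes within `radius` cells of the winner on the mesh
            neigh = np.linalg.norm(grid - grid[j], axis=1) <= radius
            protos[neigh] += alpha * (x - protos[neigh])
    return protos.reshape(rows, cols, p)
```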

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 64 / 65

Partitional algorithms Unsupervised Neural Networks

Self-Organizing Maps

During the iterations the mesh is transformed to be closer to the data, while maintaining the two-dimensional relationship between prototypes

The performance of the algorithm depends on the learning rate α, which is usually decreased from 1 to 0 during the iterations

The neighborhood of a prototype is defined by the adjacency of the cells and the distance between the prototypes

The number of neighbors used in the update is decreased during the iterations from a predefined number to 1 (only the prototype nearest to the observation)

Different variations of the algorithm weight the update depending on the distance between the prototypes

Javier Bejar cbea (LSI - FIB) Unsupervised Learning Term 2012/2013 65 / 65

