
Kunstmatige Intelligentie / RuG

KI2 - 7

Clustering Algorithms

Johan Everts

What is Clustering?

Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

The Goals of Clustering

Determine the intrinsic grouping in a set of unlabeled data.

What constitutes a good clustering? Note that all clustering algorithms will produce clusters, regardless of whether the data actually contains any.

There is no gold standard; what counts as a good clustering depends on the goal: data reduction, "natural" clusters, "useful" clusters, outlier detection.

Stages in clustering

Taxonomy of Clustering Approaches

Hierarchical Clustering

Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.

Single link

Agglomerative Clustering

In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.

Complete link

Agglomerative Clustering

In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.

Example – Single Link AC

      BA   FI   MI   NA   RM   TO
BA     0  662  877  255  412  996
FI   662    0  295  468  268  400
MI   877  295    0  754  564  138
NA   255  468  754    0  219  869
RM   412  268  564  219    0  669
TO   996  400  138  869  669    0
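The merging steps that follow can be reproduced with a short sketch (pure Python; the labels and distances are taken from the table above). At each step it merges the pair of clusters whose closest members are nearest, so the first merge is MI/TO at distance 138:

```python
labels = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = {  # symmetric distances from the table above
    ("BA", "FI"): 662, ("BA", "MI"): 877, ("BA", "NA"): 255,
    ("BA", "RM"): 412, ("BA", "TO"): 996,
    ("FI", "MI"): 295, ("FI", "NA"): 468, ("FI", "RM"): 268, ("FI", "TO"): 400,
    ("MI", "NA"): 754, ("MI", "RM"): 564, ("MI", "TO"): 138,
    ("NA", "RM"): 219, ("NA", "TO"): 869,
    ("RM", "TO"): 669,
}

def dist(a, b):
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def link_dist(ca, cb):
    # single link: cluster distance = distance of the two closest members
    return min(dist(a, b) for a in ca for b in cb)

def single_link(items):
    """Merge the two closest clusters until one remains; report each merge."""
    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        ca, cb = min(
            ((ci, cj) for i, ci in enumerate(clusters) for cj in clusters[i + 1:]),
            key=lambda p: link_dist(p[0], p[1]),
        )
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca | cb]
        merges.append(("/".join(sorted(ca | cb)), link_dist(ca, cb)))
    return merges

for members, d in single_link(labels):
    print(f"merge -> {members} at distance {d}")  # first merge: MI/TO at 138
```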

Example – Single Link AC

        BA   FI  MI/TO   NA   RM
BA       0  662    877  255  412
FI     662    0    295  468  268
MI/TO  877  295      0  754  564
NA     255  468    754    0  219
RM     412  268    564  219    0

Example – Single Link AC

        BA   FI  MI/TO  NA/RM
BA       0  662    877    255
FI     662    0    295    268
MI/TO  877  295      0    564
NA/RM  255  268    564      0

Example – Single Link AC

          BA/NA/RM   FI  MI/TO
BA/NA/RM         0  268    564
FI             268    0    295
MI/TO          564  295      0

Example – Single Link AC

             BA/FI/NA/RM  MI/TO
BA/FI/NA/RM            0    295
MI/TO                295      0

Example – Single Link AC

Taxonomy of Clustering Approaches

Square error

K-Means

Step 0: Start with a random partition into K clusters

Step 1: Generate a new partition by assigning each pattern to its closest cluster center

Step 2: Compute new cluster centers as the centroids of the clusters.

Step 3: Repeat steps 1 and 2 until the cluster memberships no longer change (the cluster centers then also remain fixed)
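Steps 0 to 3 can be sketched as follows (a minimal 2-D implementation; starting from k randomly chosen points as initial centers is one common variant of step 0):

```python
import random

def k_means(points, k, iters=100, seed=0):
    """Minimal 2-D k-means sketch following steps 0-3 above."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 0: k randomly chosen points as centers
    clusters = []
    for _ in range(iters):
        # step 1: assign each point to its closest cluster center
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2)
            clusters[i].append((x, y))
        # step 2: recompute each center as the centroid of its cluster
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # step 3: stop when the centers (and hence the memberships) no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# two well-separated blobs should end up in two clusters of three points each
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = k_means(pts, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```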

K-Means

K-Means – How many K's?

Locating the ‘knee’

The knee of a curve is defined as the point of maximum curvature.
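As a rough sketch, the knee can be approximated on a discrete curve by the point with the largest second difference (the function name `knee_index` and the SSE values below are illustrative, not from the slides):

```python
def knee_index(values):
    """Approximate the point of maximum curvature of a decreasing curve
    by the largest discrete second difference (illustrative heuristic)."""
    second_diff = [values[i - 1] - 2 * values[i] + values[i + 1]
                   for i in range(1, len(values) - 1)]
    return 1 + max(range(len(second_diff)), key=second_diff.__getitem__)

# hypothetical within-cluster SSE for K = 1..6 (made-up numbers)
sse = [100, 60, 20, 18, 17, 16]
print(knee_index(sse) + 1)  # → 3, i.e. the knee suggests K = 3
```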

Leader - Follower

Online algorithm; specify a threshold distance.

For each instance: find the closest cluster center. Distance above the threshold? Create a new cluster. Otherwise, add the instance to that cluster and update the cluster center.
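A minimal sketch of this online procedure, assuming 1-D instances and a simple fractional update of the winning center (the learning rate `lr` is an assumed parameter, not specified in the slides):

```python
def leader_follower(stream, threshold, lr=0.1):
    """Online leader-follower sketch (1-D for brevity): for each incoming
    instance, find the closest cluster center; above the threshold, start a
    new cluster; otherwise move that center a little toward the instance."""
    centers = []
    for x in stream:
        if not centers:
            centers.append(x)  # first instance starts the first cluster
            continue
        i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        if abs(x - centers[i]) > threshold:
            centers.append(x)                    # distance > threshold: new cluster
        else:
            centers[i] += lr * (x - centers[i])  # distance < threshold: follow
    return centers

# two well-separated groups in the stream yield two cluster centers
print(leader_follower([0.0, 0.2, 10.0, 10.3, 0.1], threshold=2.0))
```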

Leader - Follower

(figures: Distance < Threshold — the instance is added to the closest cluster and the cluster center is updated; Distance > Threshold — a new cluster is created)

Kohonen SOM’s

The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm; it is a compromise between biological modelling and statistical data processing.

Kohonen SOM’s

Each neuron's weight vector is representative of a certain input. Input patterns are shown to all neurons simultaneously. Competitive learning: the neuron with the largest response is chosen.

Kohonen SOM’s

Initialize weights. Repeat until convergence:

Select the next input pattern. Find the Best Matching Unit. Update the weights of the winner and its neighbours. Decrease the learning rate & neighbourhood size.
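The training loop above can be sketched for a 1-D map with scalar weights (a toy setup; the Gaussian neighbourhood function and the linear decay schedules are common choices, assumed here rather than taken from the slides):

```python
import math
import random

def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Minimal 1-D SOM sketch: a chain of units with scalar weights.
    Each step finds the Best Matching Unit, pulls it and its neighbours
    toward the input, and decays the learning rate and neighbourhood size."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]  # initialize weights
    radius0 = n_units / 2
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decreasing learning rate
        radius = max(radius0 * (1 - t / epochs), 0.5)  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):          # inputs in random order
            bmu = min(range(n_units), key=lambda i: abs(x - weights[i]))
            for i in range(n_units):
                # Gaussian neighbourhood in map (grid) distance
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                weights[i] += lr * h * (x - weights[i])
    return weights

# the units should spread out over the input range
print(train_som([0.0, 0.1, 0.5, 0.9, 1.0], n_units=5))
```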

Learning rate & neighbourhood size

Kohonen SOM’s

Distance related learning

Kohonen SOM’s

Some nice illustrations

Kohonen SOM’s

Kohonen SOM Demo (from ai-junkie.com): mapping a 3D colorspace on a 2D Kohonen map

Performance Analysis

K-Means: depends strongly on a priori knowledge (the number of clusters K); very stable.

Leader-Follower: depends strongly on a priori knowledge (the threshold distance); faster, but unstable.

Performance Analysis

Self-Organizing Map: stability and convergence assured; principle of self-ordering. Slow: many iterations are needed for convergence, and it is computationally intensive.

Conclusion

No Free Lunch theorem: any elevated performance over one class of problems is exactly paid for in performance over another class.

Ensemble clustering? Use the SOM and the basic Leader-Follower algorithm to identify clusters, and then use k-means clustering to refine them.

Any Questions?