Fuzzy Clustering
Presented by: Omid Sayadi*
Supervisor: Dr. Bagheri
* PhD student, Biomedical Image and Signal Processing Lab (BiSIPL), Department of Electrical Engineering, Sharif University of Technology
Spring 2008 Sharif University of Technology 2
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
Introduction
• Cluster
A number of similar individuals that occur together, i.e. two or more consecutive features that span a specific subspace of a concept.
or: a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.

[Figure: car data in the Weight (kg) vs. Top Speed (km/h) plane, forming three clusters: Lorries, Sport Cars, and Medium Market Cars]
Introduction (cont.)
• Clustering
The process of grouping a set of physical or abstract objects into classes of similar objects.
“Clustering is the art of finding groups in data.” (Kaufman & Rousseeuw)
Cluster analysis is an important human activity:
• Distinguishing objects in early childhood,
• Learning a new object or understanding a new phenomenon (feature extraction and comparison).
Introduction (cont.)
• Motivation
• Discovering hidden patterns and structures,
• Organizing large sets of data into a small number of meaningful groups (clusters),
• Dealing with a manageable number of homogeneous groups, instead of a vast number of single data objects,
• Data reduction and information compaction.
Introduction (cont.)
• Clustering vs. Classification
• Clustering: unsupervised learning
  • No class labels defined.
• Classification: supervised learning
  • Predefined (a priori known) class labels,
  • Training set (labeled) and test set.
Clustering is unsupervised classification, where no classes are predefined (labeled).
Introduction (cont.)
• Similarity measures
• Clustering: maximize intra-cluster similarity, minimize inter-cluster similarity.
Introduction (cont.)
• Similarity measure functions
• Minkowski: d(x, y) = (Σ_k |x_k - y_k|^p)^(1/p)
• Tchebyschev: d(x, y) = max_k |x_k - y_k|
• Hamming: d(x, y) = Σ_k |x_k - y_k|
• Euclidean: d(x, y) = (Σ_k (x_k - y_k)²)^(1/2)
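In Python these four measures can be sketched as follows (a minimal illustration; the function names are mine, and Hamming is taken in its city-block sense for real vectors, i.e. Minkowski with p = 1):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p between vectors x and y."""
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p))

def chebyshev(x, y):
    """Tchebyschev (L-infinity) distance: the limit of Minkowski as p -> infinity."""
    return float(np.max(np.abs(np.asarray(x) - np.asarray(y))))

def hamming(x, y):
    """Hamming (city-block) distance: Minkowski with p = 1."""
    return minkowski(x, y, 1)

def euclidean(x, y):
    """Euclidean distance: Minkowski with p = 2."""
    return minkowski(x, y, 2)
```

Note that all four are members of the same Minkowski family, differing only in the order p.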
Introduction (cont.)
• Clustering Approaches
• Hierarchical algorithms
  • Find successive clusters using previously established clusters.
• Partitioning algorithms
  • Construct various partitions and then evaluate them.
  • Determine all clusters at once.
• Model-based algorithms
• Grid-based algorithms
• Density-based algorithms
Introduction (cont.)
• Hierarchical Clustering
• Create a hierarchical decomposition of the data set using some criterion and a termination condition.
• Divisive (top-down)
• Agglomerative (bottom-up)
Introduction (cont.)
• Divisive vs. Agglomerative
Introduction (cont.)
• Partitional Clustering
• Given a database of N objects, partition the objects into a pre-specified number of K clusters (Liu, 1968).
• The clusters are formed to optimize a similarity function (max intra-similarity and min inter-similarity).
• Popular partitioning algorithms:
  • K-means
  • EM (Expectation Maximization)
The number of ways to cluster N objects into K groups:

M(N, K) = (1/K!) Σ_{i=0}^{K} (-1)^(K-i) (K choose i) i^N
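This count grows extremely fast with N, which a direct implementation of the formula makes easy to check (the function name is illustrative):

```python
from math import comb, factorial

def num_partitions(N, K):
    """Number of ways to partition N objects into K nonempty clusters
    (a Stirling number of the second kind), via the inclusion-exclusion sum
    M(N, K) = (1/K!) * sum_{i=0}^{K} (-1)^(K-i) * C(K, i) * i**N."""
    total = sum((-1) ** (K - i) * comb(K, i) * i ** N for i in range(K + 1))
    return total // factorial(K)
```

For example, there are already 7 ways to split just 4 objects into 2 clusters, which is why exhaustive search over all partitions is hopeless.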
Introduction (cont.)
• Challenges
• Hierarchical algorithms
  • The tree of clusters (dendrogram) requires a termination criterion → dendrogram cutting,
  • Choice of agglomerative or divisive strategy,
  • Splits and merges are irreversible.
• Partitioning algorithms
  • Pre-selection of the number of clusters (K).
Introduction (cont.)
• K-means algorithm
• Given the number of clusters (K), partition objects (randomly) into K nonempty subsets,
• While new assignments occur, do: • Compute seed points as the centroids (virtual
mean point) of the clusters of the current partition.
• Assign each object to the cluster with the nearest seed point.
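The two-step loop above can be sketched in Python with NumPy (a minimal illustration; the function name, signature, and convergence test are my own choices, not from the slides):

```python
import numpy as np

def kmeans(X, K, max_iter=100, rng=None):
    """Plain K-means on an (N, p) data array X with K clusters.
    Returns (centroids, labels)."""
    rng = np.random.default_rng(rng)
    # Random initialization: pick K distinct data points as seeds.
    centroids = X[rng.choice(len(X), K, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for it in range(max_iter):
        # Assignment step: nearest seed point for each object.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # no new assignments -> converged
        labels = new_labels
        # Update step: each centroid is the (virtual) mean of its cluster.
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids, labels
```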
Introduction (cont.)
• K-means example

[Figure: three snapshots of K-means iterations on 2-D data in the unit square]

Problem: data points at equal distance to the centroids!
Introduction (cont.)
• Taxonomy of Clustering Approaches

[Figure: taxonomy of clustering approaches]
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
Problem Statement
• HCM (K-means) Formulation
• Set of data in the feature space: X = {x_1, x_2, ..., x_n}
• C_i: the i-th cluster
∪_{i=1}^{c} C_i = U                 All clusters C_i together fill the whole universe U
C_i ∩ C_j = Ø for all i ≠ j         Clusters do not overlap
Ø ⊂ C_i ⊂ U for all i               A cluster C_i is never empty and is smaller than the whole universe U
2 ≤ c ≤ K                           There must be at least 2 clusters in a c-partition and at most as many as the number of data points K
Problem Statement (cont.)
• K-means Failures
• The objective function in classical clustering:
• Each datum must be assigned to exactly one cluster.
• The problem of data points that are equally distant.
J = Σ_{i=1}^{c} J_i = Σ_{i=1}^{c} ( Σ_{k: x_k ∈ C_i} ||x_k - c_i||² )

Minimize the total sum of all distances.
Problem Statement (cont.)
• Equidistant data points
• Butterfly data points (Ruspini’s butterfly, 1969)
[Figure: an equidistant data point between two clusters, and the butterfly data set in the unit square]
Problem Statement (cont.)
• Towards Fuzzy Clustering
• We need to support uncertainty → each datum can belong to multiple clusters with varying degrees of membership.
• The space is partitioned into overlapping groups.
[Figure: crisp clusters vs. fuzzy clusters]
Problem Statement (cont.)
• Fuzzy C-Partition Formulation
• Set of data in the feature space: X = {x_1, x_2, ..., x_n}
• C_i: the i-th cluster
∪_{i=1}^{c} C_i = U                     All clusters C_i together fill the whole universe U
C_i ∩ C_j = Ø or ≠ Ø for i ≠ j          Clusters may overlap
Ø ⊂ C_i ⊂ U for all i                   A cluster C_i is never empty and is smaller than the whole universe U
2 ≤ c ≤ K                               There must be at least 2 clusters in a c-partition and at most as many as the number of data points K
Problem Statement (cont.)
• Fuzzy Clustering Types

Hard Clustering → (omitting the non-overlapping condition) → Fuzzy Clustering
• Probabilistic Fuzzy Clustering (Bezdek, 1981)
• Possibilistic Fuzzy Clustering (Krishnapuram & Keller, 1993)
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering
• A constrained optimization:
• The membership degree of a datum x_j to the i-th cluster: u_ij = μ_{C_i}(x_j) ∈ [0, 1]
• The fuzzy label vector of each data point x_j: u_j = (u_1j, ..., u_cj)^T
• The partition matrix: U = [u_ij]_{c×n} = (u_1, u_2, ..., u_n)
• The cost function:

  J_f(X, U, C) = Σ_{i=1}^{c} Σ_{j=1}^{n} u_ij^m d_ij²

• The constraints:

  Σ_{j=1}^{n} u_ij > 0 for all i ∈ {1, ..., c}    (no empty cluster)
  Σ_{i=1}^{c} u_ij = 1 for all j ∈ {1, ..., n}    (normalization constraint)
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• m determines the “fuzziness” of the clustering:

  J_f(X, U, C) = Σ_{i=1}^{c} Σ_{j=1}^{n} u_ij^m d_ij²

where d_ij is the distance between datum x_j and cluster i, and m ≥ 1 is the fuzzifier exponent (usually m = 2):
• m → 1: more crisp clustering,
• m → ∞: more fuzzy clustering.

[Figure: resulting memberships for m = 1.1 and m = 2]
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• The cost function J_f cannot be minimized directly, hence an alternating optimization (AO) scheme must be used.
• The iterative algorithm:
• First, the membership degrees are optimized for fixed cluster parameters: U_t = j_U(C_{t-1}), t > 0
• Then, the cluster parameters are optimized for fixed membership degrees: C_t = j_C(U_t)
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• Minimization result: the update formula for the membership degree is

  u_ij^(t+1) = d_ij^(-2/(m-1)) / Σ_{l=1}^{c} d_lj^(-2/(m-1))

  (the gravitation to cluster i relative to the total gravitation)

• It depends not only on the distance of the datum x_j to cluster i, but also on the distances between this data point and the other clusters.
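This update can be written compactly for a whole distance matrix at once (a sketch; the function name is illustrative, and zero distances are assumed away):

```python
import numpy as np

def update_memberships(d, m=2.0):
    """Probabilistic membership update: given the (c, n) matrix of distances
    d[i, j] between cluster i and datum x_j, return u[i, j] proportional to
    d_ij**(-2/(m-1)), normalized over clusters ("gravitation to cluster i
    relative to total gravitation"). Assumes all distances are nonzero."""
    g = d ** (-2.0 / (m - 1.0))          # inverse-distance "gravitation"
    return g / g.sum(axis=0, keepdims=True)
```

Each column of the result sums to 1, which is exactly the normalization constraint above.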
Problem Statement (cont.)
• Probabilistic Fuzzy Clustering (cont.)
• What about the cluster prototypes (C)?
• They are algorithm dependent, i.e. they depend on:
  • The describing parameters of the cluster (location, shape, size),
  • The distance measure d.
• Problem: lack of typicality
  • The normalization constraint causes the clusters to tend toward outliers,
  • No difference between x1 and x2 (membership 0.5 for both).
Problem Statement (cont.)
• Possibilistic Fuzzy Clustering
• Idea: drop the normalization condition of probabilistic fuzzy clustering.
• Remaining constraint (no empty cluster): Σ_{j=1}^{n} u_ij > 0 for all i ∈ {1, ..., c}
• The cost function to be minimized:

  J_f(X, U, C) = Σ_{i=1}^{c} Σ_{j=1}^{n} u_ij^m d_ij² + Σ_{i=1}^{c} η_i Σ_{j=1}^{n} (1 - u_ij)^m

The second term is a penalty which forces the membership degrees away from zero.
Problem Statement (cont.)
• Possibilistic Fuzzy Clustering (cont.)
• η_i > 0: used to balance the contrary objectives expressed by the two terms of

  J_f(X, U, C) = Σ_{i=1}^{c} Σ_{j=1}^{n} u_ij^m d_ij² + Σ_{i=1}^{c} η_i Σ_{j=1}^{n} (1 - u_ij)^m

• Minimization result:

  u_ij = 1 / ( 1 + (d_ij² / η_i)^(1/(m-1)) )

• Result: the membership degree of a datum x_j to cluster i depends only on its distance to that cluster.
Problem Statement (cont.)
• More about η_i
• Let m = 2 in the update equation above. If η_i equals d_ij², then u_ij = 0.5. Hence η_i determines the distance to cluster i at which the membership degree equals 0.5.
• The permitted extension of the cluster can be controlled by this parameter.
• η_i can be estimated as the fuzzy intra-class distance from a probabilistic fuzzy clustering model:

  η_i = Σ_{j=1}^{n} u_ij^m d_ij² / Σ_{j=1}^{n} u_ij^m
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
FC algorithms
• Major algorithms
• Fuzzy c-means (FCM)
• Possibilistic c-means (PCM)
• Gustafson-Kessel (GK)
• Assumptions:
• Input: data matrix (X_{p×n}) and number of clusters (c).
• Output: cluster centers (C) and fuzzy partition matrix (U).
• Initialize cluster centers randomly for all algorithms.
FC algorithms (cont.)
• FCM
• A probabilistic fuzzy clustering approach,
• Finds c spherical clusters → the cluster prototype is the cluster center (C),
• The found clusters are approximately the same size,
• Distance measure: Euclidean distance,
• According to the objective function J_f, the cluster prototype is updated as:

  C_i^(t+1) = Σ_{j=1}^{n} (u_ij^(t+1))^m x_j / Σ_{j=1}^{n} (u_ij^(t+1))^m
FC algorithms (cont.)
• FCM algorithm
• Repeat until ||U^(t+1) - U^(t)|| < ε or ||C^(t+1) - C^(t)|| < ε:
• Compute distances.
• Compute membership values (partition matrix):

  u_ij^(t+1) = d_ij^(-2/(m-1)) / Σ_{l=1}^{c} d_lj^(-2/(m-1)),   i = 1, ..., c,  j = 1, ..., N

• Compute cluster centers:

  C_i^(t+1) = Σ_{j=1}^{n} (u_ij^(t+1))^m x_j / Σ_{j=1}^{n} (u_ij^(t+1))^m,   i = 1, ..., c
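The whole loop can be sketched as follows (a minimal NumPy illustration; here the partition matrix U is initialized randomly instead of the centers, a common equivalent choice, and the stopping test uses the change in U only; names are mine):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, rng=None):
    """Fuzzy c-means sketch: alternate the center update and the membership
    update until U changes by less than eps. X is (n, p); returns (centers, U)
    with U of shape (c, n) whose columns sum to 1."""
    rng = np.random.default_rng(rng)
    n = len(X)
    # Random initial partition matrix with columns summing to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Center update: fuzzily weighted mean of the data.
        C = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Distance of every datum to every center, shape (c, n).
        d = np.linalg.norm(X[None, :, :] - C[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)          # avoid division by zero
        # Probabilistic membership update.
        g = d ** (-2.0 / (m - 1.0))
        U_new = g / g.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return C, U
```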
FC algorithms (cont.)
• FCM (cont.)
• The probabilistic FCM is widely used as an initializer for other clustering methods.
• It is a fast, reliable and stable method.
• In practice, FCM is not likely to get stuck in local minima.
• But it has problems:
  • Lack of typicality,
  • Sensitivity to outliers.
→ Solution: PCM
FC algorithms (cont.)
• PCM algorithm
• Repeat until ||U^(t+1) - U^(t)|| < ε or ||C^(t+1) - C^(t)|| < ε:
• Compute distances.
• Compute membership values (partition matrix); this step differs from FCM:

  u_ij = 1 / ( 1 + (d_ij² / η_i)^(1/(m-1)) ),   i = 1, ..., c,  j = 1, ..., N

• Compute cluster centers; the same as in FCM:

  C_i^(t+1) = Σ_{j=1}^{n} (u_ij^(t+1))^m x_j / Σ_{j=1}^{n} (u_ij^(t+1))^m,   i = 1, ..., c
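The two PCM-specific pieces, the possibilistic membership update and the η_i estimate, can be sketched as follows (illustrative names; η_i is assumed to come from a prior probabilistic run, as described earlier):

```python
import numpy as np

def pcm_memberships(d, eta, m=2.0):
    """Possibilistic membership update: u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))).
    d is the (c, n) distance matrix and eta the length-c vector of reference
    distances; each membership depends only on the datum's distance to that
    one cluster (no normalization across clusters)."""
    d2 = d ** 2
    return 1.0 / (1.0 + (d2 / np.asarray(eta)[:, None]) ** (1.0 / (m - 1.0)))

def estimate_eta(U, d, m=2.0):
    """Estimate eta_i as the fuzzy intra-class distance of a probabilistic
    partition U: eta_i = sum_j u_ij^m d_ij^2 / sum_j u_ij^m."""
    Um = U ** m
    return (Um * d ** 2).sum(axis=1) / Um.sum(axis=1)
```

For m = 2, a datum at distance d_ij² = η_i from cluster i gets membership exactly 0.5, matching the interpretation of η_i given above.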
FC algorithms (cont.)
• FCM vs. PCM
• PCM solves the problems of FCM, but we face a new problem: cluster coincidence and cluster repulsion.

[Figure: FCM vs. PCM clustering results]
FC algorithms (cont.)
• GK
• Problem of FCM and PCM: only spherical clusters.
• In GK, each cluster is characterized by its center and covariance matrix: C_i = {c_i, Σ_i}, i = 1, ..., c.
• GK finds ellipsoidal clusters of approximately the same size.
• Clusters adapt themselves to the shape and location of the data, because of the covariance matrix.
• Cluster size can be controlled by det(Σ_i); usually det(Σ_i) = 1.
FC algorithms (cont.)
• GK (cont.)
• The Mahalanobis distance is used in GK:

  d²(x_j, C_i) = det(Σ_i)^(1/p) (x_j - c_i)^T Σ_i^(-1) (x_j - c_i)

• Each cluster has its own size and shape,
• The algorithm is locally adaptive,
• We need an update equation for the covariance matrix, to minimize the objective function (either probabilistic or possibilistic):

  Σ_i^(t+1) = Σ_{j=1}^{n} u_ij^(t+1) (x_j - c_i^(t+1))(x_j - c_i^(t+1))^T / Σ_{j=1}^{n} u_ij^(t+1)
FC algorithms (cont.)
• GK algorithm
• Repeat until ||U^(t+1) - U^(t)|| < ε or ||C^(t+1) - C^(t)|| < ε:
• Compute distances.
• Compute membership values (partition matrix), probabilistic or possibilistic:

  u_ij^(t+1) = d_ij^(-2/(m-1)) / Σ_{l=1}^{c} d_lj^(-2/(m-1))   or   u_ij = 1 / ( 1 + (d_ij² / η_i)^(1/(m-1)) )

• Compute cluster centers and cluster covariance matrices:

  C_i^(t+1) = Σ_{j=1}^{n} (u_ij^(t+1))^m x_j / Σ_{j=1}^{n} (u_ij^(t+1))^m

  Σ_i^(t+1) = Σ_{j=1}^{n} u_ij^(t+1) (x_j - c_i^(t+1))(x_j - c_i^(t+1))^T / Σ_{j=1}^{n} u_ij^(t+1)
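The two GK-specific computations can be sketched per cluster as follows (illustrative names; a full GK loop would wrap these in the alternating scheme above):

```python
import numpy as np

def gk_distance(X, center, cov):
    """Mahalanobis-type distance used by Gustafson-Kessel:
    d^2(x_j, C_i) = det(Sigma_i)^(1/p) (x_j - c_i)^T Sigma_i^{-1} (x_j - c_i),
    where p is the data dimension. X is (n, p); returns a length-n vector."""
    p = X.shape[1]
    diff = X - center
    inv = np.linalg.inv(cov)
    quad = np.einsum('nj,jk,nk->n', diff, inv, diff)  # row-wise quadratic form
    return np.linalg.det(cov) ** (1.0 / p) * quad

def gk_covariance(X, u, center, m=2.0):
    """Covariance update for one cluster: weighted scatter of the data
    around the center, with weights u_j^m."""
    w = u ** m
    diff = X - center
    outer = np.einsum('nj,nk->njk', diff, diff)       # per-datum outer products
    return (w[:, None, None] * outer).sum(axis=0) / w.sum()
```

With the identity covariance, `gk_distance` reduces to the squared Euclidean distance, recovering FCM's geometry as a special case.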
FC algorithms (cont.)
• FCM vs. GK

[Figure: FCM vs. GK clustering results]
FC algorithms (cont.)
• Other non-point-prototype clustering models
• Shell clustering algorithms are used for segmentation and the detection of special geometrical contours.
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
FC applications
• Typical Applications
• Fuzzy Inference Systems (FIS),
• Image processing,
• Pattern recognition,
• Machine learning,
• Data mining,
• Social network analysis,
• ...
FC applications (cont.)
• FIS
• The fuzzy inference mechanism is summarized as: [Figure: fuzzy inference mechanism]
• Q1: Where do the membership functions come from?
• Q2: How are the if-then rules extracted from data?
FC applications (cont.)
• FIS from Fuzzy Clustering
• Clustering data in:
  • The input-output feature space,
  • The input and output spaces separately,
  • The output space (inducing clusters in the inputs).
• Obtain membership functions by:
  • Projection onto the variables,
  • Parametrization of the membership function.
• Extract one rule per cluster.
• Usually, FCM + a Mamdani FIS is used.
FC applications (cont.)
• FIS from Fuzzy Clustering (cont.)

[Figure: examples of FIS construction from fuzzy clustering]
FC applications (cont.)
• The same idea for Ruspini’s butterfly

[Figure: resulting membership functions for m = 1.25 and m = 2]
FC applications (cont.)
• Biomedical applications
• Tumor detection and extraction (cancer, mammography, ...),
• Image segmentation (MRI images, cephalic radiography, ...).
FC applications (cont.)
• Tumor detection

[Figure: tumor detection results with crisp, adaptive, and fuzzy methods]
Fuzzy Clustering
• Introduction
• Problem Statement
• Fuzzy Clustering Algorithms
• Fuzzy Clustering Applications
• Discussion and Conclusions
Conclusion
• In summary
• The ability to cluster data (concepts, perceptions, etc.) is an essential feature of human intelligence.
• The main idea of FC is to partition data into overlapping groups based on the similarity amongst patterns.
• The result of clustering is a set of clusters, cluster centers, and a matrix containing the membership degrees.
• FCM results in spherical clusters, but gives confusing memberships for equally distant data objects.
• PCM does not have the normalization constraint, but suffers from cluster coincidence and cluster repulsion.
Conclusion (cont.)
• Summary (cont.)
• GK uses the covariance matrix, hence it yields ellipsoidal clusters.
• The algorithms incorporate a fuzziness exponent (fuzzifier) which determines how fuzzy the resulting partition is.
Conclusion (cont.)
• Summary (cont.)
• FCM is widely used as an initializer for other clustering methods.
• FC methods are widely used in fuzzy membership function generation, to model fuzzy rule bases and inference systems.
• Compared to classical (crisp) clustering, FC methods show more efficiency in many applications.
Discussion
• Related issues
• Number of clusters (c):
  • Yang Shanlin & Malay proved that c ≤ √n.
  • Elbow criterion: define a validity measure and evaluate it for different numbers of clusters to find an optimum point (the elbow), where adding another cluster no longer adds sufficient information.
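One common validity measure for such an elbow search is Bezdek's partition coefficient (a sketch; using it for model selection is one option among several):

```python
import numpy as np

def partition_coefficient(U):
    """Bezdek's partition coefficient: PC = (1/n) * sum_ij u_ij^2 for a
    (c, n) partition matrix U. It equals 1 for a crisp partition and 1/c
    for a maximally fuzzy one; evaluating it over a range of c and looking
    for the knee is one way to apply the elbow criterion."""
    return float((U ** 2).sum() / U.shape[1])
```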
Discussion (cont.)
• Shape of membership function
• Semantically, fuzzy sets are required to be convex, monotonous, and with limited support.
• Does FCM satisfy the above conditions?
• Does PCM lead to convex membership functions?
• We should choose another cluster estimation method to obtain proper clusters, with flexibility in choosing the support of the membership functions.
Discussion (cont.)
• Shape of membership function (cont.)
• A typical approach: triangular fuzzy membership functions.

[Figure: triangular approximation of FCM membership functions]
References
• Journal papers:
  • C. Döring, M.-J. Lesot, and R. Kruse, “Data Analysis with Fuzzy Clustering Methods”, Computational Statistics & Data Analysis, 2006.
  • A. Baraldi and P. Blonda, “A Survey of Fuzzy Clustering Algorithms for Pattern Recognition—Part I and II”, IEEE Trans. Systems, Man and Cybernetics, vol. 29, no. 6, 1999.
  • A. K. Jain, M. N. Murty, and P. J. Flynn, “Data Clustering: A Review”, ACM Computing Surveys, vol. 31, no. 3, 1999.
• Book:
  • J. C. Bezdek, “Pattern Recognition with Fuzzy Objective Function Algorithms”, Plenum Press, New York, 1981.
• Thesis:
  • A. I. Shihab, “Fuzzy Clustering Algorithms and Their Application to Medical Image Analysis”, PhD Thesis, University of London, 2000.