Outline
FCM, PCM
Hard C-Means
Fuzzy C-Means Clustering
(FCM)
Possibilistic C-Means
Clustering (PCM)
Comparison of FCM, PCM
Example and Results
About Clustering
What is data clustering?
Unsupervised Learning
Similarity Measures
Quality Measures
SOM (Self Organizing Map)
Topology
Learning algorithm
Example
Application
What is data clustering?
Page 3 of 60
Data clustering is the process of identifying natural groupings or clusters within unlabelled data based on some similarity measure [Jain et al. 2000].
Unsupervised learning
How could we know what constitutes “different” clusters?
• Green apple and banana example:
• two features: shape and color.
Similarity Measures
Euclidean distance
Manhattan distance (city block)
Cosine similarity (vector dot product)
Mahalanobis distance
Types of Clustering Methods
Partitioning clustering
o SOM clustering
o K-means clustering
o K-medoids clustering
o Fuzzy c-means clustering
o Evolutionary-based clustering
Hierarchical clustering
o Agglomerative clustering
o Divisive clustering
Density-based clustering
Grid-based methods
Cluster Quality Measures
Compactness measure
Separation measure
Combined measure
Notation: Nh — number of samples in cluster h; xj(h) — the j-th sample in cluster h; ch — center of cluster h; ch′ — center of cluster h′.
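The formulas on this slide did not survive extraction. Commonly used definitions matching the notation above (an assumption, not a reproduction of the slide) are:

```latex
S_h = \frac{1}{N_h}\sum_{j=1}^{N_h}\bigl\|x_j^{(h)} - c_h\bigr\|^{2},
\qquad
\mathrm{Sep}(h,h') = \bigl\|c_h - c_{h'}\bigr\|^{2}
```

A combined measure typically relates compactness to separation, e.g. as a ratio, so that compact and well-separated clusters score best.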
Topology
The Kohonen Self-Organizing Network (KSON) belongs to the class of unsupervised learning networks.
Nodes distribute themselves across the input space; in this way, the nodes of the KSON can recognize groups of similar input vectors.
Topology
This process is known as competitive learning, also called the winner-take-all strategy.
SOM is a technique that reduces the dimensionality of data through the use of self-organizing neural networks.
Example
A Kohonen self-organizing map is used to cluster four vectors given by:
(1,1,1,0)
(0,0,0,1)
(1,1,0,0)
(0,0,1,1)
The maximum number of clusters to be formed is m = 3.
Example
Suppose the learning rate (geometrically decreasing) is given by:
α(0)=0.3
α(t+1)=0.2α(t)
With only three clusters available and the weights of only one cluster updated at each step (i.e., neighborhood radius Nc = 0), find the weight matrix. Use a single epoch of training.
Step 1
Step 1:
The initial weight matrix is:
      0.2 0.4 0.1
W =   0.3 0.2 0.2
      0.5 0.3 0.5
      0.1 0.1 0.1
(rows correspond to the four input components; columns to the three cluster units)
Initial radius: Nc = 0
Initial learning rate: α(0)=0.3
Step 2,3 (Pattern 1)
Step 2: For the first input vector (1,1,1,0), do steps 3 – 5
Step 3:
The input vector is closest to output node 1. Thus node 1 is the winner. The weights for node 1 should be updated.
Step 2,3 (Pattern 2)
Step 2: For the second input vector (0,0,0,1), do steps 3 – 5
Step 3:
The input vector is closest to output node 2. Thus node 2 is the winner. The weights for node 2 should be updated.
Step 2,3 (Pattern 3)
Step 2: For the third input vector (1,1,0,0), do steps 3 – 5
Step 3:
The input vector is closest to output node 1. Thus node 1 is
the winner. The weights for node 1 should be updated.
Step 2,3 (Pattern 4)
Step 2: For the fourth input vector (0,0,1,1), do steps 3 – 5
Step 3:
The input vector is closest to output node 3. Thus node 3 is
the winner. The weights for node 3 should be updated.
Step 5
Epoch 1 is complete.
Reduce the learning rate:
α(t+1)=0.2α(t)=0.2(0.3)=0.06
Repeat from the start for new epochs until Δwj becomes steady for all input patterns or the error is within a tolerable range.
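The single training epoch above can be sketched in a few lines. This is a minimal reconstruction of the example (winner-take-all, Nc = 0, α = 0.3); the distance computations that appeared as slide images are reproduced by the code:

```python
import numpy as np

# One epoch of the Kohonen example: 3 cluster units, neighborhood radius
# Nc = 0 (only the winner learns), learning rate alpha = 0.3.
# Columns of W are the cluster weight vectors, as on the slide.
W = np.array([[0.2, 0.4, 0.1],
              [0.3, 0.2, 0.2],
              [0.5, 0.3, 0.5],
              [0.1, 0.1, 0.1]])
patterns = np.array([[1, 1, 1, 0],
                     [0, 0, 0, 1],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1]], dtype=float)
alpha = 0.3
winners = []
for x in patterns:
    d = ((W - x[:, None]) ** 2).sum(axis=0)  # squared distance to each unit
    j = int(np.argmin(d))                    # winning unit
    winners.append(j + 1)                    # 1-based, to match the slides
    W[:, j] += alpha * (x - W[:, j])         # update only the winner (Nc = 0)

print(winners)         # [1, 2, 1, 3]
alpha = 0.2 * alpha    # geometric decrease after the epoch: 0.06
```

The winners reproduce the slides: node 1 for patterns 1 and 3, node 2 for pattern 2, node 3 for pattern 4.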
Applications
Based on the competitive learning rule, KSONs have been used extensively for clustering applications such as:
Character recognition, Speech recognition, Vector coding, Robotics applications, and Texture segmentation.
Character Recognition (Example)
21 input patterns, 7 letters from 3 different fonts
25 cluster units are available, which means that a maximum of 25 clusters may be formed.
Example (cont.)
No topological structure: only the winning unit is allowed to learn the pattern presented. The 21 patterns form 5 clusters:
UNIT PATTERNS
3 C1, C2, C3
13 B1, B3, D1, D3, E1, K1, K3
16 A1, A2, A3
18 J1, J2, J3
24 B2, D2, E2, K2
Example (cont.)
A linear structure (with R = 1)
The winning node J and its topological neighbors (J+1 and J-1) are allowed to learn on each iteration.
UNIT PATTERNS
6 K2
10 J1, J2, J3
14 E1, E3
16 K1, K3
18 B1, B3, D1, D3
20 C1, C2, C3
22 D2
23 B2, E2
25 A1, A2, A3
Example (cont.)
Diamond structure
Each cluster unit is indexed by two subscripts.
If unit Xi,j is the winning unit, the units Xi+1,j, Xi-1,j, Xi,j+1, and Xi,j-1 also learn.
i\j 1 2 3 4 5
1 J1, J2, J3 D2
2 C1, C2, C3 D1, D3 B2, E2
3 B1 K2
4 E1, E3, B3 A3
5 K1, K3 A1, A2
Hard C-Means Clustering
Membership degrees are crisp: uij ∈ {0,1}.
The k-means algorithm clusters n objects, based on their attributes, into k disjoint partitions, where k < n.
The objective function depends on the cluster centers C and the assignment of data points to clusters U.
Constraint 1 ensures that each data point is assigned to exactly one cluster.
Constraint 2 ensures that no cluster is left empty.
Hard C-Means Clustering
The iterative optimization scheme works as follows. First, the number of clusters and the initial cluster centers are chosen. Then each data point is assigned to its closest cluster center.
Next, the data partition U is held fixed and new cluster centers are computed.
The last two steps are iterated until no change in C or U can be observed.
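The scheme above can be sketched compactly. This is a minimal hard c-means (k-means) sketch; the toy data and k = 2 are illustrative, not from the slides:

```python
import numpy as np

# Minimal hard c-means: assign each point to its closest center, then
# recompute centers, until the partition U stops changing.
def hard_c_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # initial centers
    U = None
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        U_new = d.argmin(axis=1)          # constraint 1: one cluster each
        if U is not None and np.array_equal(U, U_new):
            break                         # no change in U: converged
        U = U_new
        for i in range(k):                # recompute centers (skip if empty)
            if np.any(U == i):
                C[i] = X[U == i].mean(axis=0)
    return C, U

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
C, U = hard_c_means(X, k=2)
print(U)  # the two tight pairs end up in different clusters
```

Note the crisp assignment: each point belongs to exactly one cluster, in contrast with the fuzzy scheme that follows.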
Fuzzy C-Means Clustering
In contrast with hard c-means, fuzzy clustering allows gradual memberships of data points to clusters.
Constraint 1: guarantees that no cluster is empty
Constraint 2: ensures that the sum of the membership degrees for each datum equals 1, so that each datum receives the same weight as all other data.
Fuzzy C-Means Clustering
First the membership degrees are optimized for fixed cluster parameters, then the cluster prototypes are optimized for fixed membership degrees, according to the following update formulas:
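The update formulas themselves are images on the slide; the standard FCM equations (see [12]) are:

```latex
u_{ij} = \left[\sum_{k=1}^{c}\left(\frac{d_{ij}}{d_{kj}}\right)^{2/(m-1)}\right]^{-1},
\qquad
c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}
```

where d_ij = ||x_j − c_i|| is the distance of datum x_j to center c_i.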
The parameter m, m > 1, is called the fuzzifier or weighting exponent. It determines the ‘fuzziness’ of the classification.
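The alternating optimization can be sketched as follows; m = 2 is a common choice for the fuzzifier, and the data and c = 2 are illustrative:

```python
import numpy as np

# Minimal FCM sketch: alternately update centers for fixed memberships
# and memberships for fixed centers.
def fcm(X, c, m=2.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                    # memberships sum to 1 per datum
    for _ in range(n_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)          # avoid division by zero
        U = d ** (-2.0 / (m - 1.0))       # standard membership update
        U /= U.sum(axis=0)                # normalization constraint
    return centers, U

X = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]])
centers, U = fcm(X, c=2)
print(np.round(U, 2))  # each column sums to 1
```

The column normalization is exactly the constraint that PCM later drops.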
Possibilistic C-Means Clustering
The normalization of memberships in FCM can lead to undesired effects in the presence of noise and outliers.
The fixed data point weight may result in noise points receiving high memberships in clusters.
By dropping the normalization constraint, possibilistic c-means clustering tries to achieve a more intuitive assignment of membership degrees and to avoid undesirable normalization effects.
Possibilistic C-Means Clustering
Constraint 1 guarantees that no cluster is empty.
The uij ∈ [0,1] are interpreted as the degree of representativity or typicality of the datum xj to cluster Γi.
Jf is modified to the possibilistic objective Jp:
The second term suppresses the trivial solution, since this sum rewards high memberships (close to 1).
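The modified objective is an image on the slide; in the standard possibilistic formulation of [19] it reads:

```latex
J_p = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{m}\, d_{ij}^{2}
      + \sum_{i=1}^{c}\eta_i \sum_{j=1}^{n}\left(1-u_{ij}\right)^{m}
```

The first term is the FCM objective; the second (the one referred to above) penalizes memberships far from 1.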
Possibilistic C-Means Clustering
The formula for updating the membership degrees, derived from Jp by setting its derivative to zero, is:
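The formula itself is an image on the slide; the standard possibilistic update of [19] is:

```latex
u_{ij} = \frac{1}{1+\left(d_{ij}^{2}/\eta_i\right)^{1/(m-1)}}
```

Setting d_ij² = η_i indeed gives u_ij = 0.5 for any m > 1, consistent with the interpretation of η_i below.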
Considering the case m = 2 and substituting ηi for dij² yields uij = 0.5. Therefore ηi is a parameter that determines the distance to cluster i at which the membership degree equals 0.5.
Depending on the cluster’s shape, the ηi have different geometrical interpretations and can be set to the desired value.
Possibilistic C-Means Clustering
However, the information on the actual shape property described by ηi is often unknown. In that case, these parameters are estimated by the following formula:
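The estimation formula is an image on the slide; the estimate proposed in [19] is:

```latex
\eta_i = K\,\frac{\sum_{j=1}^{n} u_{ij}^{m}\, d_{ij}^{2}}{\sum_{j=1}^{n} u_{ij}^{m}},
\qquad K > 0 \ (\text{typically } K = 1)
```

i.e., a fuzzy-weighted mean of the squared distances within cluster i, usually computed from an initial FCM run.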
The update equations for the cluster prototypes in the possibilistic algorithm are identical to their probabilistic counterparts, because the second, additional term in Jp vanishes in the derivative for fixed (constant) memberships uij.
Possibilistic C-Means Clustering
The interpretation of m differs between FCM and PCM. In FCM, increasing values of m represent increased sharing of points among all clusters, whereas in PCM they represent an increased possibility of all points in the data set completely belonging to a given cluster. Thus, the value of m that gives satisfactory performance differs between the two algorithms.
[Figure: PCM membership function for various values of m]
Possibilistic C-Means Clustering - Examples
[Figure: original image and its segmentation into Cluster 1, Cluster 2, and Cluster 3]
Possibilistic C-Means Clustering - Examples
[Figure: original image and its segmentation into Cluster 1, Cluster 2, and Cluster 3]
Example and Results - using FCM
[Figure: original and noisy images and their FCM segmentation into Cluster 1, Cluster 2, and Cluster 3]
Example and Results - using PCM
[Figure: original and noisy images and their PCM segmentation into Cluster 1, Cluster 2, and Cluster 3]
References
[1] M. Zarinbal, “Designing a fuzzy expert system for diagnosing brain tumors,” Amirkabir University of Technology, 2009.
[2] M. Egmont-Petersen, D. de Ridder, and H. Handels, “Image processing with neural networks—a review,” Pattern Recognit., vol. 35, no. 10, pp. 2279–2301, 2002.
[3] I. Bankman, Handbook of medical image processing and analysis. academic press, 2008.
[4] R. Archibald, K. Chen, A. Gelb, and R. Renaut, “Improving tissue segmentation of human brain MRI through preprocessing by the Gegenbauer reconstruction
method,” Neuroimage, vol. 20, no. 1, pp. 489–502, 2003.
[5] B. E. Chapman, J. O. Stapelton, and D. L. Parker, “Intracranial vessel segmentation from time-of-flight MRA using pre-processing of the MIP Z-buffer: accuracy
of the ZBS algorithm,” Med. Image Anal., vol. 8, no. 2, pp. 113–126, 2004.
[6] A. Candolfi, R. De Maesschalck, D. Jouan-Rimbaud, P. A. Hailey, and D. L. Massart, “The influence of data pre-processing in the pattern recognition of
excipients near-infrared spectra,” J. Pharm. Biomed. Anal., vol. 21, no. 1, pp. 115–132, 1999.
[7] N. J. Pizzi, “Fuzzy pre-processing of gold standards as applied to biomedical spectra classification,” Artif. Intell. Med., vol. 16, no. 2, pp. 171–182, 1999.
[8] D. Van De Ville, M. Nachtegael, D. Van der Weken, E. E. Kerre, W. Philips, and I. Lemahieu, “Noise reduction by fuzzy image filtering,” Fuzzy Syst. IEEE
Trans., vol. 11, no. 4, pp. 429–436, 2003.
[9] F. Di Martino, “An image coding/decoding method based on direct and inverse fuzzy transforms,” Int. J. Approx. Reason., pp. 110–131, 2008.
[10] M. Sezgin and B. Sankur, “Survey over image thresholding techniques and quantitative performance evaluation,” J. Electron. Imaging, vol. 13, no. 1, pp. 146–165, 2004.
[11] H.-D. Cheng and H. Xu, “A novel fuzzy logic approach to contrast enhancement,” Pattern Recognit., vol. 33, no. 5, pp. 809–819, 2000.
[12] J. C. Bezdek, J. Keller, R. Krisnapuram, and N. R. Pal, Fuzzy models and algorithms for pattern recognition and image processing, vol. 4. Springer, 2005.
[13] L. Cinque, G. Foresti, and L. Lombardi, “A clustering fuzzy approach for image segmentation,” Pattern Recognit., vol. 37, no. 9, pp. 1797–1807, 2004.
[14] J. V. de Oliveira and W. Pedrycz, Advances in fuzzy clustering and its applications. Wiley Online Library, 2007.
[15] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “On clustering validation techniques,” J. Intell. Inf. Syst., vol. 17, no. 2–3, pp. 107–145, 2001.
[16] N. Belacel, P. Hansen, and N. Mladenović, “Fuzzy J-Means: a new heuristic for fuzzy clustering,” Pattern Recognit., vol. 35, pp. 2193–2200, 2002.
[17] K. P. Detroja, R. D. Gudi, and S. C. Patwardhan, “A possibilistic clustering approach to novel fault detection and isolation,” J. Process Control, vol. 16, no. 10,
pp. 1055–1073, 2006.
[18] A. Flores-Sintas, J. Cadenas, and F. Martin, “A local geometrical properties application to fuzzy clustering,” Fuzzy Sets Syst., vol. 100, no. 1, pp. 245–256,
1998.
[19] R. Krishnapuram and J. M. Keller, “A possibilistic approach to clustering,” Fuzzy Systems, IEEE Transactions on, vol. 1, no. 2. pp. 98–110, 1993.
[20] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM Comput. Surv., vol. 31, no. 3, pp. 264–323, 1999.
[21] M. H. F. Zarandi, M. Zarinbal, and I. B. Türksen, “Type-II Fuzzy Possibilistic C-Mean Clustering,” in IFSA/EUSFLAT Conf., 2009, pp. 30–3