Grey relational analysis based approach for data clustering
K.-C. Chang and M.-F. Yeh
Abstract: This paper generalises the concept of grey relational analysis to develop a technique, called grey relational pattern analysis, for analysing the similarity between given patterns. Based on this technique, a clustering algorithm is proposed for finding the cluster centres of a given data set. This approach can be categorised as an unsupervised clustering algorithm because it does not need predetermination of appropriate cluster centres in the initialisation. The problem of determining the optimal number of clusters and the optimal locations of cluster centres is also considered. Finally, the approach is applied to several data clustering problems as examples. In each example, the performance of the proposed algorithm is compared with that of other well-known algorithms such as the fuzzy c-means method and the hard c-means method. Simulation results demonstrate the effectiveness and feasibility of the proposed method.
1 Introduction
Grey system theory, initiated by Deng [1, 2], can perform grey relational analysis for sequences. For a given reference sequence and a given set of comparative sequences, grey relational analysis can be used to determine the relational grade between the reference and each element in the given set. The best comparative sequence can then be found by further analysing the resultant relational grades. In other words, grey relational analysis can be viewed as a measure of similarity for finite sequences. The method of grey relational analysis has been successfully applied to cluster analysis [3-8], robot path planning [9], multiple-criteria decision making [10], and so on.
Cluster analysis is a basic tool for finding the underlying structure of a given data set, which is one of the most fundamental issues in pattern recognition. The primary objective of cluster analysis is to partition a given data set of multidimensional patterns into a number of subgroups (clusters), where the objects inside a cluster show a certain degree of similarity. Overviews of clustering algorithms can be found in the studies [11, 12]. In recent work [13], clustering algorithms are categorised into two conceptually different families, namely input clustering and input-output clustering. Input clustering algorithms depend on an analysis of the input training patterns, completely ignoring information about dependent output variables. Important examples of input clustering algorithms are the hard c-means [14] and the fuzzy c-means [11] methods; some neural networks, such as self-organising feature maps [3, 8, 15-17], are also examples of input clustering. On the other hand, input-output clustering algorithms incorporate output variables, as in alternating cluster estimation [18] and conditional fuzzy clustering [19].
The performance of most clustering algorithms is greatly influenced by the number of cluster centroids, the selection of initial cluster centroids, and the geometrical properties (e.g. shapes, densities and distributions) of the data [20]. The number of cluster centroids cannot always be defined in advance; therefore, a cluster validity criterion has to be defined to determine an optimal number of clusters in a data set [11, 21-23]. One can make initial assumptions and use the mountain clustering method [24] to obtain initial values of cluster centroids. To sum up, cluster exploring is very experiment-oriented in the sense that clustering algorithms that can deal with all situations are not yet available.
While it is easy to consider the idea of a data cluster on a rather informal basis, it is very difficult to give a formal and universal definition of a cluster. To mathematically identify clusters of a data set, it is usually necessary to first define a measure of similarity and then establish a rule for assigning patterns to the domain of a particular cluster centre. Grey relational analysis is a useful tool for measuring the degree of similarity between sequences, as mentioned above, but it fails to analyse the relations between patterns; to analyse the pattern relation, it has to be modified. Based on grey relational analysis, this paper proposes a so-called grey relational pattern analysis for determining the degree of similarity between patterns while maintaining their geometric features in the analysis process. The resultant degree of similarity is called the grey relational pattern grade. Its range is within the normalised interval [0, 1], and it guarantees that the smaller the Euclidean distance between two patterns, the larger the grey relational pattern grade.
Together with grey relational pattern analysis and the concept of input clustering, we develop a so-called grey clustering algorithm and then demonstrate its effectiveness and feasibility on several classification examples. The proposed method can be categorised as an unsupervised algorithm which estimates cluster centres without predetermining them in the initialisation. Like other unsupervised clustering algorithms [3, 4, 22, 25, 26], the grey relational analysis based clustering algorithm does not suffer from the choice of the initial values of the cluster centres or the number of clusters. The performance index given in [12] is used to determine the optimal number of clusters and the optimal locations of cluster centres.
© IEE, 2005
IEE Proceedings online no. 20041209
doi: 10.1049/ip-vis:20041209
The authors are with the Department of Electrical Engineering, Lunghwa University of Science and Technology, Taoyuan, Taiwan 33306
E-mail: [email protected]
Paper first received 13th January and in revised form 23rd September 2004. Originally published online 4th March 2005
IEE Proc.-Vis. Image Signal Process., Vol. 152, No. 2, April 2005 165
2 Preliminaries
This section gives a brief discussion of grey relational analysis, a tool for analysing the relationships between one major (reference) sequence and the other comparative ones in a given set [1, 2]. Let $S_r = \{s_{r1}, s_{r2}, \ldots, s_{rn}\}$ denote a collection of $n$ ($n \ge 1$) reference sequences. Each element in $S_r$ is of the form $s_{ri} = \langle s_{ri}(1), s_{ri}(2), \ldots, s_{ri}(p) \rangle$ for a finite positive integer $p$. Similarly, let $S_c = \{s_{c1}, s_{c2}, \ldots, s_{cm}\}$ denote a collection of $m$ ($m \ge 1$) comparative sequences, each of the form $s_{cj} = \langle s_{cj}(1), s_{cj}(2), \ldots, s_{cj}(p) \rangle$. The objective of grey relational analysis is to find the sequence in $S_c$ most similar to a specified reference sequence in $S_r$. The grey relational coefficient between the specified reference sequence $s_{ri} \in S_r$, $i \in \{1, 2, \ldots, n\}$, and all comparative sequences $s_{cj} \in S_c$, $j = 1, 2, \ldots, m$, at the $k$th datum can be defined as

$$\gamma(s_{ri}(k), s_{cj}(k)) = \frac{\Delta_{\min} + \rho\,\Delta_{\max}}{\Delta_{ij}(k) + \rho\,\Delta_{\max}}, \quad k = 1, 2, \ldots, p \qquad (1)$$

where $\Delta_{ij}(k) = |s_{ri}(k) - s_{cj}(k)|$, $\Delta_{\max} = \max_j \max_k \Delta_{ij}(k)$, $\Delta_{\min} = \min_j \min_k \Delta_{ij}(k)$ for $j = 1, 2, \ldots, m$, and $\rho \in (0, 1]$. The factor $\rho$ in (1) controls the resolution between $\Delta_{\max}$ and $\Delta_{\min}$. Once the grey relational coefficients are all determined, their weighted average, termed the grey relational grade, can be computed by

$$r(s_{ri}, s_{cj}) = \sum_{k=1}^{p} a_i(k)\,\gamma(s_{ri}(k), s_{cj}(k)) \qquad (2)$$

where $a_i(k)$ is the weighting factor of the grey relational coefficient $\gamma(s_{ri}(k), s_{cj}(k))$ and $\sum_{k=1}^{p} a_i(k) = 1$ for the reference sequence $s_{ri}$ and the comparative sequences $s_{cj}$, $j = 1, 2, \ldots, m$. In general, we can choose $a_i(k) = 1/p$ for all $k$. From (1) and (2), we have $0 < \gamma(s_{ri}(k), s_{cj}(k)) \le 1$ and $0 < r(s_{ri}, s_{cj}) \le 1$ for any $i \in \{1, 2, \ldots, n\}$ and $j = 1, 2, \ldots, m$.
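As a concrete illustration, (1) and (2) can be sketched in a few lines of Python; the function name and the equal-weight choice $a_i(k) = 1/p$ are ours, with the resolution factor $\rho$ exposed as a parameter:

```python
import numpy as np

def grey_relational_grade(s_ref, s_comp, rho=0.5):
    """Grey relational grades r(s_ref, s_cj) of (2) between one reference
    sequence (length p) and m comparative sequences (an m x p array),
    using equal weights a_i(k) = 1/p and resolution factor rho in (0, 1]."""
    s_ref = np.asarray(s_ref, dtype=float)    # shape (p,)
    s_comp = np.asarray(s_comp, dtype=float)  # shape (m, p)
    delta = np.abs(s_ref - s_comp)            # Delta_ij(k) for all j, k
    d_min, d_max = delta.min(), delta.max()   # min/max over all j and k
    gamma = (d_min + rho * d_max) / (delta + rho * d_max)  # coefficient (1)
    return gamma.mean(axis=1)                 # grade (2), one value per s_cj
```

For instance, with reference ⟨1, 2, 3⟩, the comparative sequence ⟨1, 2, 3⟩ attains the maximal grade 1, consistent with property (P1)(ii) below.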
Some important properties of grey relational analysis can be summarised below. Before stating the properties, we need to introduce a new notation, $r(s_{cj}, s_{ri})$. By $r(s_{cj}, s_{ri})$, we mean that the grey relational grade is now computed with $s_{cj}$ as the reference and the set $S_c = \{s_{ri}, s_{cl}\}$, $l = 1, 2, \ldots, m$, $l \neq j$, as the comparative sequences; i.e. the comparative sequence $s_{cj}$ is picked out of the original $S_c$ as a new reference, and the other comparative sequences together with the original reference $s_{ri}$ form a new comparative sequence set $S_c = \{s_{ri}, s_{cl}\}$.
2.1 Properties of grey relational analysis
For any reference sequence $s_{ri} \in S_r$, $i \in \{1, 2, \ldots, n\}$, and all comparative sequences $s_{cj} \in S_c$, $j = 1, 2, \ldots, m$, grey relational analysis satisfies the following four properties [1, 2]:
(P1) Norm interval: (i) $0 < r(s_{ri}, s_{cj}) \le 1$; (ii) $r(s_{ri}, s_{cj}) = 1$ if $s_{ri} = s_{cj}$.
(P2) Dual symmetry: if $m = 1$, then $r(s_{ri}, s_{cj}) = r(s_{cj}, s_{ri})$.
(P3) Wholeness: if $m \ge 2$, then $r(s_{ri}, s_{cj}) \mathrel{\overset{\text{often}}{\neq}} r(s_{cj}, s_{ri})$, where the symbol $\overset{\text{often}}{\neq}$ means 'is often unequal to'.
(P4) Approachability: the smaller the absolute value $\Delta_{ij}(k)$, the larger the grey relational coefficient $\gamma(s_{ri}(k), s_{cj}(k))$, and vice versa.
An alternative grey relational coefficient proposed by Wong and Lai [27] is defined as

$$\tilde{\gamma}(s_{ri}(k), s_{cj}(k)) = \left( \frac{\Delta_{\max} - \Delta_{ij}(k)}{\Delta_{\max} - \Delta_{\min}} \right)^{\zeta} \qquad (3)$$

where $\zeta \in (0, \infty)$ denotes the distinguishing coefficient. It is shown that $\tilde{\gamma}(s_{ri}(k), s_{cj}(k)) \in [0, 1]$. The range of this alternative form is a closed interval, and it can be used to solve an uneven distribution problem caused by using (1). The corresponding grey relational grade $\tilde{r}(s_{ri}, s_{cj})$ is also a weighted average of the $\tilde{\gamma}(s_{ri}(k), s_{cj}(k))$'s.
3 Grey relational pattern analysis
Grey relational analysis, introduced in the previous section, can be viewed as measuring the difference between elements with the same index in two sequences and using the average of those differences to represent the similarity of the two sequences. This concept can be used for the analysis of the pattern relation. In order to analyse the pattern relation in a similar manner, grey relational analysis has to be modified such that geometric features are maintained in the analysis process.
Let $X_r = \{x_{r1}, x_{r2}, \ldots, x_{rn}\}$ denote a collection of $n$ ($n \ge 1$) reference patterns and $X_c = \{x_{c1}, x_{c2}, \ldots, x_{cm}\}$ denote a collection of $m$ ($m \ge 2$) comparative patterns. The elements in $X_r$ are of the form $x_{ri} = (x_{ri}(1), x_{ri}(2), \ldots, x_{ri}(p))$ and the elements in $X_c$ are of the form $x_{cj} = (x_{cj}(1), x_{cj}(2), \ldots, x_{cj}(p))$ for a finite positive integer $p$. Note that the least number of comparative patterns, $m$, is two, whereas the least number of comparative sequences in grey relational analysis is one. Again, we can pick a specific pattern $x_{ri}$ from $X_r$ as the reference, and all elements $x_{cj} \in X_c$, $j = 1, 2, \ldots, m$, are the comparative patterns. Denote the Euclidean distance between two patterns $x_{ri}$ and $x_{cj}$ by
$$d_{ij} = \sqrt{\sum_{k=1}^{p} \bigl(x_{ri}(k) - x_{cj}(k)\bigr)^2} \qquad (4)$$

Let $d_{\max} = \max_j (d_{ij})$ and $d_{\min} = \min_j (d_{ij})$, $\forall j = 1, 2, \ldots, m$.
A trivial case is $d_{\max} = d_{\min}$: all comparative patterns $x_{cj}$ then lie on a circle centred at $x_{ri}$. Hence, the patterns in the trivial case can be made a cluster alone, and we do not need to analyse them any further. Hereafter, we assume that $d_{\max} \neq d_{\min}$. Then the grey relational pattern grade can be defined as

$$v(x_{ri}, x_{cj}) = \left( \frac{d_{\max} - d_{ij}}{d_{\max} - d_{\min}} \right)^{\zeta} \qquad (5)$$
where $\zeta \in (0, \infty)$ denotes a distinguishing coefficient. From (4), we see that the Euclidean distance $d_{ij}$ already accounts for every component difference $x_{ri}(k) - x_{cj}(k)$ between the two patterns considered; the computation of a grey relational pattern 'coefficient' therefore becomes unnecessary. This is the main difference between grey relational analysis and grey relational pattern analysis.
For a specified $x_{ri}$, we have $d_{\min} \le d_{ij} \le d_{\max}$. It follows that

$$0 \le \frac{d_{\max} - d_{ij}}{d_{\max} - d_{\min}} \le 1$$

and hence $v(x_{ri}, x_{cj}) \in [0, 1]$, $\forall \zeta \in (0, \infty)$. Moreover, $v(x_{ri}, x_{cj})$ approaches one as $d_{ij}$ nears $d_{\min}$, and approaches zero as $d_{ij}$ nears $d_{\max}$. Hence the grey relational pattern grade $v(x_{ri}, x_{cj})$ can be used to measure the degree of similarity between the reference and comparative patterns, in the sense that the smallest Euclidean distance represents the largest similarity. Indeed, it maps a Euclidean distance to a normalised measurement of similarity: no matter how large the Euclidean distance, the range of the grey relational pattern grade is the closed interval [0, 1]. The distinguishing coefficient in (5) affects only the 'magnitude' of the grey relational pattern grade; it does not change the relative relationships between the comparative patterns. Therefore, the selection of the distinguishing coefficient depends on numerical considerations in programming. For example, if the difference between $d_{\min}$ and $d_{\max}$ is very small for a given data set, a natural choice is $\zeta \gg 1$, which makes the resultant grey relational pattern grades more distinguishable.
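Equations (4) and (5) translate directly into code. The Python sketch below (function name ours) returns the pattern grades of one reference against an array of comparative patterns; for the trivial case $d_{\max} = d_{\min}$ it simply returns all ones, reflecting that those patterns form one cluster alone:

```python
import numpy as np

def grey_relational_pattern_grade(x_ref, x_comp, zeta=1.0):
    """Grey relational pattern grades v(x_ref, x_cj) of (5) for one reference
    pattern against m >= 2 comparative patterns (an m x p array); zeta > 0
    is the distinguishing coefficient."""
    x_ref = np.asarray(x_ref, dtype=float)
    x_comp = np.asarray(x_comp, dtype=float)
    d = np.linalg.norm(x_comp - x_ref, axis=1)  # Euclidean distances, eq. (4)
    d_min, d_max = d.min(), d.max()
    if d_max == d_min:                          # trivial case: one cluster alone
        return np.ones_like(d)
    return ((d_max - d) / (d_max - d_min)) ** zeta  # grade, eq. (5)
```

For example, with reference (0, 0) and comparative patterns (0, 0), (3, 4) and (6, 8), the distances are 0, 5 and 10, giving grades 1, 0.5 and 0 for $\zeta = 1$.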
The properties of grey relational pattern analysis can be summarised as follows. The notation $v(x_{cj}, x_{ri})$ in the following is defined analogously to $r(s_{cj}, s_{ri})$ in the previous section.
3.1 Properties of grey relational pattern analysis
For any reference pattern $x_{ri} \in X_r$, $i \in \{1, 2, \ldots, n\}$, and all comparative patterns $x_{cj} \in X_c$, $j = 1, 2, \ldots, m$, grey relational pattern analysis satisfies the following four properties:

(B1) Norm interval: (i) $v(x_{ri}, x_{cj}) \in [0, 1]$; (ii) $v(x_{ri}, x_{cj}) = 1$ if $x_{ri} = x_{cj}$.
(B2) Dual symmetry: if $m = 2$, then $v(x_{ri}, x_{cj}) = v(x_{cj}, x_{ri})$.
(B3) Wholeness: if $m \ge 3$, then $v(x_{ri}, x_{cj}) \mathrel{\overset{\text{often}}{\neq}} v(x_{cj}, x_{ri})$.
(B4) Approachability: the smaller the Euclidean distance $d_{ij}$, the larger the relational grade $v(x_{ri}, x_{cj})$, and vice versa.
Proof: Part of the proof is already shown above. The rest is straightforward and is omitted. $\square$
4 Unsupervised clustering algorithm
For a given data set X, the objective of cluster analysis is to analyse the similarity between data and then partition the given data set into a number of clusters, where the objects inside a cluster show a certain degree of similarity. To do so, we can choose $X_r = X_c = X$ and use the grey relational pattern analysis developed in Section 3 to analyse the similarity between the patterns in $X_r$ and $X_c$. To complete the partition, we use a neurocomputing technique together with grey relational pattern analysis to develop the unsupervised input clustering algorithm below, called the grey clustering algorithm.
4.1 Grey clustering algorithm
Step 1: Define a temporary set. Define a temporary set $C = \{c_1, c_2, \ldots, c_q\}$. A possible temporary set is the considered data set, $C = X$; in this case, $q = m$ and $c_i = x_i$. The temporary set is used as the data set in the learning process, and the elements in C are updated after a complete learning iteration.
Step 2: Set the initial value of the threshold. Pick a real number from the interval [0, 1] as the initial threshold $\varepsilon$. The initial threshold strongly affects the final clustering results (i.e. the number of clusters and the locations of the cluster centres), and will be discussed later.
Step 3: Initialise a learning process. Let $X_r = X_c = C$, and set $i = 1$.
Step 4: Measure the grey relational pattern grades. Make the corresponding reference $x_{ri} \in X_r$ the training pattern for this iteration. Then the grey relational pattern grades between the reference $x_{ri}$ and all the patterns in $X_c$, denoted by $v(x_{ri}, x_{cj})$, $j = 1, 2, \ldots, q$, can be determined by (5).
Step 5: Update the active pattern. For a comparative pattern $x_{cj}$, if the grey relational pattern grade satisfies $v(x_{ri}, x_{cj}) \ge \varepsilon$, then the corresponding element in C is called a significant pattern. Among these significant patterns, an active pattern $c_l$ is defined such that its corresponding comparative pattern has the largest grey relational pattern grade,

$$v(x_{ri}, c_l) = \max_{j \in \{1, 2, \ldots, q\}} v(x_{ri}, x_{cj}) \qquad (6)$$

Suppose that there are N significant patterns, denoted by $\{c_1^*, c_2^*, \ldots, c_N^*\}$. Then the active pattern $c_l$ is updated by

$$c_l = \sum_{k=1}^{N} w_k\, c_k^* \bigg/ \sum_{k=1}^{N} w_k \qquad (7)$$

where $w_k$ is the weighting factor of the significant pattern $c_k^*$. In general, we can select $w_k = 1/N$ for all $k$. Another choice is $w_k = v(x_{ri}, c_k^*)$, which makes the new location of the active pattern closer to the patterns with larger relational grades. If there is more than one active pattern, all of them are simultaneously updated by (7).
If every training pattern has been presented, i.e. $i = q$, go to Step 6; otherwise, increase i by 1 and then go to Step 4.
Step 6: Check the result. If the temporary set remains the same after updating, i.e. convergence of C is achieved, stop the learning process and go to Step 8 to determine the cluster centres. Otherwise, go to the next step.
Step 7: Increase the threshold. Increase $\varepsilon$, and go to Step 3 with the updated set C for another learning iteration. The threshold $\varepsilon$ may increase linearly or exponentially until $\varepsilon = 1$.
Step 8: Determine the clusters and the cluster centres. After convergence of C is achieved, some of the elements in C may be equal. Partition the last updated set C into several disjoint subsets, where all the elements in the same subset are identical (i.e. multiple copies). The number of subsets is the number of clusters. For the elements in a subset of C, their corresponding patterns in the original data set X are partitioned as a cluster, and the cluster centre is the common element of that subset.
In order to group the patterns with high relational grades as a cluster, Step 5 selects the N significant patterns that are highly similar to the reference and averages them as the new location of the active pattern. Step 5 also reveals that patterns with low relational grades remain unchanged after being trained. Increasing the threshold in Step 7 is very similar to the idea of shrinking the neighbourhood size in self-organising feature maps [17]. As the threshold increases, the significant patterns move towards a convergent state; the final results are the desired cluster centres. Because the whole procedure relies only on the underlying structure of the data set, the proposed method is an unsupervised clustering approach.
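The learning loop of Steps 1-8 can be sketched in Python as follows. The sketch assumes the common choices stated above (temporary set C = X, weights $w_k = 1/N$ in (7), a linearly increasing threshold); the numerical convergence and merging tolerances are assumptions of this sketch, not part of the original algorithm:

```python
import numpy as np

def grey_clustering(X, eps0=0.6, zeta=1.0, step=0.01, max_iter=200):
    """Sketch of the grey clustering algorithm (Steps 1-8), assuming C = X,
    w_k = 1/N in (7), and a linear threshold schedule.  Returns per-pattern
    cluster labels and the estimated cluster centres."""
    C = np.asarray(X, dtype=float).copy()     # Step 1: temporary set C = X
    eps = eps0                                # Step 2: initial threshold
    for _ in range(max_iter):
        C_old = C.copy()
        for i in range(len(C)):               # Steps 3-5: one pass over C
            d = np.linalg.norm(C - C[i], axis=1)   # distances, eq. (4)
            d_min, d_max = d.min(), d.max()
            if d_max == d_min:                # trivial case: skip
                continue
            v = ((d_max - d) / (d_max - d_min)) ** zeta  # grades, eq. (5)
            sig = v >= eps                    # significant patterns
            active = np.flatnonzero(v == v[sig].max())   # active pattern(s), (6)
            C[active] = C[sig].mean(axis=0)   # update, eq. (7) with w_k = 1/N
        if np.allclose(C, C_old, rtol=0.0, atol=1e-9):   # Step 6: convergence
            break
        eps = min(eps + step, 1.0)            # Step 7: raise threshold
    # Step 8: rows of C that have (numerically) coincided define the clusters
    centres, labels = [], np.empty(len(C), dtype=int)
    for i, c in enumerate(C):
        for k, ctr in enumerate(centres):
            if np.linalg.norm(c - ctr) < 1e-6:
                labels[i] = k
                break
        else:
            labels[i] = len(centres)
            centres.append(c)
    return labels, np.array(centres)
```

Note that the grouping in Step 8 is done here with a small distance tolerance rather than exact equality, since floating-point updates by (7) make the merged elements of C equal only to within rounding.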
In Step 5, we see that the selection of the significant patterns depends on the value of the threshold. In fact, the initial threshold chosen in Step 2 strongly influences the final classification results of our method. Generally speaking, different initial thresholds may yield different results in the data clustering problem; the sensitivity of the initial threshold parameter will be shown by examples in the next section. The grey clustering algorithm proposed above partitions a data set for a fixed initial threshold. We may use an extra procedure in the grey clustering algorithm to obtain results for different initial thresholds within [0, 1]. Among those results, we need a criterion to help us find an appropriate clustering result. Following [12], define a performance index as
$$PI = \frac{\sum_{d=1}^{h} m_d\, s_d}{\sum_{d=1}^{h} m_d} \qquad (8)$$
where $h$ is the number of clusters, $m_d$, $d = 1, 2, \ldots, h$, denotes the number of patterns which belong to the $d$th cluster, and the value $s_d$ is calculated by

$$s_d = \frac{\displaystyle\min_{j,\, j \neq d} \| o_d - o_j \|^2}{\sqrt{\displaystyle\sum_{t=1}^{m_d} \| x_{dt} - o_d \|^2 \Big/ m_d}} \qquad (9)$$
where $o_d$ represents the cluster centre of the $d$th cluster and $x_{dt}$, $t = 1, 2, \ldots, m_d$, denotes the patterns which belong to the $d$th cluster. This index reflects the objective of data clustering in the sense that the patterns in a cluster should be close to the centre, and the cluster centres should be separated far from each other. From (9), we can see that the larger the performance index, the better the clustering result [28].
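A Python sketch of (8) and (9) follows. The compactness term under the square root is read here as the root-mean-square distance of a cluster's patterns to its centre, which is one consistent reading of (9); the function name is ours:

```python
import numpy as np

def performance_index(X, labels, centres):
    """Performance index PI of (8), with s_d of (9) read as the squared
    distance to the nearest other centre divided by the RMS within-cluster
    distance (this RMS reading is an assumption of this sketch)."""
    X = np.asarray(X, dtype=float)
    centres = np.asarray(centres, dtype=float)
    h = len(centres)
    m = np.array([(labels == d).sum() for d in range(h)])  # cluster sizes m_d
    s = np.empty(h)
    for d in range(h):
        # min over j != d of ||o_d - o_j||^2 (between-cluster separation)
        sep = min(np.sum((centres[d] - centres[j]) ** 2)
                  for j in range(h) if j != d)
        # RMS distance of the d-th cluster's patterns to its centre
        compact = np.sqrt(np.mean(np.sum((X[labels == d] - centres[d]) ** 2,
                                         axis=1)))
        s[d] = sep / compact
    return np.sum(m * s) / np.sum(m)                       # eq. (8)
```

Consistent with the text, a partition with compact, well-separated clusters scores a larger PI than a partition that mixes the clusters.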
Now, we give the last step of the grey clustering algorithm, which determines an optimal result in the sense that the result maximises the performance index (8).
Step 9: Determine the optimal result. Calculate the performance index (8) for each result obtained by using a different initial threshold in Step 8. The optimal clustering result is the one that maximises the performance index.
Instead of (8), different performance indexes may also be used in Step 9. For instance, if the mean of the distances between the cluster centre and the patterns in the cluster is the only objective under consideration, Step 9 should be modified to search for the result that minimises this alternative performance index.
5 Simulation results
The purpose of this Section is to show the performance and effectiveness of our method using three examples with various data sets. In each example, we compare the result obtained using the grey clustering algorithm with those obtained using the fuzzy c-means method [11] and the hard c-means method [14], in which c is the number of clusters to be predetermined. In each case, the initial threshold starts from 0.1 and is increased in steps of 0.01 in the search for an optimal solution. In the first learning iteration of the grey clustering algorithm, the threshold in Step 5 is equal to the initial threshold; then, as stated in Step 7, the threshold also increases by 0.01 for the next learning iteration towards a clustering result. In the following examples, for convenience, the figures showing the data sets also show the clustering results. For example, the points in Fig. 1 show the geometric distribution of the data set considered in Example 1, and the number beside each point denotes the clustering result of the proposed method.
Example 1: This example uses a simple data set, denoted $S_1$, listed in the first column of Table 1 and shown in Fig. 1. To perform the proposed clustering procedure, we select the temporary set as the entire data set, $C = S_1$, and the weighting factor in Step 5 as $w_k = 1/N$, where N is the number of significant patterns. Figure 2 shows how the performance index is influenced by the initial threshold parameter; the maximum performance index occurs when the initial threshold is within [0.47, 0.63]. Figure 3 illustrates that, no matter what the initial threshold value is, the proposed clustering method converges after a finite number of learning iterations.
With $\zeta = 1.0$ and initial threshold 0.6, we see that the grey clustering algorithm stops at the end of the first learning iteration with the maximal performance index.
Fig. 1 Data set and classification result of proposed method (Example 1)
Fig. 2 Plot of performance index vs initial threshold (Example 1)
Table 1: Data set considered and updated patterns for Example 1

k    x_k (original pattern)    c_k (first iteration)
1 (0.3293, 0.2293) (0.4000, 0.3000)
2 (0.3293, 0.3707) (0.4000, 0.3000)
3 (0.4000, 0.3000) (0.4000, 0.3000)
4 (0.4707, 0.2293) (0.4000, 0.3000)
5 (0.4707, 0.3707) (0.4000, 0.3000)
6 (0.6134, 0.6500) (0.7000, 0.7000)
7 (0.7866, 0.6500) (0.7000, 0.7000)
8 (0.7000, 0.7000) (0.7000, 0.7000)
9 (0.7000, 0.8000) (0.7000, 0.7000)
10 (0.2000, 0.9000) (0.2000, 0.8000)
11 (0.3000, 0.8000) (0.2000, 0.8000)
12 (0.2000, 0.8000) (0.2000, 0.8000)
13 (0.2000, 0.7000) (0.2000, 0.8000)
14 (0.1000, 0.8000) (0.2000, 0.8000)
The numerical results of the updated patterns at the end of learning are listed in the second column of Table 1. The data set is partitioned into three clusters, and the estimated cluster centres are $c_1 = (0.4, 0.3)$, $c_2 = (0.7, 0.7)$ and $c_3 = (0.2, 0.8)$, which are consistent with the actual cluster centres of the data set. The estimated cluster centres and the classification results are also illustrated in Fig. 1, where the cluster centres are marked by the symbol '×'.
Table 2 compares the clustering results obtained by the proposed clustering method, the hard 3-means algorithm and the fuzzy 3-means algorithm. All three methods partition the considered data set into three clusters, and the cluster centres obtained by the three methods are similar, but the proposed method needs only one iteration to obtain the desired clustering result. Table 3 contains the estimated cluster centres and performance indexes for various initial thresholds within the region [0.63, 0.74], where the initial threshold parameter is sensitive.
Example 2: Consider a synthetic data set, denoted $S_2$, with two clusters of different shapes and different sizes, as shown in Fig. 4. With $C = S_2$, $\zeta = 1.0$ and weighting factor $w_k = 1/N$, the plots of the performance index and the number of learning iterations versus the initial threshold are given in Figs. 5 and 6, respectively. In this case, the initial threshold parameter is very sensitive. The initial threshold value for the optimal solution is 0.67. For this initial threshold value, the grey clustering algorithm terminates at the end of the 14th learning iteration and results in two
Fig. 3 Plot of number of learning iterations vs initial threshold (Example 1)
Table 3: Results for various initial thresholds in sensitive region (Example 1)

Initial threshold   Estimated cluster centres                              PI
0.63                (0.4000, 0.3000), (0.7000, 0.7000), (0.2000, 0.8000)   0.9048
0.64, 0.65          (0.3965, 0.3035), (0.7000, 0.7000), (0.2000, 0.8000)   0.9012
0.66                (0.4000, 0.3071), (0.7000, 0.7000), (0.2050, 0.7950)   0.8785
0.67, 0.68, 0.69    (0.4000, 0.3071), (0.6892, 0.6938), (0.2050, 0.7950)   0.8356
0.70                (0.3965, 0.3035), (0.6892, 0.6938), (0.2050, 0.7950)   0.8474
0.71                (0.3965, 0.3035), (0.6892, 0.6938), (0.2000, 0.7950)   0.8544
0.72, 0.73          (0.4000, 0.3000), (0.6892, 0.6938), (0.2000, 0.8000)   0.8613
0.74                (0.4000, 0.3000), (0.6892, 0.7063), (0.2000, 0.8000)   0.8803
Fig. 4 Data set and classification result of proposed method (Example 2)
Table 2: Comparison of results of different algorithms (Example 1)

             Grey clustering algorithm   HCM with c = 3     FCM with c = 3
c1           (0.4000, 0.3000)            (0.4000, 0.3000)   (0.3997, 0.2989)
c2           (0.7000, 0.7000)            (0.7000, 0.7000)   (0.7006, 0.7003)
c3           (0.2000, 0.8000)            (0.2000, 0.8000)   (0.1991, 0.8006)
iterations   1                           2                  7
Fig. 5 Plot of performance index vs initial threshold (Example 2)
cluster centres (0.1937, −0.1029) and (2.6356, −1.6741). The estimated cluster centres and the classification results are also illustrated in Fig. 4, where the cluster centres are marked by the symbol '×'.
Table 4 compares the results obtained by the grey clustering algorithm, the hard 2-means algorithm and the fuzzy 2-means algorithm. Figures 7 and 8 show the final clustering results of the hard 2-means algorithm and the fuzzy 2-means algorithm, respectively. From these figures, it is observed that both the hard 2-means and fuzzy 2-means algorithms misclassify several patterns, but the grey clustering algorithm does not.
Example 3: Consider a two-dimensional data set $S_3$ consisting of 200 points, as shown in Fig. 9. There are four clusters of different shapes and different sizes in this data set. With $C = S_3$, $\zeta = 1.0$ and $w_k = 1/N$, the performance index plot shown in Fig. 10 indicates that the initial threshold value for the optimal solution is 0.78. For this initial threshold value, the grey clustering algorithm converges at the end of the 5th learning iteration, as shown in Fig. 11, and results in four cluster centres: (0.7810, 0.3451), (0.2965, 0.1999), (0.2372, 0.6910) and (0.8055, 0.7784).
Fig. 6 Plot of number of learning iterations vs initial threshold (Example 2)
Table 4: Comparison of results of different algorithms (Example 2)

             Grey clustering algorithm   HCM with c = 2      FCM with c = 2
c1           (0.1937, −0.1029)           (−0.0392, 0.0110)   (−0.3381, 0.0444)
c2           (2.6356, −1.6741)           (2.6435, −1.6296)   (2.1127, −1.1647)
PI           7.7098                      9.0061#             6.5223#
iterations   14                          17                  26

# Incorrect clustering result is obtained by this method
Fig. 7 Classification result of hard 2-means algorithm (Example 2)
Fig. 8 Classification result of fuzzy 2-means algorithm (Example 2)
Fig. 9 Data set and classification result of proposed method (Example 3)
Figures 9, 12 and 13 show the clustering results of the grey clustering algorithm, the hard 4-means algorithm and the fuzzy 4-means algorithm, respectively, where the cluster centres are marked by the symbol '×'. All methods partition the data set $S_3$ into four clusters. Table 5 lists the details of these methods. The cluster centres estimated by the three methods are similar, but the grey clustering algorithm achieves the largest performance index. Table 6 contains the estimated cluster centres and performance indexes for various initial thresholds within the region [0.75, 0.78], where the initial threshold parameter is sensitive.
From the above examples, we see that the introduction of the performance index (8) is necessary: it helps to deal with the sensitivity of the proposed method to the chosen initial threshold parameter.
Fig. 11 Plot of number of learning iterations vs initial threshold (Example 3)
Fig. 12 Classification result of hard 4-means algorithm (Example 3)
Fig. 13 Classification result of fuzzy 4-means algorithm (Example 3)
Table 5: Comparison of results of different algorithms (Example 3)

             Grey clustering algorithm   HCM with c = 4     FCM with c = 4
c1           (0.7810, 0.3451)            (0.7882, 0.3349)   (0.7805, 0.3383)
c2           (0.2965, 0.1999)            (0.2931, 0.1954)   (0.2860, 0.1935)
c3           (0.2372, 0.6910)            (0.2113, 0.6933)   (0.2226, 0.7035)
c4           (0.8055, 0.7784)            (0.8093, 0.7918)   (0.8258, 0.7889)
PI           0.6199                      0.6084             0.5546
iterations   5                           5                  15
Fig. 10 Plot of performance index vs initial threshold (Example 3)
6 Conclusions
For the data clustering problem, most algorithms require an initial guess of the cluster centres. In this paper, we have presented a grey relational analysis based clustering algorithm for finding the cluster centres in an unsupervised manner. Together with the performance index given in [12], this algorithm can obtain the optimal result. To demonstrate the effectiveness of the approach, several examples have been given. In each example, the performance of the proposed algorithm was compared with that of the fuzzy c-means method and the hard c-means method. Simulation results showed that the proposed approach outperformed the others.
7 Acknowledgment
The authors would like to express their thanks to the Associate Editor and anonymous referees for their help in improving the paper, and to the National Science Council, Taiwan, which financially supported this work through Grant NSC 91-2218-E-262-007.
8 References
1 Deng, J.-L.: 'Control problems of grey systems', Syst. Control, 1982, 5, (1), pp. 284–294
2 Deng, J.-L.: 'Introduction to grey system theory', J. Grey Syst., 1989, 1, (1), pp. 1–24
3 Hu, Y.-C., Chen, R.-S., Hsu, Y.-T., and Tzeng, G.-H.: 'Grey self-organizing feature maps', Neurocomputing, 2002, 48, (1–4), pp. 863–877
4 Wong, C.-C., and Chen, C.-C.: 'Data clustering by grey relational analysis', J. Grey Syst., 1998, 10, (3), pp. 281–288
5 Yeh, M.-F., Chang, J.-C., and Lu, H.-C.: 'Unsupervised clustering algorithm via grey relational pattern analysis', J. Chinese Grey Syst. Assoc., 2002, 5, (1), pp. 17–22
6 Yeh, M.-F.: 'Data clustering via grey relational pattern analysis', J. Grey Syst., 2002, 14, (3), pp. 259–264
7 Yeh, M.-F., and Chang, K.-C.: 'Grey unsupervised clustering method'. Proc. Joint Conf. on AI, Fuzzy System, and Grey System, Taipei, Taiwan, 2003
8 Yeh, M.-F., and Wang, T.-Y.: 'A new grey self-organizing feature map'. Proc. Joint Conf. on AI, Fuzzy System, and Grey System, Taipei, Taiwan, 2003
9 Lu, H.-C., and Yeh, M.-F.: 'Robot path planning based on modified grey relational analysis', Cybern. Syst., 2002, 33, (2), pp. 129–159
10 Yeh, M.-F., and Lu, H.-C.: 'Evaluating weapon systems based on grey relational analysis and fuzzy arithmetic operations', J. Chinese Inst. Eng., 2000, 23, (2), pp. 211–221
11 Bezdek, J.C.: 'Pattern recognition with fuzzy objective function algorithms' (Plenum Press, New York, 1981)
12 Jain, A.K., and Dubes, R.C.: 'Algorithms for clustering data' (Prentice Hall, New Jersey, 1988)
13 Gonzalez, J., Rojas, I., Pomares, H., Ortega, J., and Prieto, A.: 'A new clustering technique for function approximation', IEEE Trans. Neural Netw., 2002, 13, (1), pp. 132–142
14 Duda, R.O., and Hart, P.E.: 'Pattern classification and scene analysis' (John Wiley & Sons, New York, 1973)
15 Carpenter, G.A., and Grossberg, S.: 'A massively parallel architecture for a self-organizing neural pattern recognition machine', Comput. Vis. Graph. Image Process., 1987, 37, (1), pp. 54–115
16 Kohonen, T.: 'The neural phonetic typewriter', IEEE Trans. Comput., 1988, 27, (1), pp. 11–12
17 Kohonen, T.: 'Self-organizing maps' (Springer-Verlag, New York, 1995)
18 Runkler, T.A., and Bezdek, J.C.: 'Alternating cluster estimation: a new tool for clustering and function approximation', IEEE Trans. Fuzzy Syst., 1999, 7, (4), pp. 377–393
19 Pedrycz, W.: 'Conditional fuzzy c-means', Pattern Recogn. Lett., 1996, 17, (6), pp. 625–632
20 Su, M.-C., Declaris, N., and Liu, T.-K.: 'Application of neural network in cluster analysis'. Proc. IEEE Int. Conf. on System, Man, and Cybernetics, 1997, Vol. 1, pp. 1–6
21 Bezdek, J.C., and Pal, N.R.: 'Some new indexes of cluster validity', IEEE Trans. Syst. Man Cybern. B, Cybern., 1998, 28, (3), pp. 301–315
22 Kwon, S.H.: 'Cluster validity index for fuzzy cluster', Electron. Lett., 1998, 34, (22), pp. 2176–2177
23 Pal, N.R., and Bezdek, J.C.: 'On cluster validity for the fuzzy c-means model', IEEE Trans. Fuzzy Syst., 1995, 3, (3), pp. 370–379
24 Yager, R.R., and Filev, D.P.: 'Approximate clustering via the mountain method', IEEE Trans. Syst. Man Cybern., 1994, 24, (8), pp. 1279–1284
25 Wang, J.-H., and Rau, J.-D.: 'VQ-agglomeration: a novel approach to clustering', IEE Proc. Vis. Image Signal Process., 2001, 148, (1), pp. 36–44
26 Chien, J.-T.: 'Online unsupervised learning of hidden Markov models for adaptive speech recognition', IEE Proc. Vis. Image Signal Process., 2001, 148, (5), pp. 315–324
27 Wong, C.-C., and Lai, H.-R.: 'A new grey relational measurement', J. Grey Syst., 2000, 12, (4), pp. 341–346
28 Wong, C.-C., Chen, C.-C., and Su, M.-C.: 'A novel algorithm for data clustering', Pattern Recogn., 2001, 34, (2), pp. 425–442
Table 6: Results for various initial thresholds in sensitive region (Example 3)
Initial threshold 0.75 0.76 0.77 0.78
c1 (0.7799, 0.3509) (0.7803, 0.3471) (0.7812, 0.3449) (0.7810, 0.3451)
c2 (0.2971, 0.2009) (0.2974, 0.2004) (0.2987, 0.2006) (0.2965, 0.1999)
c3 (0.2359, 0.6885) (0.2351, 0.6889) (0.2322, 0.6883) (0.2372, 0.6910)
c4 (0.8021, 0.7768) (0.8030, 0.7764) (0.8073, 0.7766) (0.8055, 0.7784)
c5 – – (0.3673, 0.7691) –
PI 0.6064 0.6119 0.4717 0.6199