
Grey relational analysis based approach for data clustering

K.-C. Chang and M.-F. Yeh

Abstract: This paper generalises the concept of grey relational analysis to develop a technique, called grey relational pattern analysis, for analysing the similarity between given patterns. Based on this technique, a clustering algorithm is proposed for finding cluster centres of a given data set. This approach can be categorised as an unsupervised clustering algorithm because it does not need predetermination of appropriate cluster centres in the initialisation. The problem of determining the optimal number of clusters and optimal locations of cluster centres is also considered. Finally, the approach is used to solve several data clustering problems as examples. In each example, the performance of the proposed algorithm is compared with other well-known algorithms such as the fuzzy c-means method and the hard c-means method. Simulation results demonstrate the effectiveness and feasibility of the proposed method.

1 Introduction

Grey system theory, initiated by Deng [1, 2], can perform grey relational analysis for sequences. For a given reference sequence and a given set of comparative sequences, grey relational analysis can be used to determine the relational grade between the reference and each element in the given set. Then the best comparative one can be found by further analysing the resultant relational grades. In other words, grey relational analysis can be viewed as a measure of similarity for finite sequences. The method of grey relational analysis has been successfully applied to cluster analysis [3–8], robot path planning [9], multiple criteria decision-making [10], and so on.

Cluster analysis is a basic tool for finding the underlying structure of a given data set, which is one of the most fundamental issues in pattern recognition. The primary objective of cluster analysis is to partition a given data set of multidimensional patterns into a number of subgroups (clusters), where the objects inside a cluster show a certain degree of similarity. Overviews of clustering algorithms can be found in [11, 12]. In recent work [13], clustering algorithms are categorised into two conceptually different families, namely, input clustering and input–output clustering. Input clustering algorithms depend on an analysis of the input training patterns, completely ignoring information about dependent output variables. Important examples of input clustering algorithms are the hard c-means [14] and the fuzzy c-means [11] methods. Some neural networks, such as self-organising feature maps [3, 8, 15–17], are also examples of input clustering algorithms. On the other hand, input–output clustering algorithms incorporate output variables; examples include alternating cluster estimation [18] and conditional fuzzy clustering [19].

The performance of most clustering algorithms is greatly influenced by the number of cluster centroids, the selection of initial cluster centroids, and the geometrical properties (e.g. shapes, densities and distributions) of the data [20]. The number of cluster centroids cannot always be defined in advance. Therefore, a cluster validity criterion has to be defined to determine an optimal number of clusters in a data set [11, 21–23]. One can make initial assumptions and use the mountain clustering method [24] to obtain initial values of the cluster centroids. To sum up, cluster exploration is very experiment-oriented in the sense that clustering algorithms that can deal with all situations are not yet available.

While it is easy to consider the idea of a data cluster on a rather informal basis, it is very difficult to give a formal and universal definition of a cluster. To mathematically identify clusters of a data set, it is usually necessary to first define a measure of similarity and then establish a rule for assigning patterns to the domain of a particular cluster centre. Grey relational analysis is a useful tool for measuring the degree of similarity between sequences as mentioned above, but it fails to analyse the relations between patterns. In order to analyse the pattern relation, it has to be modified. Based on grey relational analysis, this paper proposes a so-called grey relational pattern analysis for determining the degree of similarity between patterns while maintaining their geometric features in the analysis process. The resultant degree of similarity is called the grey relational pattern grade. Its range is within a normalised interval [0, 1], and it guarantees that the smaller the Euclidean distance between two patterns, the larger the grey relational pattern grade.

Together with grey relational pattern analysis and the concept of an input clustering algorithm, we develop a so-called grey clustering algorithm and then demonstrate its effectiveness and feasibility in the classification problem using several examples. The proposed method can be categorised as an unsupervised algorithm which estimates cluster centres without predetermining them in the initialisation. Like other unsupervised clustering algorithms [3, 4, 22, 25, 26], the grey relational analysis based clustering algorithm does not suffer from the choice of the initial values of the cluster centres or the number of clusters. The performance index given in [12] is used to determine the optimal number of clusters and the optimal locations of the cluster centres.

© IEE, 2005

IEE Proceedings online no. 20041209

doi: 10.1049/ip-vis:20041209

The authors are with the Department of Electrical Engineering, Lunghwa University of Science and Technology, Taoyuan, Taiwan 33306

E-mail: [email protected]

Paper first received 13th January and in revised form 23rd September 2004. Originally published online 4th March 2005

IEE Proc.-Vis. Image Signal Process., Vol. 152, No. 2, April 2005

2 Preliminaries

This section gives a brief discussion of grey relational analysis. Grey relational analysis is a tool for analysing the relationships between one major (reference) sequence and the other comparative ones in a given set [1, 2]. Let $S_r = \{s_{r1}, s_{r2}, \ldots, s_{rn}\}$ denote a collection of $n$ $(n \geq 1)$ reference sequences. Each element of $S_r$ is of the form $s_{ri} = \langle s_{ri}(1), s_{ri}(2), \ldots, s_{ri}(p) \rangle$ for a finite positive integer $p$. Similarly, let $S_c = \{s_{c1}, s_{c2}, \ldots, s_{cm}\}$ denote a collection of $m$ $(m \geq 1)$ comparative sequences, where each element of $S_c$ is of the form $s_{cj} = \langle s_{cj}(1), s_{cj}(2), \ldots, s_{cj}(p) \rangle$. The objective of grey relational analysis is to find the most similar sequence in $S_c$ to a specified reference sequence in $S_r$. The grey relational coefficient between the specified reference sequence $s_{ri} \in S_r$, $i \in \{1, 2, \ldots, n\}$, and all comparative sequences $s_{cj} \in S_c$, $j = 1, 2, \ldots, m$, at the $k$th datum can be defined as

$$\gamma(s_{ri}(k), s_{cj}(k)) = \frac{\Delta_{\min} + \xi \, \Delta_{\max}}{\Delta_{ij}(k) + \xi \, \Delta_{\max}}, \quad k = 1, 2, \ldots, p \qquad (1)$$

where $\Delta_{ij}(k) = |s_{ri}(k) - s_{cj}(k)|$, $\Delta_{\max} = \max_j \max_k \Delta_{ij}(k)$, $\Delta_{\min} = \min_j \min_k \Delta_{ij}(k)$ for $j = 1, 2, \ldots, m$, and $\xi \in (0, 1]$. The factor $\xi$ in (1) controls the resolution between $\Delta_{\max}$ and $\Delta_{\min}$. Once the grey relational coefficients are all determined, their weighted average, termed the grey relational grade, can be computed by

$$r(s_{ri}, s_{cj}) = \sum_{k=1}^{p} \left[ a_i(k) \cdot \gamma(s_{ri}(k), s_{cj}(k)) \right] \qquad (2)$$

where $a_i(k)$ is the weighting factor of the grey relational coefficient $\gamma(s_{ri}(k), s_{cj}(k))$ and $\sum_{k=1}^{p} a_i(k) = 1$ for the reference sequence $s_{ri}$ and the comparative sequences $s_{cj}$, $j = 1, 2, \ldots, m$. In general, we can choose $a_i(k) = 1/p$ for all $k$. From (1) and (2), we have $0 < \gamma(s_{ri}(k), s_{cj}(k)) \leq 1$ and $0 < r(s_{ri}, s_{cj}) \leq 1$ for any $i \in \{1, 2, \ldots, n\}$ and $j = 1, 2, \ldots, m$.
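For illustration, the following is a minimal NumPy sketch of (1) and (2). The function name, arguments and the default value $\xi = 0.5$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def grey_relational_grade(s_ri, S_c, xi=0.5, weights=None):
    """Grey relational grade r(s_ri, s_cj) of a reference sequence s_ri
    against every comparative sequence in S_c (an m-by-p array)."""
    s_ri = np.asarray(s_ri, dtype=float)              # reference sequence, length p
    S_c = np.asarray(S_c, dtype=float)                # m comparative sequences
    delta = np.abs(s_ri - S_c)                        # Delta_ij(k), shape (m, p)
    d_max, d_min = delta.max(), delta.min()           # over all j and k
    gamma = (d_min + xi * d_max) / (delta + xi * d_max)   # coefficients, (1)
    if weights is None:
        weights = np.full(s_ri.size, 1.0 / s_ri.size)     # a_i(k) = 1/p
    return gamma @ weights                            # weighted average, (2)

# Example: which comparative sequence is most similar to the reference?
r = grey_relational_grade([1.0, 2.0, 3.0], [[1.1, 2.2, 2.9], [3.0, 1.0, 0.5]])
print(r, r.argmax())
```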

Some important properties of grey relational analysis can be summarised below. Before we state the properties, we need to introduce a new notation, $r(s_{cj}, s_{ri})$. By the notation $r(s_{cj}, s_{ri})$ we mean that the grey relational grade is now computed with $s_{cj}$ as the reference and all of the sequences in $S_c = \{s_{ri}, s_{cl}\}$, $l = 1, 2, \ldots, m$, $l \neq j$, as the comparative ones; i.e. the comparative sequence $s_{cj}$ is picked out of the original $S_c$ as a new reference, and the other comparative sequences together with the original reference $s_{ri}$ form a new comparative sequence set $S_c = \{s_{ri}, s_{cl}\}$.

2.1 Properties of grey relational analysis

For any reference sequence $s_{ri} \in S_r$, $i \in \{1, 2, \ldots, n\}$, and all comparative sequences $s_{cj} \in S_c$, $j = 1, 2, \ldots, m$, grey relational analysis satisfies the following four properties [1, 2]:

(P1) Norm interval: (i) $0 < r(s_{ri}, s_{cj}) \leq 1$. (ii) $r(s_{ri}, s_{cj}) = 1$ if $s_{ri} = s_{cj}$.
(P2) Dual symmetry: If $m = 1$, then $r(s_{ri}, s_{cj}) = r(s_{cj}, s_{ri})$.
(P3) Wholeness: If $m \geq 2$, then $r(s_{ri}, s_{cj}) \overset{\text{often}}{\neq} r(s_{cj}, s_{ri})$, where the symbol '$\overset{\text{often}}{\neq}$' means 'is often unequal to'.
(P4) Approachability: The smaller the absolute value $\Delta_{ij}(k)$, the larger the grey relational coefficient $\gamma(s_{ri}(k), s_{cj}(k))$, and vice versa.

An alternative grey relational coefficient proposed by Wong and Lai [27] is defined as

$$\tilde{\gamma}(s_{ri}(k), s_{cj}(k)) = \left( \frac{\Delta_{\max} - \Delta_{ij}(k)}{\Delta_{\max} - \Delta_{\min}} \right)^{z} \qquad (3)$$

where $z \in (0, \infty)$ denotes the distinguishing coefficient. It is shown that $\tilde{\gamma}(s_{ri}(k), s_{cj}(k)) \in [0, 1]$. The range of this alternative form is within a closed interval, and it can be used to solve an uneven distribution problem caused by using (1). The corresponding grey relational grade $\tilde{r}(s_{ri}, s_{cj})$ is also a weighted average of the $\tilde{\gamma}(s_{ri}(k), s_{cj}(k))$'s.
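A small sketch of the alternative coefficient (3) follows; the function and argument names are assumptions, not taken from [27], and the expression assumes $\Delta_{\max} \neq \Delta_{\min}$.

```python
import numpy as np

def wong_lai_coefficients(s_ri, S_c, z=1.0):
    """Alternative grey relational coefficients (3), lying in the closed interval [0, 1]."""
    delta = np.abs(np.asarray(s_ri, dtype=float) - np.asarray(S_c, dtype=float))
    d_max, d_min = delta.max(), delta.min()
    return ((d_max - delta) / (d_max - d_min)) ** z   # assumes d_max != d_min
```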

3 Grey relational pattern analysis

Grey relational analysis, introduced in the previous section, can be viewed simply as measuring the difference between elements with the same index in two sequences and using an average of those differences to represent the similarity of the two sequences. This concept can be used for the analysis of the pattern relation. In order to analyse the pattern relation in a similar manner, grey relational analysis has to be modified such that geometric features can be maintained in the analysis process.

Let $X_r = \{x_{r1}, x_{r2}, \ldots, x_{rn}\}$ denote a collection of $n$ $(n \geq 1)$ reference patterns and $X_c = \{x_{c1}, x_{c2}, \ldots, x_{cm}\}$ denote a collection of $m$ $(m \geq 2)$ comparative patterns. The elements of $X_r$ are of the form $x_{ri} = (x_{ri}(1), x_{ri}(2), \ldots, x_{ri}(p))$ and the elements of $X_c$ are of the form $x_{cj} = (x_{cj}(1), x_{cj}(2), \ldots, x_{cj}(p))$ for a finite positive integer $p$. Note that the least number of comparative patterns, $m$, is two, whereas the least number of comparative sequences in grey relational analysis is one. Again, we can pick a specific pattern $x_{ri}$ from $X_r$ as the reference, and all elements $x_{cj}$, $j = 1, 2, \ldots, m$, in $X_c$ are the comparative patterns. Denote the Euclidean distance between two patterns, $x_{ri}$ and $x_{cj}$, by

$$d_{ij} = \sqrt{\sum_{k=1}^{p} \left( x_{ri}(k) - x_{cj}(k) \right)^2} \qquad (4)$$

Let $d_{\max} = \max_j (d_{ij})$ and $d_{\min} = \min_j (d_{ij})$, $\forall j = 1, 2, \ldots, m$.

A trivial case is that $d_{\max} = d_{\min}$. In this case, all comparative patterns $x_{cj}$ simply lie on a circle centred at $x_{ri}$. Hence, the patterns in the trivial case can be made a cluster on their own, and we do not need to analyse them any further. Hereafter, we assume that $d_{\max} \neq d_{\min}$. Then the grey relational pattern grade can be defined as

$$\omega(x_{ri}, x_{cj}) = \left( \frac{d_{\max} - d_{ij}}{d_{\max} - d_{\min}} \right)^{z} \qquad (5)$$

where $z \in (0, \infty)$ denotes a distinguishing coefficient. From (4), we see that the Euclidean distance $d_{ij}$ counts every component difference, $x_{ri}(k) - x_{cj}(k)$, between the two considered patterns. Therefore, the computation of a grey relational pattern 'coefficient' becomes unnecessary. This is the main difference between grey relational analysis and grey relational pattern analysis.

For a specified $x_{ri}$, we see that $d_{\min} \leq d_{ij} \leq d_{\max}$. It follows that

$$0 \leq \frac{d_{\max} - d_{ij}}{d_{\max} - d_{\min}} \leq 1$$

and then $\omega(x_{ri}, x_{cj}) \in [0, 1]$, $\forall z \in (0, \infty)$. Moreover, $\omega(x_{ri}, x_{cj})$ approaches one as $d_{ij}$ nears $d_{\min}$, and approaches zero as $d_{ij}$ nears $d_{\max}$. Hence, the grey relational pattern grade $\omega(x_{ri}, x_{cj})$ can be used to measure the degree of similarity between the reference and comparative patterns in the sense that the smallest Euclidean distance represents the largest similarity. Indeed, it maps a Euclidean distance to a normalised measurement of similarity. Thus, no matter how large the Euclidean distance, the range of the grey relational pattern grade is the closed interval [0, 1]. The distinguishing coefficient in (5) affects only the 'magnitude' of the grey relational pattern grade, but does not change the relative relationships between the comparative patterns. Therefore, the selection of the distinguishing coefficient depends on numerical considerations in programming. For example, if the difference between $d_{\min}$ and $d_{\max}$ is very small for a given data set, a natural choice is to select $z \geq 1$ to make the resultant grey relational pattern grades more distinguishable.
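A minimal sketch of the grey relational pattern grade (4)-(5) is given below; the function name and arguments are illustrative, and it assumes $d_{\max} \neq d_{\min}$ (otherwise the trivial case above applies).

```python
import numpy as np

def pattern_grades(x_ri, X_c, z=1.0):
    """Grades omega(x_ri, x_cj) of a reference pattern against the rows of X_c."""
    x_ri = np.asarray(x_ri, dtype=float)
    X_c = np.asarray(X_c, dtype=float)
    d = np.linalg.norm(X_c - x_ri, axis=1)        # Euclidean distances d_ij, (4)
    d_max, d_min = d.max(), d.min()
    return ((d_max - d) / (d_max - d_min)) ** z   # grades in [0, 1], (5)
```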

The properties of grey relational pattern analysis can be summarised as follows. The notation $\omega(x_{cj}, x_{ri})$ in the following is defined analogously to $r(s_{cj}, s_{ri})$ in the previous section.

3.1 Properties of grey relational pattern analysis

For any reference pattern $x_{ri} \in X_r$, $i \in \{1, 2, \ldots, n\}$, and all comparative patterns $x_{cj} \in X_c$, $j = 1, 2, \ldots, m$, grey relational pattern analysis satisfies the following four properties:

(B1) Norm interval: (i) $\omega(x_{ri}, x_{cj}) \in [0, 1]$. (ii) $\omega(x_{ri}, x_{cj}) = 1$ if $x_{ri} = x_{cj}$.
(B2) Dual symmetry: If $m = 2$, then $\omega(x_{ri}, x_{cj}) = \omega(x_{cj}, x_{ri})$.
(B3) Wholeness: If $m \geq 3$, then $\omega(x_{ri}, x_{cj}) \overset{\text{often}}{\neq} \omega(x_{cj}, x_{ri})$.
(B4) Approachability: The smaller the Euclidean distance $d_{ij}$, the larger the relational grade $\omega(x_{ri}, x_{cj})$, and vice versa.

Proof: Part of the proof is already shown above. The rest is straightforward and is omitted. □

4 Unsupervised clustering algorithm

For a given data set $X$, the objective of cluster analysis is to analyse the similarity between data and then partition the given data set into a number of clusters, where the objects inside a cluster show a certain degree of similarity. To do so, we can choose $X_r = X_c = X$ and use the grey relational pattern analysis developed in Section 3 to analyse the similarity between patterns in $X_r$ and $X_c$. In order to complete the partition, we use a neurocomputing technique together with grey relational pattern analysis to develop an unsupervised input clustering algorithm below. This algorithm is called the grey clustering algorithm.

4.1 Grey clustering algorithm

Step 1: Define a temporary set. Define a temporary set $C = \{c_1, c_2, \ldots, c_q\}$. A possible temporary set is the considered data set, $C = X$; in this case $q = m$ and $c_i = x_i$. The temporary set is used as the data set in the learning process, and elements in $C$ are updated after a complete learning iteration.
Step 2: Set the initial value of the threshold. Pick a real number from the interval [0, 1] as the initial threshold $e$. The initial threshold strongly affects the final clustering results (i.e. the number of clusters and the locations of the cluster centres), and will be discussed later.
Step 3: Initialise a learning process. Let $X_r = X_c = C$, and set $i = 1$.
Step 4: Measure the grey relational pattern grades. Make the corresponding reference $x_{ri} \in X_r$ the training pattern for this iteration. Then the grey relational pattern grades between the reference $x_{ri}$ and all the patterns in $X_c$, denoted by $\omega(x_{ri}, x_{cj})$, $j = 1, 2, \ldots, q$, can be determined by (5).

Step 5: Update the active pattern. For a comparative pattern $x_{cj}$, if the grey relational pattern grade satisfies $\omega(x_{ri}, x_{cj}) \geq e$, then the corresponding element in $C$ is called a significant pattern. Among these significant patterns, an active pattern $c_l$ is defined such that its corresponding comparative pattern has the largest grey relational pattern grade,

$$\omega(x_{ri}, c_l) = \max_{j \in \{1, 2, \ldots, q\}} \omega(x_{ri}, x_{cj}) \qquad (6)$$

Suppose that there are $N$ significant patterns, denoted by $\{c_1^*, c_2^*, \ldots, c_N^*\}$. Then the active pattern $c_l$ is updated by

$$c_l = \frac{\sum_{k=1}^{N} w_k \cdot c_k^*}{\sum_{k=1}^{N} w_k} \qquad (7)$$

where $w_k$ is the weighting factor of the significant pattern $c_k^*$. In general, we can select $w_k = 1/N$ for all $k$. Another choice is to let $w_k = \omega(x_{ri}, c_k^*)$ so that the new location of the active pattern is closer to the patterns with larger relational grades. If there is more than one active pattern, all of them are updated simultaneously by (7).

If every training datum has been presented, i.e. $i = q$, go to Step 6. Otherwise, increase $i$ by 1 and then go to Step 4.
Step 6: Check the result. If the temporary set remains the same after updating, i.e. convergence of $C$ is achieved, then stop the learning process and go to Step 8 to determine the cluster centres. Otherwise, go to the next step.
Step 7: Increase the threshold. Increase $e$, and go to Step 3 with the updated set $C$ for another learning iteration. The threshold $e$ may increase linearly or exponentially until $e = 1$.
Step 8: Determine the clusters and the cluster centres. After convergence of $C$ is achieved, some of the elements in $C$ may be equal. Partition the last updated set $C$ into several disjoint subsets, where all the elements in the same subset are identical (i.e. multiple copies). The number of subsets is the number of clusters. For the elements in a subset of $C$, their corresponding patterns in the original data set $X$ can be partitioned as a cluster, and the cluster centre is the element in that subset.
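The following is a compact, self-contained sketch of Steps 1-8 under the choices $C = X$ and $w_k = 1/N$, with a linearly increasing threshold. Function and parameter names (grey_clustering, e0, e_step, max_sweeps) are illustrative, and convergence is checked by comparing the temporary set before and after each complete sweep.

```python
import numpy as np

def grey_clustering(X, e0=0.6, z=1.0, e_step=0.01, max_sweeps=200):
    """Sketch of the grey clustering algorithm (Steps 1-8), w_k = 1/N."""
    X = np.asarray(X, dtype=float)
    C = X.copy()                                     # Step 1: temporary set C = X
    e = e0                                           # Step 2: initial threshold
    for _ in range(max_sweeps):
        C_prev = C.copy()
        for i in range(len(C)):                      # Steps 3-4: each c_i as reference
            d = np.linalg.norm(C - C[i], axis=1)     # Euclidean distances, (4)
            if d.max() == d.min():                   # trivial case of Section 3: skip
                continue
            omega = ((d.max() - d) / (d.max() - d.min())) ** z   # grades, (5)
            significant = omega >= e                 # significant patterns
            active = omega == omega.max()            # active pattern(s), (6)
            C[active] = C[significant].mean(axis=0)  # update (7) with w_k = 1/N
        if np.allclose(C, C_prev):                   # Step 6: C unchanged -> converged
            break
        e = min(1.0, e + e_step)                     # Step 7: increase the threshold
    # Step 8: identical rows of C define the clusters; unique rows are the centres.
    # Row i of C still corresponds to pattern x_i, so labels[i] is the cluster of x_i.
    centres, labels = np.unique(C.round(6), axis=0, return_inverse=True)
    return centres, labels
```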

In order to group the patterns with high relational grades into a cluster, we select the $N$ significant patterns that are highly similar to the reference and then average them as the new location of the active pattern in Step 5. Step 5 also reveals that the patterns with low relational grades remain unchanged after being trained. Increasing the threshold in Step 7 is very similar to the idea of shrinking the neighbourhood size in self-organising feature maps [17]. As the threshold increases, the significant patterns move towards a convergent state. The final results are the desired cluster centres. Because the whole procedure relies only on the underlying structure of the data set, the proposed method is an unsupervised clustering approach.

In Step 5, we see that the selection of the significant patterns depends on the value of the threshold. In fact, the initial value of the threshold chosen in Step 2 strongly influences the final classification results of our method. Generally speaking, different initial thresholds may yield different results in the data clustering problem. The sensitivity of the initial threshold parameter will be shown by examples in the next section. The grey clustering algorithm proposed above partitions a data set for a fixed initial threshold. We may use an extra procedure in the grey clustering algorithm to obtain results for different initial thresholds within [0, 1]. Among those results, we need a criterion to help us find an appropriate clustering result. Define a performance index as in [12]:


$$\mathrm{PI} = \frac{\sum_{d=1}^{h} m_d s_d}{\sum_{d=1}^{h} m_d} \qquad (8)$$

where $h$ is the number of clusters, $m_d$, $d = 1, 2, \ldots, h$, denotes the number of patterns which belong to the $d$th cluster, and the value $s_d$ is calculated by

$$s_d = \frac{\min_{j,\, j \neq d} \| o_d - o_j \|^2}{\sqrt{\sum_{t=1}^{m_d} \| x_{dt} - o_d \|^2 / m_d}} \qquad (9)$$

where $o_d$ represents the cluster centre of the $d$th cluster and $x_{dt}$, $t = 1, 2, \ldots, m_d$, denotes the patterns which belong to the $d$th cluster. This index reflects the objective of data clustering in the sense that the patterns in a cluster should be close to the centre, and the cluster centres should be separated far from each other. From (9), we can see that the larger the performance index, the better the clustering result [28].
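A sketch of the performance index follows, under the reading of (9) given above (minimum separation between centres over within-cluster scatter); the function name is illustrative, and it assumes at least two non-empty clusters with non-zero scatter.

```python
import numpy as np

def performance_index(X, centres, labels):
    """Performance index PI of (8), with s_d computed as in (9) above."""
    X = np.asarray(X, dtype=float)
    centres = np.asarray(centres, dtype=float)
    labels = np.asarray(labels)
    h = len(centres)                                          # number of clusters h
    m = np.array([np.sum(labels == d) for d in range(h)])     # cluster sizes m_d
    s = np.empty(h)
    for d in range(h):
        sep = min(np.sum((centres[d] - centres[j]) ** 2)      # min_{j != d} ||o_d - o_j||^2
                  for j in range(h) if j != d)
        within = np.linalg.norm(X[labels == d] - centres[d], axis=1)
        s[d] = sep / np.sqrt(np.sum(within ** 2) / m[d])      # s_d, (9)
    return float(np.sum(m * s) / np.sum(m))                   # PI, (8)
```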

Now we give the last step of the grey clustering algorithm, which determines an optimal result in the sense that the result maximises the performance index (8).
Step 9: Determine the optimal result. Calculate the performance index (8) for each result obtained in Step 8 using the different initial thresholds. The optimal clustering result is the one that maximises the performance index.

Instead of (8), different performance indexes may also be used in Step 9. For instance, if the mean of the distances between the cluster centre and the patterns in the cluster is the only objective under consideration, Step 9 should be modified to search for the result that minimises this alternative performance index.
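As an illustration of Step 9, the snippet below scans initial thresholds from 0.1 upwards in steps of 0.01 (as in Section 5) and keeps the result with the largest performance index. It assumes the grey_clustering and performance_index sketches above; the data set here is synthetic, loosely inspired by the cluster centres of Example 1, and not the paper's data.

```python
import numpy as np

# Synthetic data for illustration only (not the paper's data sets)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.05, size=(20, 2))
               for c in ([0.4, 0.3], [0.7, 0.7], [0.2, 0.8])])

best = None
for e0 in np.arange(0.10, 1.0 + 1e-9, 0.01):        # scan initial thresholds
    centres, labels = grey_clustering(X, e0=e0, z=1.0)
    if len(centres) < 2:
        continue                                     # PI of (8)-(9) needs h >= 2
    pi = performance_index(X, centres, labels)
    if best is None or pi > best[0]:
        best = (pi, round(e0, 2), centres, labels)   # Step 9: keep the max-PI result

print(best[0], best[1], best[2])                     # best PI, its threshold, centres
```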

5 Simulation results

The purpose of this section is to show the performance and effectiveness of our method using three examples with various data sets. In each example, we compare the result obtained using the grey clustering algorithm with those obtained using the fuzzy c-means method [11] and the hard c-means method [14], in which c is the number of clusters to be predetermined. In each case, the initial threshold starts from 0.1 and increases by 0.01 in the search for an optimal solution. In the first learning iteration of the grey clustering algorithm, the threshold in Step 5 is equal to the initial threshold. Then, as stated in Step 7, the threshold also increases by 0.01 for the next learning iteration for a clustering result. In the following examples, for convenience, the figures showing the data sets also show the clustering results. For example, the points in Fig. 1 show the geometric distribution of the data set considered in Example 1, and the number beside each point denotes the clustering result obtained using the proposed method.

Example 1: This example uses a simple data set, denoted $S_1$, listed in the first column of Table 1 and shown in Fig. 1. To perform the proposed clustering procedure we select the temporary set as the entire data set, $C = S_1$, and the weighting factor in Step 5 as $w_k = 1/N$, where $N$ is the number of significant patterns. Figure 2 shows how the performance index is influenced by the initial threshold parameter. The maximum performance index occurs when the initial threshold is within [0.47, 0.63]. Figure 3 illustrates that, no matter what the initial threshold value is, the proposed clustering method converges after a finite number of learning iterations.

With $z = 1.0$ and initial threshold 0.6, the grey clustering algorithm stops at the end of the first learning iteration with the maximal performance index.

Fig. 1 Data set and classification result of proposed method (Example 1)
Fig. 2 Plot of performance index vs initial threshold (Example 1)

Table 1: Data set considered and updated patterns for Example 1

k    x_k (original pattern)    c_k (first iteration)

1 (0.3293, 0.2293) (0.4000, 0.3000)

2 (0.3293, 0.3707) (0.4000, 0.3000)

3 (0.4000, 0.3000) (0.4000, 0.3000)

4 (0.4707, 0.2293) (0.4000, 0.3000)

5 (0.4707, 0.3707) (0.4000, 0.3000)

6 (0.6134, 0.6500) (0.7000, 0.7000)

7 (0.7866, 0.6500) (0.7000, 0.7000)

8 (0.7000, 0.7000) (0.7000, 0.7000)

9 (0.7000, 0.8000) (0.7000, 0.7000)

10 (0.2000, 0.9000) (0.2000, 0.8000)

11 (0.3000, 0.8000) (0.2000, 0.8000)

12 (0.2000, 0.8000) (0.2000, 0.8000)

13 (0.2000, 0.7000) (0.2000, 0.8000)

14 (0.1000, 0.8000) (0.2000, 0.8000)


The numerical results of the updated patterns at the end of learning are listed in the second column of Table 1. It is shown that the data set can be partitioned into three clusters and the estimated cluster centres are $c_1 = (0.4, 0.3)$, $c_2 = (0.7, 0.7)$ and $c_3 = (0.2, 0.8)$, which are consistent with the actual cluster centres of the data set. The estimated cluster centres and the classification results are also illustrated in Fig. 1, where the cluster centres are marked.

Table 2 compares the clustering results obtained by the proposed clustering method, the hard 3-means algorithm and the fuzzy 3-means algorithm. All three methods partition the considered data set into three clusters, and the cluster centres obtained by the three methods are similar, but the proposed method needs only one iteration to obtain the desired clustering result. Table 3 contains the estimated cluster centres and performance indexes for various initial thresholds within the region [0.63, 0.74], where the initial threshold parameter is sensitive.

Example 2: Consider a synthetic data set, denoted $S_2$, with two clusters of different shapes and different sizes, as shown in Fig. 4. With $C = S_2$, $z = 1.0$ and weighting factor $w_k = 1/N$, the plots of the performance index and of the number of learning iterations versus the initial threshold are given in Figs. 5 and 6, respectively. In this case the initial threshold parameter is very sensitive. The initial threshold value for the optimal solution is 0.67. For this initial threshold value, the grey clustering algorithm terminates at the end of the 14th learning iteration and results in two cluster centres, (0.1937, −0.1029) and (2.6356, −1.6741). The estimated cluster centres and the classification results are also illustrated in Fig. 4, where the cluster centres are marked.

Fig. 3 Plot of number of learning iterations vs initial threshold (Example 1)

Table 3: Results for various initial thresholds in sensitive region (Example 1)

Initial threshold    Estimated cluster centres                               PI
0.63                 (0.4000, 0.3000), (0.7000, 0.7000), (0.2000, 0.8000)    0.9048
0.64, 0.65           (0.3965, 0.3035), (0.7000, 0.7000), (0.2000, 0.8000)    0.9012
0.66                 (0.4000, 0.3071), (0.7000, 0.7000), (0.2050, 0.7950)    0.8785
0.67, 0.68, 0.69     (0.4000, 0.3071), (0.6892, 0.6938), (0.2050, 0.7950)    0.8356
0.70                 (0.3965, 0.3035), (0.6892, 0.6938), (0.2050, 0.7950)    0.8474
0.71                 (0.3965, 0.3035), (0.6892, 0.6938), (0.2000, 0.7950)    0.8544
0.72, 0.73           (0.4000, 0.3000), (0.6892, 0.6938), (0.2000, 0.8000)    0.8613
0.74                 (0.4000, 0.3000), (0.6892, 0.7063), (0.2000, 0.8000)    0.8803

Fig. 4 Data set and classification result of proposed method (Example 2)

Table 2: Comparison of results of different algorithms (Example 1)

              Grey clustering algorithm    HCM with c = 3      FCM with c = 3
c1            (0.4000, 0.3000)             (0.4000, 0.3000)    (0.3997, 0.2989)
c2            (0.7000, 0.7000)             (0.7000, 0.7000)    (0.7006, 0.7003)
c3            (0.2000, 0.8000)             (0.2000, 0.8000)    (0.1991, 0.8006)
iterations    1                            2                   7

Fig. 5 Plot of performance index vs initial threshold (Example 2)



Table 4 compares the results obtained by the grey clustering algorithm, the hard 2-means algorithm and the fuzzy 2-means algorithm. Figures 7 and 8 show the final clustering results of the hard 2-means algorithm and the fuzzy 2-means algorithm, respectively. From these figures, it is observed that both the hard 2-means algorithm and the fuzzy 2-means algorithm misclassify several patterns, but the grey clustering algorithm does not.

Example 3: Consider a two-dimensional data set $S_3$ consisting of 200 points, as shown in Fig. 9. There are four clusters of different shapes and different sizes in this data set. With $C = S_3$, $z = 1.0$ and $w_k = 1/N$, the performance index plot shown in Fig. 10 indicates that the initial threshold value for the optimal solution is 0.78. For this initial threshold value, the grey clustering algorithm converges at the end of the 5th learning iteration, as shown in Fig. 11, and results in four cluster centres, (0.7810, 0.3451), (0.2965, 0.1999), (0.2372, 0.6910) and (0.8055, 0.7784).

Fig. 6 Plot of number of learning iterations vs initial threshold (Example 2)

Table 4: Comparison of results of different algorithms (Example 2)

              Grey clustering algorithm    HCM with c = 2        FCM with c = 2
c1            (0.1937, −0.1029)            (−0.0392, 0.0110)     (−0.3381, 0.0444)
c2            (2.6356, −1.6741)            (2.6435, −1.6296)     (2.1127, −1.1647)
PI            7.7098                       9.0061#               6.5223#
iterations    14                           17                    26

# Incorrect clustering result is obtained by this method

Fig. 7 Classification result of hard 2-means algorithm (Example 2)

Fig. 8 Classification result of fuzzy 2-means algorithm (Example 2)

Fig. 9 Data set and classification result of proposed method (Example 3)


Figures 9, 12 and 13 show the clustering results of the grey clustering algorithm, the hard 4-means algorithm and the fuzzy 4-means algorithm, respectively, where the cluster centres are marked. All methods partition the data set $S_3$ into four clusters. Table 5 lists the details of these methods. The cluster centres estimated by the three methods are similar, but the grey clustering algorithm achieves the largest performance index. Table 6 contains the estimated cluster centres and performance indexes for various initial thresholds within the region [0.75, 0.78], where the initial threshold parameter is sensitive.

From the above examples, we see that the introduction of the performance index (8) is necessary. It helps us deal with the sensitivity of the proposed method to the chosen initial threshold parameter.

Fig. 11 Plot of number of learning iterations vs initial threshold (Example 3)

Fig. 12 Classification result of hard 4-means algorithm (Example 3)

Fig. 13 Classification result of fuzzy 4-means algorithm (Example 3)

Table 5: Comparison of results of different algorithms (Example 3)

              Grey clustering algorithm    HCM with c = 4      FCM with c = 4
c1            (0.7810, 0.3451)             (0.7882, 0.3349)    (0.7805, 0.3383)
c2            (0.2965, 0.1999)             (0.2931, 0.1954)    (0.2860, 0.1935)
c3            (0.2372, 0.6910)             (0.2113, 0.6933)    (0.2226, 0.7035)
c4            (0.8055, 0.7784)             (0.8093, 0.7918)    (0.8258, 0.7889)
PI            0.6199                       0.6084              0.5546
iterations    5                            5                   15

Fig. 10 Plot of performance index vs initial threshold (Example 3)


6 Conclusions

For the data clustering problem, most algorithms require an initial guess of the cluster centres. In this paper, we have presented a grey relational analysis based clustering algorithm for finding the cluster centres in an unsupervised manner. Together with the performance index given in [12], this algorithm can obtain the optimal result. To demonstrate the effectiveness of the approach, several examples have been given. In each example, the performance of the proposed algorithm was compared with that of the fuzzy c-means method and the hard c-means method. Simulation results showed that the proposed approach outperformed the others.

7 Acknowledgment

The authors would like to express their thanks to the Associate Editor and the anonymous referees for their help in improving the paper, and to the National Science Council, Taiwan, which has financially supported this work through Grant NSC 91-2218-E-262-007.

8 References

1 Deng, J.-L.: 'Control problems of grey systems', Syst. Control, 1982, 5, (1), pp. 284–294
2 Deng, J.-L.: 'Introduction to grey system theory', J. Grey Syst., 1989, 1, (1), pp. 1–24
3 Hu, Y.-C., Chen, R.-S., Hsu, Y.-T., and Tzeng, G.-H.: 'Grey self-organizing feature maps', Neurocomputing, 2002, 48, (1–4), pp. 863–877
4 Wong, C.-C., and Chen, C.-C.: 'Data clustering by grey relational analysis', J. Grey Syst., 1998, 10, (3), pp. 281–288
5 Yeh, M.-F., Chang, J.-C., and Lu, H.-C.: 'Unsupervised clustering algorithm via grey relational pattern analysis', J. Chinese Grey Syst. Assoc., 2002, 5, (1), pp. 17–22
6 Yeh, M.-F.: 'Data clustering via grey relational pattern analysis', J. Grey Syst., 2002, 14, (3), pp. 259–264
7 Yeh, M.-F., and Chang, K.-C.: 'Grey unsupervised clustering method'. Proc. Joint Conf. on AI, Fuzzy System, and Grey System, Taipei, Taiwan, 2003
8 Yeh, M.-F., and Wang, T.-Y.: 'A new grey self-organizing feature map'. Proc. Joint Conf. on AI, Fuzzy System, and Grey System, Taipei, Taiwan, 2003
9 Lu, H.-C., and Yeh, M.-F.: 'Robot path planning based on modified grey relational analysis', Cybern. Syst., 2002, 33, (2), pp. 129–159
10 Yeh, M.-F., and Lu, H.-C.: 'Evaluating weapon systems based on grey relational analysis and fuzzy arithmetic operations', J. Chinese Inst. Eng., 2000, 23, (2), pp. 211–221
11 Bezdek, J.C.: 'Pattern recognition with fuzzy objective function algorithms' (Plenum Press, New York, 1981)
12 Jain, A.K., and Dubes, R.C.: 'Algorithms for clustering data' (Prentice Hall, New Jersey, 1988)
13 Gonzalez, J., Rojas, I., Pomares, H., Ortega, J., and Prieto, A.: 'A new clustering technique for function approximation', IEEE Trans. Neural Netw., 2002, 13, (1), pp. 132–142
14 Duda, R.O., and Hart, P.E.: 'Pattern classification and scene analysis' (John Wiley & Sons, New York, 1973)
15 Carpenter, G.A., and Grossberg, S.: 'A massively parallel architecture for a self-organizing neural pattern recognition machine', Comput. Vis. Graph. Image Process., 1987, 37, (1), pp. 54–115
16 Kohonen, T.: 'The neural phonetic typewriter', IEEE Trans. Comput., 1988, 27, (1), pp. 11–12
17 Kohonen, T.: 'Self-organizing maps' (Springer-Verlag, New York, 1995)
18 Runkler, T.A., and Bezdek, J.C.: 'Alternating cluster estimation: a new tool for clustering and function approximation', IEEE Trans. Fuzzy Syst., 1999, 7, (4), pp. 377–393
19 Pedrycz, W.: 'Conditional fuzzy c-means', Pattern Recogn. Lett., 1996, 17, (6), pp. 625–632
20 Su, M.-C., Declaris, N., and Liu, T.-K.: 'Application of neural network in cluster analysis'. Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics, 1997, Vol. 1, pp. 1–6
21 Bezdek, J.C., and Pal, N.R.: 'Some new indexes of cluster validity', IEEE Trans. Syst. Man Cybern. B, Cybern., 1998, 28, (3), pp. 301–315
22 Kwon, S.H.: 'Cluster validity index for fuzzy clustering', Electron. Lett., 1998, 34, (22), pp. 2176–2177
23 Pal, N.R., and Bezdek, J.C.: 'On cluster validity for the fuzzy c-means model', IEEE Trans. Fuzzy Syst., 1995, 3, (3), pp. 370–379
24 Yager, R.R., and Filev, D.P.: 'Approximate clustering via the mountain method', IEEE Trans. Syst. Man Cybern., 1994, 24, (8), pp. 1279–1284
25 Wang, J.-H., and Rau, J.-D.: 'VQ-agglomeration: a novel approach to clustering', IEE Proc. Vis. Image Signal Process., 2001, 148, (1), pp. 36–44
26 Chien, J.-T.: 'Online unsupervised learning of hidden Markov models for adaptive speech recognition', IEE Proc. Vis. Image Signal Process., 2001, 148, (5), pp. 315–324
27 Wong, C.-C., and Lai, H.-R.: 'A new grey relational measurement', J. Grey Syst., 2000, 12, (4), pp. 341–346
28 Wong, C.-C., Chen, C.-C., and Su, M.-C.: 'A novel algorithm for data clustering', Pattern Recogn., 2001, 34, (2), pp. 425–442

Table 6: Results for various initial thresholds in sensitive region (Example 3)

Initial threshold    0.75                0.76                0.77                0.78
c1                   (0.7799, 0.3509)    (0.7803, 0.3471)    (0.7812, 0.3449)    (0.7810, 0.3451)
c2                   (0.2971, 0.2009)    (0.2974, 0.2004)    (0.2987, 0.2006)    (0.2965, 0.1999)
c3                   (0.2359, 0.6885)    (0.2351, 0.6889)    (0.2322, 0.6883)    (0.2372, 0.6910)
c4                   (0.8021, 0.7768)    (0.8030, 0.7764)    (0.8073, 0.7766)    (0.8055, 0.7784)
c5                   –                   –                   (0.3673, 0.7691)    –
PI                   0.6064              0.6119              0.4717              0.6199


