J.-H. Wang and J.-D. Rau: VQ-agglomeration: a novel approach to clustering

area. Therefore, this approach is different from the method of [8] in that it extracts clusters in a parallel fashion. After a few synchronous steps, codewords stop moving and stay at a few positions called cluster centroids. After convergence, codewords that moved to the same centroid are grouped as a cluster, as are their associated input vectors. One should note that, unlike [3, 6, 7], the proposed agglomeration algorithm does not actually merge prototypes in the progress of agglomeration.

One thing that VQ and clustering have in common is that they both group an unlabelled data set into a certain number of clusters such that data within the same cluster have a high degree of similarity. However, clustering is often a more difficult task, as it does not have specific criteria, such as the MSE or information entropy commonly adopted in vector quantisation. Practical aspects of a clustering algorithm, such as its effectiveness in revealing natural groups of data [9], are often considered more important than mathematical rigour. In contrast, communications experts tend to develop optimal vector quantisers with respect to abstract criteria from information theory. This leads to broadly applicable methods [10] that may have some assured minimum level of performance, but do not take advantage of the structural characteristics of the input data for a particular application.

In this paper, the rationale for adopting VQ as the pre-process of an agglomeration algorithm is twofold. First, through VQ, adjacent input feature vectors are encoded as the same codeword; because the number of codewords is much less than the number of input vectors, subsequent agglomeration processing on codewords rather than directly on individual input items requires much less computation time. Secondly, because VQ can approximate the input distribution, using the codewords as good initial prototypes, instead of random selections from the raw input, can effectively reduce the sensitivity to initial prototypes.

To begin with, the paper proposes the VQ-agglomeration approach, which not only saves much computation time, but is also free of the initial prototype problem that plagues the conventional k-means algorithm [1]. We then present a novel algorithm that employs the get-right-to-centroid strategy to agglomerate codewords. A denser input area, which acts as a black hole, will attract nearby codewords to it. The agglomeration process iterates until no more codewords can move. By then, the final resulting clusters as well as the number of clusters are obtained. Thus, the main contributions of the paper are: the proposal of a novel agglomeration algorithm incorporated with a VQ pre-process to facilitate fast and autonomous clustering; characterising properties that closely relate to the efficiency of the agglomeration algorithm; proving the stability of the agglomeration process; contrasting the performance of the VQ-agglomeration approach with other clustering techniques; and further validation through simulation studies.

In the remainder of this paper, the roles played, as well as the advantages provided, by a VQ pre-process in the task of clustering are explained. The codeword-agglomeration (CA) algorithm essential to the success of the VQ-agglomeration approach is presented. Important properties of the CA algorithm are characterised and its convergence stability is also proved. Simulation results are provided to verify that the proposed VQ-agglomeration approach is fast in computation time and free of the initial prototype problem. We also present a recursive version of the VQ-agglomeration, and show its usefulness in facilitating cluster validation.

IEE Proc.-Vis. Image Signal Process., Vol. 148, No. 1, February 2001

    2 Advantages of the VQ pre-process

The advantages of conducting a vector quantisation pre-process are: (i) because the number of codewords is generally much less than the number of input vectors, much computation time can be saved; (ii) if the codewords are used as initial prototypes, the initialisation problem in prototype-based clustering techniques can be alleviated to some extent. Unfortunately, conventional prototype-based techniques do not allow a set of codewords to be used as input data and as initial prototypes at the same time. That is, the two advantages in general cannot be achieved simultaneously. In this paper, we incorporate the CA algorithm and a VQ pre-process, naming the process VQ-agglomeration, to successfully acquire both advantages. The CA algorithm treats each codeword as an initial prototype and as an input data item in the course of the agglomeration process. To differentiate, any approach that incorporates VQ and any conventional clustering technique (such as k-means [1], the minimum spanning tree (MST) algorithm [11, 12], etc.) is referred to as VQ-clustering.

2.1 Saving computation time via VQ

Two conventional clustering methods were tested, namely the MST algorithm [11, 12] and fuzzy C-means (FCM) [13]. The MST algorithm is a well known graph-theoretic technique for discovering clusters in a data set. The method first finds a minimum spanning tree for the input data. Then, by removing the longest K edges, K + 1 clusters are obtained. A simple example is shown in Fig. 1, where the input contains seven separate Gaussian distributions with 1830 points. Fig. 1a shows the spanning tree and Fig. 1b is the clustering result. By removing the six longest edges, the data set is correctly separated into seven clusters. However, as a graph-theoretic approach, the MST algorithm is rather computationally intensive. In this case, the calculation time for obtaining the minimum spanning tree is 300.72 s using a Pentium-III 500. To reduce computation time, we applied the LBG method [14] to quantise (codebook size M_f = 64) the input data, and the result is shown in Fig. 1c. The computation time for LBG is 0.37 s. Fig. 1d illustrates the corresponding spanning tree for the codewords of Fig. 1c. Note that the computation time for generating the spanning tree when the codewords, instead of the original data, were used as inputs to the MST algorithm is only 0.01 s. Thus, the VQ-clustering approach indeed saves computation time. More interestingly, we note that the VQ-clustering approach can achieve almost the same clustering results, but with much less computation time.
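The MST procedure just described (build the tree, cut the K longest edges, read off the K + 1 components) can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code; `mst_edges`, `mst_clusters` and the union-find labelling are our own naming.

```python
import math

def mst_edges(points):
    """Build a minimum spanning tree with Prim's algorithm.
    Returns a list of (distance, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (distance to tree, parent index)
    in_tree[0] = True
    for j in range(1, n):
        best[j] = (math.dist(points[0], points[j]), 0)
    edges = []
    for _ in range(n - 1):
        # pick the cheapest vertex not yet in the tree
        j = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j][0])
        d, parent = best[j]
        edges.append((d, parent, j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                dk = math.dist(points[j], points[k])
                if dk < best[k][0]:
                    best[k] = (dk, j)
    return edges

def mst_clusters(points, k_remove):
    """Delete the k_remove longest MST edges; the remaining
    connected components are the k_remove + 1 clusters."""
    kept = sorted(mst_edges(points))[:len(points) - 1 - k_remove]
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for _, i, j in kept:
        parent[find(i)] = find(j)
    labels = [find(i) for i in range(len(points))]
    # renumber component ids as 0..K
    ids = {c: n for n, c in enumerate(dict.fromkeys(labels))}
    return [ids[c] for c in labels]
```

On two well-separated groups, removing the single longest edge yields two components, mirroring the seven-cluster example in Fig. 1 on a smaller scale.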

Table 1 shows the experimental results of the VQ-clustering approach that incorporates the MST algorithm and various neural network quantisers, such as the BNN [15], FSCL [16], SCONN2 [17], neural-gas [18] and GCS [19].


Table 1: Simulation results of the VQ-clustering approach that incorporates neural VQ and MST clustering

Quantiser      BNN    LBG    FSCL   SCONN2  Neural-Gas  GCS
VQ time (s)    0.17   0.37   0.32   0.71    7.14        0.50
MSE            30.18  34.51  33.81  31.07   34.14       32.76
Correct        yes    yes    yes    yes     yes         yes
clustering?

M_f ≈ 64


Fig. 2: Example of the use of various clustering algorithms
a Input data and codewords plotted; BNN consumed 0.55 s with M_f = 256
b Minimum spanning tree of the codewords in a
c Result of FCM without VQ; '+' denotes the true cluster centroids
d Result of FCM after quantisation; the four initial prototypes for FCM were randomly chosen from the 256 codewords

where we first specified a codebook size of k to train BNN, and used the resulting codewords as initial centroids (prototypes) for the k-means algorithm. Fig. 3a shows the initial centroids as well as their converged centroids after applying the k-means to the original input items. As shown in Fig. 3b, the original input data were correctly grouped into seven clusters by using the good initial prototypes provided by BNN.

In fact, for some clustering algorithms based on an iterative scheme (e.g. FCM), the provision of a set of good initial prototypes can even increase the convergence speed. As shown in Fig. 4a, where the initial prototypes are selected randomly from the input data, it took the FCM 0.49 s to converge. But when the initial prototypes were obtained via BNN with M_f = 4, as shown in Fig. 4b, it took BNN and FCM 0.02 s and 0.22 s, respectively. It is worth noting that the computation time for BNN accounts for only a small portion of the overall computation time. This result implies that the provision of good initial prototypes by vector quantisation can lead to faster convergence for the subsequent clustering algorithm.
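The seeding idea above can be sketched as follows: a plain Lloyd-style k-means that starts from caller-supplied centroids, so VQ codewords can be passed in place of random picks. This is a generic sketch, not the paper's BNN quantiser or its exact k-means variant.

```python
import math

def kmeans(points, centroids, iters=100):
    """Plain Lloyd's k-means, started from the given centroids
    (e.g. codewords produced by a VQ pre-process)."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        # assignment step: nearest centroid per point
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # update step: mean of each cluster
        new = []
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new.append(centroids[k])  # keep an empty prototype in place
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, labels
```

Seeding with one codeword near each true cluster (instead of random picks) typically converges in one or two sweeps, which is the effect the BNN + FCM timing comparison above illustrates.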

2.3 From VQ-clustering to VQ-agglomeration

Having shown the various advantages that can be gained from adopting a VQ pre-process, we note that because most prototype-based algorithms require the number of input data to be much larger than the number of initial prototypes, the VQ-clustering approach in general cannot simultaneously acquire all the advantages. In the following Section, we present the VQ-agglomeration approach, which is free of this limitation and can perform fast and autonomous clustering. By autonomous we mean that the number of clusters need not be pre-specified and the convergence of the clustering is guaranteed.

    3 VQ-agglomeration approach

3.1 CA algorithm

Initially, assume that a VQ process has divided N input vectors into M_f partitions, each represented by a codeword. The CA algorithm is applied to agglomerate these codewords into some clusters. Codewords located at a denser input area will attract codewords from a sparser area, and the movements of codewords in the feature space are synchronous (i.e. simultaneous). After convergence, codewords that moved to the same centroid are labelled as a cluster. In pseudo-code, the algorithm is listed in Table 2.

The underlying principle of moving codewords has its origin in classical physics. Each codeword c_j, j = 1, 2, ..., M_f, is regarded as a neutral particle with mass m_j defined as the number of input vectors represented by c_j. If a codeword is located near a denser area, its mass

Table 2: Pseudo-code of the CA algorithm

Given M_f codewords from the VQ pre-process
Do
    Move simultaneously (using eqn. 3) codewords c_j, j = 1 to M_f, to the centroid of S_D(j, t).
Until convergence (namely, no more codewords are moving)
Group codewords that converge to the same cluster centroid into a cluster.



Fig. 3: k-means algorithm and VQ-clustering
a Initial centroids obtained by applying BNN to perform quantisation with M_f = 7. Also shown are the converged centroids after applying the k-means algorithm to the original input data, given the 7 initial centroids
b Test data of 7 Gaussian distributions are correctly grouped into 7 clusters using the VQ-clustering framework. 'D' denotes an initial prototype, and '+' denotes the converged cluster centroid

would be larger, and vice versa. Also, each particle c_j is associated with a gravisphere with radius given by

    r_j = μ_j × θ    (1)

where μ_j is a mean distance measure, say Euclidean distance, from codeword c_j to the input vectors represented by c_j. The parameter θ is presumed to be a function of M_f and is used to modulate r_j. To proceed, define a direct neighbour of c_j as a codeword whose gravisphere directly intersects that of c_j. Namely, if the distance between c_k and c_j is less than (μ_j + μ_k) × θ, then c_k is a direct neighbour of c_j. Furthermore, denote S_D(j, t) as the subset that contains the direct neighbours of an arbitrary codeword c_j at the t-th synchronous step. The centroid of S_D(j, t) at the t-th synchronous step is defined as

    w̄(j, t) = Σ_{c_k ∈ S_D(j,t)} m_k w(k, t) / Σ_{c_k ∈ S_D(j,t)} m_k    (2)

where w(k, t) is the location of c_k in the feature space. In the course of a synchronous update, each codeword c_j moves directly toward its corresponding w̄(j, t), i.e.

    w(j, t + 1) = w̄(j, t)    (3)

As will be seen, this get-right-to-centroid strategy generates some interesting and desirable properties. Fig. 5 shows simulation results of three successive synchronous steps, where the input is the same as in Fig. 2a and M_f = 32. After the third synchronous step, the 32 codewords with variable sizes of gravispheres have converged (agglomerated) into four separate concentric circles, correctly representing the four Gaussian clusters.

Fig. 4: Use of BNN to select initial prototypes
a Initial prototypes selected randomly from input points; FCM takes 0.49 s to converge
b Initial prototypes provided by BNN, M_f = 4; BNN and FCM take 0.02 s and 0.22 s, respectively, to converge
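A minimal sketch of one CA run, following Table 2 and eqns. 1-3: each synchronous step moves every codeword to the mass-weighted centroid of its direct neighbours. Assumptions, beyond what the text states: S_D(j, t) includes c_j itself (consistent with S_D(1, 0) = {c_1, c_2, c_3, c_4} in Section 3.3), the radii are precomputed via eqn. 1 and held fixed, and convergence is detected by exact position equality.

```python
import math

def ca_agglomerate(codewords, masses, radii, max_steps=100):
    """Codeword agglomeration (CA) sketch.
    codewords: list of positions; masses: m_j, the number of input
    vectors represented by each codeword; radii: gravisphere radii
    r_j = mu_j * theta (eqn. 1), fixed throughout the process."""
    w = [list(c) for c in codewords]
    for _ in range(max_steps):
        new_w = []
        for j in range(len(w)):
            # S_D(j, t): codewords whose gravispheres intersect c_j's
            # (c_j itself always qualifies: distance 0 < r_j + r_j)
            nbrs = [k for k in range(len(w))
                    if math.dist(w[j], w[k]) < radii[j] + radii[k]]
            m = sum(masses[k] for k in nbrs)
            # eqn. 2: mass-weighted centroid of S_D(j, t)
            cent = [sum(masses[k] * w[k][d] for k in nbrs) / m
                    for d in range(len(w[j]))]
            new_w.append(cent)       # eqn. 3: get-right-to-centroid
        if new_w == w:               # convergence: no codeword moved
            break
        w = new_w
    # group codewords that share a final centroid into clusters
    labels, seen = [], {}
    for pos in w:
        key = tuple(round(x, 6) for x in pos)
        labels.append(seen.setdefault(key, len(seen)))
    return labels
```

With two well-separated pairs of codewords whose gravispheres only overlap within each pair, the pairs collapse onto two centroids in one step and the run converges, giving two clusters.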

The significance of the gravisphere is its usefulness in quantitatively and qualitatively representing the neighbourhood of an arbitrary codeword c_j. When the gravisphere of c_j directly intersects that of c_k, these two codewords are considered direct neighbours of each other. Note that codewords in the neighbourhood of c_j are likely to be grouped into the same cluster. On one hand, the gravisphere size of c_j qualitatively represents the region of codewords that could belong to the same cluster as c_j; on the other hand, when the size of the gravisphere of c_j increases, the dissimilarity among the codewords that intersect c_j could also increase, and the chance of erroneously agglomerating different clusters into one cluster increases. Clearly, these two-sided constraints require that the gravisphere of a codeword c_j be large enough to directly intersect the neighbouring codewords around c_j, yet not so excessively large as to intersect irrelevant codewords. To satisfy these two constraints, the size of the gravisphere of c_j should be quantitatively related to the local distribution around c_j, rather than to mass as in classical physics. That is, if c_j is located at a sparser input area, i.e. the input vectors represented by c_j are distributed widely over the feature space, then its associated gravisphere should be larger, and vice versa. Because μ_j is a mean distance from c_j to the input vectors represented by c_j, it is appropriate to use eqn. 1 to compute the gravisphere radius of c_j. In this respect, the concept of the gravisphere is quite analogous to that of the Voronoi region [20].

Fig. 5: Progress of agglomerating codewords using gravispheres
Test input data used 4 Gaussian distributions with slight overlap and M_f = 32. Darker points denote codewords and their corresponding gravispheres. Vector quantisation was conducted using BNN
a t = 0
b t = 1
c t = 2
d t = 3

The sensitivity of the gravisphere size to different values of M_f is mild, especially for input data where every cluster is distinctly separate from the other clusters. For example, when the input data in Fig. 1a are used and M_f = 64, values of θ ranging from 1.5 to 3.5 work well. In fact, the gravisphere size specified by eqn. 1 rarely results in intersecting codewords that belong to different clusters, unless the input data contain clusters that are very close to each other or even overlap.

However, if M_f is very large and θ is a constant, then the number of input vectors represented by each codeword, the value of μ_j and the gravispheres would be improperly small. In that case, codewords in a sparse input area that should be grouped together might erroneously end up in several clusters. Reasoned as above, θ should be a function of M_f. In principle, for an arbitrary input distribution and a given value of M_f, there exists an optimal θ that yields the optimal partition. Although it is virtually impossible to obtain the optimal θ analytically, a heuristic approach is feasible. To see this, we used the input data in Fig. 2a, where the two clusters on the right-hand side are slightly overlapping, and employed a curve-fitting function to find an optimal value of θ as a function of M_f. Empirical results have shown that

    θ = (7.967 × 10⁻⁸)M_f³ − (8.028 × 10⁻⁵)M_f² + (0.029)M_f + 1.05

works well for general Gaussian distributions. Note that the gravisphere size for each codeword does not change throughout the agglomeration process.


3.2 Properties of the CA algorithm

The CA algorithm differs from the gravitational approach [21] proposed by Wright in many ways. First, the CA algorithm works directly on codewords (prototypes), rather than on the input data points; Wright's approach treats each data point as a particle and agglomerates data points by gravitational attraction. Secondly, the algorithm employs the get-right-to-centroid strategy to agglomerate the prototypes, and does not actually merge prototypes in the progress of agglomeration. Finally, the algorithm need not pre-specify the number of clusters, the value of which is autonomously determined by the nature of the input. In the following, we characterise further interesting properties of the CA algorithm, including its stability and convergence:

(i) The gravitational property describes how an arbitrary codeword c_j behaves around its neighbourhood S_D(j, t). By eqn. 2, the centroid of the codewords in S_D(j, t) must be closer to codewords with larger mass than to those with smaller mass. Hence, by eqn. 3, an arbitrary codeword c_j will move towards a codeword that has larger mass in S_D(j, t). This gravitational property has a natural analogy in classical physics, namely the law of universal gravity, whereby a particle with larger mass attracts neighbouring particles with smaller mass. The property not only helps to discover local dense areas quickly, it also effectively separates overlapping clusters.

(ii) The relative-velocity property concerns the interactions among neighbouring codewords. According to eqn. 2, a codeword with smaller mass will move farther than one with larger mass. That is, during each synchronous step, a codeword in a sparser input area will move faster, in a relative sense, than one in a denser area. This creates a desirable property that codewords in a denser area seem to wait for codewords in a sparser area, so as to prevent the latter from being left behind and erroneously becoming several separate clusters. The relative-velocity property ensures that codewords always move in the desired direction (i.e. towards a concentrated area) and prevents codewords in a sparser input area from being trapped in local minima.

More significantly, the combination of the above two properties guarantees that, as the iterative synchronous operation proceeds, all codewords move to denser areas, and more and more gravispheres of codewords that (by the nature of the input) belong to the same cluster intersect each other and are ultimately agglomerated into some concentric circles (gravispheres). In fact, the denser area can be perceived as a black hole that attracts codewords to it.

3.3 Convergence of the CA algorithm

We start by defining S_D(j, t) as the set of codewords whose gravispheres have direct intersections (including the special case when one gravisphere is fully covered by another) with the codeword c_j at the t-th synchronous step. Furthermore, S_I(j, t) is defined as the set of codewords whose gravispheres have direct or indirect contacts with the codeword c_j at the t-th synchronous step. Fig. 6 is a zoom-in of the upper-left corner of Fig. 5a. Considering the codeword c_1, the direct neighbours of c_1 are c_2, c_3 and c_4, that is, S_D(1, 0) = {c_1, c_2, c_3, c_4}. Although the gravispheres of c_5 and c_8 directly intersect with that of c_2, they have only indirect contacts (through c_2) with that of c_1. Thus, codewords c_5 and c_8 are indirect neighbours of c_1. Accordingly, S_I(1, 0) = {c_i | i = 1, 2, ..., 9}. Moreover, in Fig. 6 we see that S_I(1, 0) = S_I(2, 0) = ... = S_I(9, 0).

Fig. 6: Zoom-in of the upper-left cluster in Fig. 5a, showing the relationship of neighbouring codewords

Lemma 1: In the progress of synchronous movements, all codewords in S_I(k, t) are bound to move inwardly to denser input areas.

Proof: Consider an arbitrary set S_I(k, t) whose codewords' gravispheres have direct or indirect contacts with the codeword c_k at the t-th synchronous step. The direct neighbours S_D(j, t) of each codeword c_j in S_I(k, t) may differ from each other; thus, according to eqn. 2, their corresponding centroids w̄(j, t) may not be the same. Moreover, according to the gravitational property, the centroid w̄(j, t) of each codeword c_j in S_I(k, t) would be near the codeword with larger mass, and c_j will move to it. Hence, in the course of synchronous moving, all codewords in S_I(k, t) will gradually move to the codeword in S_I(k, t) that has the largest mass. Therefore, all codewords in S_I(k, t) are bound to move inwardly to some denser areas. □

Although during a synchronous update one or a few codewords may depart from S_I(k, t), they will join neighbouring codewords to form a new set at the next synchronous step. Afterwards, the gravitational property will force the codewords in S_I(k, t + 1) to unanimously move to the denser area, i.e. the inward agglomeration is still bound to occur. This is illustrated in Fig. 5a, where the set S_I(k, t = 0), initially representing the overlapping Gaussians, split into two separate sets at t = 2.

    Lemma 2: After finite synchrono us moving s teps, thelocations w(j, ) of all codewords inSl(/c, ) will eventuallybecome equal.

    Proofi By lemma 1, codewords in S,(/c, t ) are bound tomove inwardly to some denser areas; thisis to say thatmore and more gravispheres of codewords inSI(/, ) willintersect each other. Consequently, the direct neighboursSD( j , ) of codewords c, in S I ( k , ) will eventually becomeidentical. By eqn.2, it follows that w(j, ) for codewords inS,(/c, t ) will become equal after finite synchronous steps.0

Consider the four codewords at the lower-left corner in Fig. 5a, each with a different size of gravisphere. At t = 0, the S_D(j, t) of these four codewords are not the same. After one synchronous step, every pair of these four gravispheres intersects, i.e. the S_D(j, t = 1) of the four codewords are identical, and their locations w(j, t) are equal at t = 2. After this, we see no more codewords moving in these concentric circles.

Convergence theorem: After finite synchronous moving steps, all codewords as well as their gravispheres will form several sets of concentric circles and will not move any more.

Proof: By lemma 2, after finite synchronous moving steps, the gravispheres of the codewords in an arbitrary S_I(j,


Fig. 7: VQ-agglomeration approach
a 32 initial prototypes resulting from the VQ pre-process and their gravispheres
b After five synchronous steps, codewords converged into six sets of concentric circles identifying the six clusters

Fig. 8: Results obtained without VQ pre-process
a 64 initial prototypes (and their gravispheres) selected randomly from the original input data
b After seven synchronous steps, these prototypes erroneously converged into three sets of concentric circles identifying three clusters

conducting agglomeration. To see what improvements this feature can bring to the clustering performance, we randomly picked initial prototypes from the original input data. Without the VQ pre-process, the result is as shown in Fig. 8. The input is the same as for Fig. 2 and M_f = 64. As shown in Fig. 8a, because the initial prototypes were randomly selected, some codewords have overly large gravispheres. As a result, the two overlapping clusters were erroneously grouped together after convergence, as shown in Fig. 8b. Comparing Figs. 5a and 8a verifies that a VQ pre-process indeed can provide good initial prototypes for the CA algorithm. Therefore, this experiment and the results in Fig. 5 have empirically shown that the proposed clustering approach can simultaneously acquire two benefits from using codewords, namely freedom from the initial prototype problem, and fast computation time.

4.3 Effect of changing the codebook size
In general, if each cluster is distinctly separate from the others, the CA algorithm can correctly identify the clusters. In addition, the tolerance to changes in M_f is good, except when M_f is too large and some clusters are very close to each other, or even overlap. To see this, we ran the algorithm 100 times for each different value of M_f ranging from 8 to 400. In performing the quantisation, the input sequences of feature vectors were made different for each

Table 3: Insensitivity to the change of M_f

M_f                         8       20      32      64      100     200     300     400
VQ time (s)                 0.0225  0.0311  0.0508  0.0917  0.1817  0.3136  0.4416  0.9011
Agglomeration time (s)      0.0040  0.0102  0.0184  0.0357  0.0469  0.1866  0.3475  0.6168
Avg. number of sync. steps  2.07    3.40    4.10    4.21    4.82    5.79    6.74    7.38
θ                           1.28    1.60    1.90    2.60    3.23    4.25    4.61    4.78
Avg. number of clusters     4.00    4.00    4.00    4.00    4.03    4.03    4.06    4.15
Accuracy (%)                100     100     100     100     97      97      94      86

All data were averaged over 100 different runs. The accuracy denotes the percentage of runs obtaining the correct number of clusters in 100 runs.



run. Using the same 4-Gaussian data as in Fig. 5, Table 3 shows that as M_f increases, the average number of resulting clusters N_c remains close to 4. The chance of achieving N_c = 4 is still very large (i.e. the accuracy is 86%) even when M_f = 400. Hence, the resulting number of clusters is not sensitive to the change of M_f. It is worth noting that the CA algorithm still converges very quickly even when M_f = 400.

Despite the fact that the CA algorithm is robust to the change of M_f, the validity of clustering in the case of very large M_f can be improved. In fact, a too large M_f not only incurs more computation time, it may also cause erroneous clusters. Thus, we present a recursive scheme to avoid having to guess M_f blindly. The idea is to run the VQ-agglomeration recursively, with each cycle using a better trial of M_f than its preceding cycle. For example, assume an initial value of M_f(Cyl = 1). After convergence, N_c(Cyl = 1) is obtained. Then, set M_f(Cyl + 1) = 2 × N_c(Cyl) and rerun the VQ-agglomeration. We check whether N_c(Cyl + 1) = N_c(Cyl). If not, we continue to the next cycle, and so on. The feasibility of this scheme stems from the fact that the probability that two successive cycles using different values of M_f result in the same erroneous N_c is extremely small. Equivalently speaking, two successive cycles that use different M_f and converge to the same N_c indicate that the resulting N_c is reliable (see Table 3). When M_f = 400, the chance of identifying 4 clusters is 86%. Suppose N_c(1) ≠ 4, say N_c(1) = 10 after the first cycle; we then use M_f = 2 × 10 = 20 in the second cycle. In Table 3, when M_f = 20, N_c(2) = 4 would be obtained after convergence. As N_c(2) ≠ N_c(1), we continue to the third cycle. When M_f = 2 × 4 = 8, N_c(3) = 4 is obtained after finishing the third cycle. Thus, we have obtained N_c(Cyl + 1) = N_c(Cyl) after three cycles. Comparing the computation times of the three cycles, the second and third cycles consumed only a small portion of the total computation time, implying that the recursive scheme is computationally efficient. More significantly, even if the initial guess of M_f is smaller than the valid number of clusters, the recursive scheme can quickly identify the valid clusters by trying another M_f(Cyl + 1) = 2 × N_c(Cyl).
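The recursive scheme reduces to a short control loop. In this sketch, `run_vq_agglomeration` is a hypothetical stand-in supplied by the caller for one complete VQ + CA cycle returning N_c; only the cycle-control logic (M_f(Cyl + 1) = 2 × N_c(Cyl), stop when two successive cycles agree) follows the text above.

```python
def recursive_vq_agglomeration(run_vq_agglomeration, mf_init, max_cycles=10):
    """Recursive VQ-agglomeration sketch.
    run_vq_agglomeration(mf): caller-supplied function (a hypothetical
    stand-in for one full VQ + CA cycle) returning the number of
    clusters N_c found with codebook size mf.  Each new cycle uses
    M_f = 2 * N_c from the previous one; the loop stops when two
    successive cycles agree on N_c, which is then taken as reliable."""
    n_prev = run_vq_agglomeration(mf_init)
    for _ in range(max_cycles):
        mf = 2 * n_prev
        n_next = run_vq_agglomeration(mf)
        if n_next == n_prev:
            return n_next
        n_prev = n_next
    return n_prev  # no agreement within max_cycles; best available estimate
```

Mirroring the worked example in the text: if a cycle at M_f = 400 erroneously reports 10 clusters, the next cycle runs at M_f = 20 and reports 4, and the cycle after that at M_f = 8 also reports 4, so the loop stops with N_c = 4.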

    5 Conclusions and discussions

The presented CA algorithm can quickly agglomerate codewords into a valid number of clusters determined by the nature of the input, regardless of the input distribution function and whether the input clusters overlap or not. The beauty of the proposed VQ-agglomeration approach is threefold. First, it is free of the initial prototype problem. Secondly, the clustering process is fully autonomous, because pre-specifying the initial prototypes is not necessary. Thirdly, it is flexible in implementation, because the approach advantageously permits designers greater flexibility in the selection of the quantiser, e.g. quantisation can be conducted either via conventional methods or via neural network techniques. Finally, characterisations of the CA algorithm have shown that it nicely fits the proposed VQ-agglomeration approach in achieving the goal of fast and autonomous clustering. In the future, one may consider incorporating the recursive VQ-agglomeration and a defined global/local validity measure to verify a correct partition as well as the valid number of clusters.

    6 Acknowledgments

The authors would like to thank the National Science Council of Taiwan (Grant No. NSC 89-2213-E-019-020) for supporting this research.

7 References

1 GOSE, E., JOHNSONBAUGH, R., and JOST, S.: 'Pattern recognition and image analysis' (Prentice-Hall, 1996)
2 NADLER, M., and SMITH, E.P.: 'Pattern recognition engineering' (Wiley Interscience, 1993)
3 FRIGUI, H., and KRISHNAPURAM, R.: 'A robust competitive clustering algorithm with applications in computer vision', IEEE Trans. Pattern Anal. Mach. Intell., 1999, 21, (5), pp. 450-465
4 KWON, S.H.: 'Cluster validity index for fuzzy clustering', Electron. Lett., 1998, 34, (22), pp. 2176-2177
5 BOUDRAA, A.O.: 'Dynamic estimation of number of clusters in data sets', Electron. Lett., 1999, 35, (19), pp. 1606-1608
6 SUH, H., KIM, J.H., and RHEE, C.H.: 'Convex-set-based fuzzy clustering', IEEE Trans. Fuzzy Syst., 1999, 7, (3), pp. 271-285
7 RAVI, T.V., and GOWDA, K.C.: 'Clustering of symbolic objects using gravitational approach', IEEE Trans. Syst. Man Cybern. B, Cybern., 1999, 29, (6), pp. 888-894
8 ZHUANG, X., HUANG, Y., PALANIAPPAN, K., and LEE, J.S.: 'Gaussian mixture modeling, decomposition and applications', IEEE Trans. Signal Process., 1996, 5, pp. 1293-1302
9 DUBES, R.C.: 'How many clusters are best? An experiment', Pattern Recognit., 1987, 20, (6), pp. 645-663
10 CHINRUNGRUENG, C., and SEQUIN, C.H.: 'Optimal adaptive k-means algorithm with dynamic adjustment of learning rate', IEEE Trans. Neural Netw., 1995, 6, (1), pp. 157-169
11 ZAHN, C.T.: 'Graph-theoretical methods for detecting and describing gestalt clusters', IEEE Trans. Comput., 1971, C-20, pp. 68-86
12 WU, Z., and LEAHY, R.: 'An optimal graph theoretic approach to data clustering: theory and its application to image segmentation', IEEE Trans. Pattern Anal. Mach. Intell., 1993, 15, (11), pp. 1101-1113
13 BEZDEK, J.C.: 'Pattern recognition with fuzzy objective function algorithms' (Plenum Press, New York, 1981)
14 LINDE, Y., BUZO, A., and GRAY, R.M.: 'An algorithm for vector quantizer design', IEEE Trans. Commun., 1980, COM-28, (1), pp. 84-95
15 WANG, J.H., and PENG, C.Y.: 'Competitive neural network scheme for learning vector quantization', Electron. Lett., 1999, 35, (9), pp. 725-726
16 GALANOPOULOS, A.S., and AHALT, S.C.: 'Codeword distribution for frequency sensitive competitive learning with one-dimensional input data', IEEE Trans. Neural Netw., 1996, 7, (3), pp. 752-756
17 CHOI, D.-I., and PARK, S.H.: 'Self-creating and organizing neural networks', IEEE Trans. Neural Netw., 1994, 5, (4), pp. 561-575
18 MARTINETZ, T.M., BERKOVICH, S.G., and SCHULTEN, K.J.: 'Neural-gas network for vector quantization and its application to time-series prediction', IEEE Trans. Neural Netw., 1993, 4, (4), pp. 558-569
19 FRITZKE, B.: 'Growing cell structures - a self-organizing network for unsupervised and supervised learning', Neural Netw., 1994, 7, (9), pp. 1441-1460
20 HAYKIN, S.: 'Neural networks' (Prentice-Hall, 1998, 2nd edn.)
21 WRIGHT, W.E.: 'Gravitational clustering', Pattern Recognit., 1977, 9, (3), pp. 151-166
