A new unsupervised approach for fuzzy clustering

Fuzzy Sets and Systems 158 (2007) 2118–2133
www.elsevier.com/locate/fss

Efendi N. Nasibov∗, Gözde Ulutagay

Department of Statistics, Faculty of Science & Arts, Dokuz Eylul University, Kaynaklar Campus, 35160 Buca, Izmir, Turkey

Received 8 September 2005; received in revised form 10 August 2006; accepted 27 February 2007
Available online 12 March 2007

Abstract

In this paper, a new level-based (hierarchical) approach to the fuzzy clustering problem for spatial data is proposed. In this approach, each point of the initial set is handled as a fuzzy point of the multidimensional space. The conical form of a fuzzy point, fuzzy α-neighbor points, and fuzzy α-joint points are defined, and their properties are explored. In classical fuzzy clustering, fuzziness usually means that each element may belong to different classes with different positive membership degrees from [0, 1]. In this study, the fuzziness of clustering is interpreted as the level of detail at which the properties of the classified elements are examined. On this basis, a new Fuzzy Joint Points (FJP) method that is robust to noise is proposed. An algorithm for the FJP method is developed and some of its properties are explored. A sufficient condition for recognizing a hidden optimal structure of clusters is also proven. The main advantage of the FJP algorithm is that it combines the determination of initial clusters, cluster validity, and direct clustering, which are the fundamental stages of a clustering process. With the FJP method it is possible to handle fuzzy properties at various levels of detail and to recognize individual outlier elements as independent classes. This method could be important in biological, medical, geographical information, mapping, and similar problems.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Clustering; Neighborhood relation; Fuzzy Joint Points (FJP); Fuzzy joint set

1. Introduction

Problems such as clustering, identification, and optimization play an important part among decision-making problems. Fuzzy set theory can be widely used in solving these kinds of problems [6,7,14,15]. Among these problems, clustering is the most important one in modern data mining technology, which is used in processing large databases [3]. The general philosophy of clustering is to divide the initial set into homogeneous groups based on the similarity of properties. Patterns in the same group tend to be as similar as possible to each other, while patterns in different groups tend to be as dissimilar as possible.

In classical clustering, the boundary between different clusters is crisp, so that each pattern is assigned to a unique class. In real life, however, the boundary between clusters often cannot be defined precisely, and some patterns may belong to more than one cluster with different positive degrees of membership. Such a case is represented by fuzzy clustering instead of classical clustering [7].

Various methods have been proposed for the aforementioned problems in the literature [8,10,20,21,23]. Most of the earlier work is based on the Fuzzy c-Means (FCM) algorithm. These methods interpret the fuzziness of clustering as the possibility that some elements belong to various classes. In our research, a different view of fuzziness, based on a new Fuzzy Joint Points (FJP) method, is proposed [17]. The basic difference of the FJP method from the others is that it understands fuzziness from a level-based (hierarchical) point of view, that is, as the level of detail at which the elements are considered when homogeneous groups are constructed. The elements are obviously more dissimilar from each other when they are examined in more detail, and the fuzzier the elements are, the more similar they are. In this sense, the fuzziness of clustering indicates how closely the considered properties are investigated. Since all of the elements are dissimilar from each other at the minimal fuzziness degree of zero, each element can be considered an individual cluster. On the other hand, at the maximal degree of fuzziness, all of the elements can be considered similar to each other, so that they belong to one class. The elements which are more similar to each other will belong to one class, while the elements which are more dissimilar from each other will belong to different classes having different membership degrees from the interval [0, 1]. In other words, the FJP algorithm can be considered a level-based clustering algorithm. Unlike classical fuzzy clustering, in which the membership degrees of the points to the clusters are determined at each iteration of the clustering process, the FJP algorithm determines the points which constitute the α-level sets.

∗ Corresponding author. Tel.: +90 536 509 79 69; fax: +90 232 453 42 65.
E-mail address: [email protected] (E.N. Nasibov).

0165-0114/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.fss.2007.02.019

There are other kinds of clustering algorithms similar to FJP in the literature, e.g. DBSCAN, GDBSCAN, and OPTICS [4,5,9,13,19]. DBSCAN is a clustering algorithm based on inter-cluster densities. In this algorithm, distance queries are made for each point in the data set for a pre-determined ε value. It also checks whether the number of points in the ε-neighborhood of the point exceeds the given MinPts value [13]. The MinPts value is used in order to assign points to the clusters.

The GDBSCAN algorithm is proposed for the density-skewed case [19]. In this method, the ε and MinPts values are determined by the user according to the densities. Set densities are arranged in increasing order, and the sets with lower densities are joined by using a greedy algorithm. DBSCAN evaluates many distance functions, which increases the complexity of the algorithm. The OPTICS algorithm is proposed in order to reduce this complexity [4,5]. In this algorithm, distance queries are made for values ε′ smaller than ε, and distinct distance functions are used only if it is desired to obtain the real clustering. A data set can be represented in OPTICS, while multidimensional projection is not possible in DBSCAN.

Finding the optimal cluster number, specifying initial clusters, and direct clustering with iterative refinement are fundamental problems of FCM-type clustering algorithms. Among the methods used for these problems, the K-nearest neighbor (KNN) and Mountain methods are widely used [20,21,23]. But these methods have some disadvantages. For instance, the basic disadvantages of KNN are that the number of clusters must be given a priori and that an equal number of elements is assigned to each class. The basic disadvantage of the Mountain method is the complexity of its calculations.

The FJP method presented in our study does not have these disadvantages [17]. Another important feature of the proposed method is that its noise robustness can be fine-tuned and that outliers can be treated as individual classes, which is not the case in most of the known methods. This could be important in biological, medical, and similar problems in order to recognize new forms of living objects.

The fundamental idea of the FJP method is to compute the fuzzy relation matrix based on the distance between points. Then, for certain α ∈ [0, 1], α-level sets and equivalence classes are constructed. At the same time, these α-degree equivalence classes determine each α-level set of the fuzzy clusters. Note that these α-level sets are not computed for all degrees α ∈ [0, 1]; they are computed only for the α-levels at which the number of clusters changes. Then, the final level set is computed based on the maximal change interval of the α values. In other words, the α-level degree that reflects the cluster structure optimally and the α-level set appropriate for this level are found simultaneously. In the third section of the paper, the FJP method is explained in detail.

2. Basic definitions and properties

Most distance-based clustering methods use the following classical Euclidean distance between the points a and b of the p-dimensional space $E^p$:

$$d(a, b) = \sqrt{\sum_{i=1}^{p} (a_i - b_i)^2}. \tag{2.1}$$

Fig. 1. Fuzzy conical point $A = (a, R) \in F(E^2)$ on the space $E^2$ (axes $x_1$, $x_2$ and membership $\mu(x_1, x_2)$).

Fig. 2. Triangular fuzzy number as a point $A = (a, R) \in F(E^1)$ on the space $E^1$ (axis $x$ and membership $\mu(x)$).

Note that there are methods that use other distances. For example, in [11] a clustering problem using the FCM algorithm based on a scaled distance is evaluated and its advantages are demonstrated. In our work, however, we use the classical distance given in formula (2.1).

Let us denote the set of all p-dimensional fuzzy sets of the space $E^p$ by $F(E^p)$. Let $\mu_A : E^p \to [0, 1]$ denote the membership function of the fuzzy set $A \in F(E^p)$.

Definition 1. A conical fuzzy point $A = (a, R) \in F(E^p)$ of the space $E^p$ is a fuzzy set with the membership function (Fig. 1)

$$\mu_A(x) = \begin{cases} 1 - \dfrac{d(x, a)}{R} & \text{if } d(x, a) \le R, \\ 0 & \text{otherwise,} \end{cases} \tag{2.2}$$

where $a \in E^p$ is the center of the fuzzy point A, and $R \in E^1$ is the radius of its support supp A, where

$$\operatorname{supp} A = \{x \in E^p \mid \mu_A(x) > 0\}.$$

The α-level set of the conical fuzzy point A = (a, R) is calculated as

$$A^\alpha = \{x \in E^p \mid \mu_A(x) \ge \alpha\} = \{x \in E^p \mid d(x, a) \le R \cdot (1 - \alpha)\}. \tag{2.3}$$
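As a small illustration of (2.2) and (2.3), the following Python sketch evaluates the conical membership function and tests membership in an α-level set. The function names are hypothetical and chosen only for this example; points are assumed to be tuples of coordinates.

```python
import math

def euclidean(a, b):
    """Classical Euclidean distance (2.1) between two points given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def conical_membership(x, center, radius):
    """Membership degree of x in the conical fuzzy point (center, radius), as in (2.2)."""
    d = euclidean(x, center)
    return 1.0 - d / radius if d <= radius else 0.0

def in_alpha_level_set(x, center, radius, alpha):
    """Test x against the alpha-level set using the distance form of (2.3)."""
    return euclidean(x, center) <= radius * (1.0 - alpha)

# A point halfway between the center and the edge of the support has degree 0.5.
print(conical_membership((1.0, 0.0), (0.0, 0.0), 2.0))
print(in_alpha_level_set((1.0, 0.0), (0.0, 0.0), 2.0, 0.5))
```

The second function relies on the equivalence stated in (2.3): a membership degree of at least α is the same condition as a distance of at most R(1 − α) from the center.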

Note that the analogue of a fuzzy conical point $A = (a, R) \in F(E^1)$ of the space $E^1$ is a symmetrical triangular fuzzy number A = (a, R, R) (Fig. 2).

There are other definitions of a fuzzy point in the literature. For example, in [22] a clustering problem with multidimensional fuzzy point data such as (2.2) is considered and a robust modification of the FCM algorithm is proposed.

In this study, we will use the short term “fuzzy point” instead of conical fuzzy point defined in (2.2).


Fig. 3. Fuzzy α-neighbor points A = (a, R) and B = (b, R) on the space $E^2$.

Let A = (a, R) and B = (b, R) be fuzzy points from the set $X \subset F(E^p)$. Define a fuzzy neighborhood relation $T : X \times X \to [0, 1]$ on the set X as follows:

$$T(A, B) = 1 - \frac{d(a, b)}{2R}, \tag{2.4}$$

where $a \in E^p$ and $b \in E^p$ are the centers of the fuzzy points A and B, respectively (Fig. 3). Eq. (2.4) may be written as

$$d(a, b) = 2R\,(1 - T(A, B)). \tag{2.5}$$

It is obvious that the relation T is reflexive, i.e. $T(A, A) = 1$ is satisfied for all $A \in X$.

Definition 2. Let A and B be fuzzy points of the set $X \subset F(E^p)$. If

$$T(A, B) \ge \alpha \tag{2.6}$$

is satisfied for a fixed α ∈ (0, 1], then the points A and B are called α-neighbor fuzzy points, which is denoted by $A \sim_\alpha B$ (Fig. 3).

The α-neighborhood defined above corresponds to the points being similar to degree α.

Lemma 1. Fuzzy points A = (a, R) and B = (b, R) are α-neighbor fuzzy points if and only if

$$d(a, b) \le 2R\,(1 - \alpha) \tag{2.7}$$

is satisfied, where d(a, b) denotes the distance between the centers of the fuzzy points A and B.

Proof. Let the fuzzy points A = (a, R) and B = (b, R) be α-neighbors. Then (2.6) holds by Definition 2, and therefore, by (2.4),

$$1 - \frac{d(a, b)}{2R} \ge \alpha \;\Rightarrow\; d(a, b) \le 2R\,(1 - \alpha). \tag{2.8}$$

Now suppose that (2.7) holds. Then

$$\alpha \le 1 - \frac{d(a, b)}{2R} = T(A, B),$$

i.e. (2.6) holds, which completes the proof. □
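The relation (2.4) and the equivalence of Lemma 1 can be checked numerically. The sketch below is only an illustration with hypothetical function names; it assumes that both fuzzy points share the same radius R, as in the text, and that the centers are given as coordinate tuples.

```python
import math

def neighborhood_degree(a, b, radius):
    """T(A, B) = 1 - d(a, b) / (2R), as in (2.4)."""
    return 1.0 - math.dist(a, b) / (2.0 * radius)

def are_alpha_neighbors(a, b, radius, alpha):
    """Definition 2: A and B are alpha-neighbors when T(A, B) >= alpha."""
    return neighborhood_degree(a, b, radius) >= alpha

def lemma1_condition(a, b, radius, alpha):
    """Distance form (2.7) of the same test."""
    return math.dist(a, b) <= 2.0 * radius * (1.0 - alpha)

a, b, radius = (0.0, 0.0), (3.0, 4.0), 5.0   # d(a, b) = 5, so T(A, B) = 0.5
for alpha in (0.3, 0.5, 0.6):
    print(alpha, are_alpha_neighbors(a, b, radius, alpha), lemma1_condition(a, b, radius, alpha))
```

Both tests agree for every α in the loop, which is what Lemma 1 states.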

Definition 3. If, for a fixed α ∈ (0, 1], there is a sequence of α-neighbor fuzzy points $C_1, \ldots, C_k$, $k \ge 0$, between the points A and B, i.e.

$$A \sim_\alpha C_1,\; C_1 \sim_\alpha C_2,\; \ldots,\; C_{k-1} \sim_\alpha C_k \text{ and } C_k \sim_\alpha B,$$

then the fuzzy points A and B are called α-joint fuzzy points.
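Definition 3 amounts to reachability in the graph whose edges are the α-neighbor pairs. A minimal sketch of this reading, using a breadth-first search, is given below; the function name and the representation of fuzzy points by their centers with a common radius are assumptions made for the example.

```python
import math
from collections import deque

def alpha_joint(points, i, j, radius, alpha):
    """Return True if points[i] and points[j] are linked by a chain of alpha-neighbors."""
    def neighbors(u, v):
        # alpha-neighbor test of Definition 2 with T from (2.4)
        return 1.0 - math.dist(points[u], points[v]) / (2.0 * radius) >= alpha

    seen = {i}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return True
        for v in range(len(points)):
            if v not in seen and neighbors(u, v):
                seen.add(v)
                queue.append(v)
    return False

points = [(0, 0), (1, 0), (2, 0), (10, 0)]
# Points 0 and 2 are not alpha-neighbors for alpha = 0.85, but they are joined through point 1.
print(alpha_joint(points, 0, 2, radius=5.0, alpha=0.85))
print(alpha_joint(points, 0, 3, radius=5.0, alpha=0.85))
```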


Fig. 4. Illustration of Lemma 2.

Definition 4. Let $X \subset F(E^p)$ be a set of fuzzy points. If, for a given α ∈ (0, 1], the fuzzy points A and B are α-joint for all A, B ∈ X, then the set X is called a fuzzy α-joint set.

Suppose that $d(A^\alpha, B^\alpha)$ is the classical distance between the level sets $A^\alpha$ and $B^\alpha$, i.e.

$$d(A^\alpha, B^\alpha) = \min\{d(x, y) \mid x \in A^\alpha,\; y \in B^\alpha\}.$$

Lemma 2. The fuzzy points A and B are α-neighbors if and only if

$$A^\alpha \cap B^\alpha \ne \emptyset. \tag{2.9}$$

Proof. Let the fuzzy points A and B be α-neighbors, so that (2.6) holds. Assume, to the contrary, that (2.9) does not hold, i.e.

$$A^\alpha \cap B^\alpha = \emptyset.$$

Then, on the line segment which joins the points $a \in E^p$ and $b \in E^p$, there is a point $x \in E^p$, $x \notin A^\alpha$, $x \notin B^\alpha$, for which (Fig. 4)

$$d(a, x) > R(1 - \alpha) \quad \text{and} \quad d(b, x) > R(1 - \alpha). \tag{2.10}$$

Taking into account that the points a, x and b lie on a line, from (2.10) the following may be written:

$$d(a, b) = d(a, x) + d(x, b) > 2R(1 - \alpha).$$

By Lemma 1, this inequality contradicts the α-neighborhood of the points A and B.

Now assume that (2.9) holds. Then there exists x such that $x \in A^\alpha$ and $x \in B^\alpha$. Consequently, because of (2.3), we have

$$d(x, a) \le R(1 - \alpha) \quad \text{and} \quad d(x, b) \le R(1 - \alpha). \tag{2.11}$$

By the triangle inequality, it follows from (2.11) that

$$d(a, b) \le d(a, x) + d(x, b) \le 2R(1 - \alpha) \;\Rightarrow\; d(a, b) \le 2R(1 - \alpha).$$

According to Lemma 1, the last inequality shows that the fuzzy points A and B are α-neighbors. This completes the proof. □

Let the relation $\hat{T} : X \times X \to [0, 1]$ be the transitive closure of the relation $T : X \times X \to [0, 1]$. Note that the transitive closure is understood in the sense of max–min composition.

Theorem 1. Any points A, B ∈ X of the finite set X are fuzzy α-joint points if and only if

$$\hat{T}(A, B) \ge \alpha. \tag{2.12}$$

Proof. First, suppose that the fuzzy points A and B are α-joint points. Then a sequence of fuzzy points $C_1, \ldots, C_k$, $k \ge 0$, exists between the points A and B, i.e.

$$T(A, C_1) \ge \alpha,\; T(C_1, C_2) \ge \alpha,\; \ldots,\; T(C_{k-1}, C_k) \ge \alpha,\; T(C_k, B) \ge \alpha. \tag{2.13}$$


Recall that the transitive closure $\hat{T}$ is the minimal transitive relation which covers the relation T, i.e.
(a) $\hat{T}(A, B) \ge T(A, B)$ for all A, B ∈ X;
(b) for all A, B, C ∈ X, $\hat{T}(A, B) \ge \alpha$ and $\hat{T}(B, C) \ge \alpha$ imply $\hat{T}(A, C) \ge \alpha$.

Thus, from property (a) and (2.13) we get

$$\hat{T}(A, C_1) \ge \alpha,\; \hat{T}(C_1, C_2) \ge \alpha,\; \ldots,\; \hat{T}(C_{k-1}, C_k) \ge \alpha,\; \hat{T}(C_k, B) \ge \alpha.$$

Taking property (b) into account, (2.12) follows from the last inequalities.

Now assume that (2.12) holds. Let us show that A and B are α-joint fuzzy points. By the definition of the transitive closure,

$$\hat{T} = T \cup T^2 \cup \cdots \cup T^{k-1} \cup T^k \cup \cdots$$

holds, and for any reflexive relation T on a set of n elements

$$T \subset T^2 \subset \cdots \subset T^{n-1} = T^n = T^{n+1} = \cdots$$

holds. Hence there exists $k \ge 1$ such that $\hat{T} = T^k$. From inequality (2.12) we then get $T^k(A, B) \ge \alpha$.

By the max–min composition, the last inequality means that the elements A and B can be joined by a sequence $(A, C_1, \ldots, C_{k-1}, B)$ of length k such that every consecutive pair of this sequence satisfies

$$T(A, C_1) \ge \alpha,\; T(C_1, C_2) \ge \alpha,\; \ldots,\; T(C_{k-1}, B) \ge \alpha.$$

According to Definition 3, the last inequalities show that the points A and B are α-joint fuzzy points, which completes the proof. □
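The max–min transitive closure used in Theorem 1 can be computed by repeatedly composing the relation with itself until it stops changing. The sketch below assumes a reflexive relation stored as a square list of lists (for a reflexive T the powers are nested, as noted in the proof, so repeated squaring reaches the closure); the function names are hypothetical.

```python
def maxmin_compose(P, Q):
    """Max-min composition of two square fuzzy relations given as lists of lists."""
    n = len(P)
    return [[max(min(P[i][k], Q[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(T):
    """Max-min transitive closure of a reflexive relation T by repeated squaring."""
    closure = [row[:] for row in T]
    while True:
        squared = maxmin_compose(closure, closure)
        if squared == closure:
            return closure
        closure = squared

T = [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.7],
     [0.2, 0.7, 1.0]]
# The first and third elements are joined through the second one at level min(0.8, 0.7) = 0.7.
print(transitive_closure(T))
```

By Theorem 1, two elements are α-joint exactly when the corresponding entry of the returned matrix is at least α.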

3. Algorithm of the FJP method with noises

Let a data set $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in E^p$, $i = 1, \ldots, n$, be given. It is required to divide the set X into homogeneous groups, i.e. to classify its elements. The number of classes is unknown a priori. Note that, in order to normalize the fuzzy relation $T : X \times X \to [0, 1]$, the radii of the considered fuzzy points are calculated in the FJP algorithm as

$$R = \frac{\max\{d(x_i, x_j) \mid x_i, x_j \in X\}}{2} \equiv \frac{d_{\max}}{2}.$$

Thus, for all A, B ∈ X, the degree of the relation T(A, B) is defined as

$$T(A, B) = 1 - \frac{d(a, b)}{d_{\max}}, \tag{3.1}$$

which implies

$$d(a, b) = d_{\max}\,(1 - T(A, B)). \tag{3.2}$$

Let N(x) denote the fuzzy neighborhood set of a given point x ∈ X based on the fuzzy relation T, i.e.

$$N(x) = \{(y, T(x, y)) \mid y \in X\}.$$

Let

$$N(x, \varepsilon_1) = \{y \in X \mid T(x, y) \ge \varepsilon_1\}$$

be the $\varepsilon_1$-level set of N(x), i.e. the fuzzy $\varepsilon_1$-neighborhood set of the point x ∈ X.

Definition 5. A point x ∈ X is called a noise point with parameters $\varepsilon_1$, $\varepsilon_2$, for given $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$, if $\operatorname{card} N(x, \varepsilon_1) < \varepsilon_2$ is satisfied, where

$$\operatorname{card} N(x, \varepsilon_1) = \sum_{y \in N(x, \varepsilon_1)} T(x, y)$$

is the fuzzy cardinality of the set $N(x, \varepsilon_1)$.
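A literal reading of Definition 5 can be sketched as follows. The relation T is assumed to be given as a square matrix (list of lists) with T[i][i] = 1, and the function names are hypothetical. Under this literal reading the point x itself belongs to N(x, ε1) and contributes T(x, x) = 1 to the sum; the example threshold is chosen with that in mind.

```python
def fuzzy_cardinality(T, i, eps1):
    """Sum of T[i][j] over the eps1-neighborhood of point i (Definition 5)."""
    return sum(t for t in T[i] if t >= eps1)

def noise_filter(T, eps1, eps2):
    """Split point indices into core and noise lists according to Definition 5."""
    core, noise = [], []
    for i in range(len(T)):
        if fuzzy_cardinality(T, i, eps1) < eps2:
            noise.append(i)
        else:
            core.append(i)
    return core, noise

T = [[1.0, 0.95, 0.1],
     [0.95, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
# The third point has no eps1-neighbor other than itself, so it is marked as noise.
print(noise_filter(T, eps1=0.9, eps2=1.5))
# Setting eps2 = 0 switches the filter off: every point is kept as core.
print(noise_filter(T, eps1=0.9, eps2=0.0))
```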

The FJP algorithm suggested below is robust to noise. In this algorithm, each point whose $\varepsilon_1$ fuzzy neighborhood cardinality is smaller than the threshold $\varepsilon_2$ is perceived as noise. Note that, by changing the parameters $\varepsilon_1$ and $\varepsilon_2$, it is possible to change the sensitivity of the FJP algorithm to noise. To turn off the noise sensitivity of the FJP algorithm, it is enough to set $\varepsilon_2 = 0$. Finally, in the result of clustering, the noise points can be assigned to the nearest class.

In order to detect the optimal structure of clusters, the FJP algorithm uses the widest change interval of the α parameter in which the number of clusters remains unchanged (Step 8). The changes of the α parameter can be transformed into changes of the distances between points. In the next section, cluster validity analysis is carried out by using this transformation, and the conditions under which the FJP algorithm runs correctly are studied theoretically.

FJP Algorithm (with noises).

Step 1: Calculate $d_{ij} := d(x_i, x_j)$, $i, j = 1, \ldots, n$; $d_{\max} := \max_{i,j = 1, \ldots, n} d_{ij}$; $\varepsilon := 0.01 \cdot \min_{i,j = 1, \ldots, n} d_{ij}$. Set up the values $\varepsilon_1$ and $\varepsilon_2$. Let $\alpha_0 := 1$.
Step 2: Calculate the fuzzy relation $T_{ij} := 1 - d_{ij} / d_{\max}$, $i, j = 1, \ldots, n$.
Step 3: Call the procedure NoiseFilter($\varepsilon_1$, $\varepsilon_2$) to divide the initial data set X into the core set $X_{core}$ and the noise set $X_{noise}$, i.e. $X = X_{core} \cup X_{noise}$, $X_{core} \cap X_{noise} = \emptyset$. Calculate the transitive closure $\hat{T}$ of the relation T on the set $X_{core}$.
Step 4: Let $n_c$ be the number of elements of $X_{core}$. Denote $y_i := x_i$, $i = 1, \ldots, n_c$; $t := 1$; $k := n_c$.
Step 5: Calculate $d(y_i, y_j) = \min\{d(x', x'') \mid x' \in y_i,\; x'' \in y_j\}$, $i, j = 1, \ldots, k$; $d_t := \min_{i \ne j} d(y_i, y_j)$; $\alpha_t := \max\{1 - (d_t + \varepsilon) / d_{\max},\; 0\}$.
Step 6: Call the procedure Clusters($\alpha_t$) to calculate the fuzzy $\alpha_t$-joint sets $X_1, X_2, \ldots, X_k$ with conical fuzzy points $(x_i, d_{\max}/2)$, $i = 1, \ldots, n_c$, and to determine the number k of these sets for the current value $\alpha_t$.
Step 7: If $k > 1$, then denote $y_i := X_i$, $i = 1, \ldots, k$; $t := t + 1$ and go to Step 5. If $k = 1$, then go to Step 8.
Step 8: Calculate $\Delta\alpha_i := \alpha_i - \alpha_{i+1}$, $i = 0, \ldots, t - 1$; $z := \arg\max_{i = 0, \ldots, t-1} \Delta\alpha_i$; $\bar{\alpha} := \alpha_z - \varepsilon$.
Step 9: Call the procedure Clusters($\bar{\alpha}$) with parameter $\bar{\alpha}$.
Step 10: $\bar{\alpha}$ is the optimal membership degree of clustering; k is the optimal number of clusters; $X_1, \ldots, X_k$ is the partition of the set X.
Step 11: For each element $x \in X_{noise}$ repeat Step 12.
Step 12: Calculate $k^* = \arg\min\{dist(x, X_k) \mid k = 1, \ldots, \bar{k}\}$; assign x to $X_{k^*}$.
End.

The procedure NoiseFilter($\varepsilon_1$, $\varepsilon_2$) is used to divide the initial data set X into two disjoint sets, the core set $X_{core}$ and the noise set $X_{noise}$, i.e. $X = X_{core} \cup X_{noise}$, $X_{core} \cap X_{noise} = \emptyset$.

Procedure NoiseFilter($\varepsilon_1$, $\varepsilon_2$):
Input parameters: $\varepsilon_1$ and $\varepsilon_2$;
Output parameters: the sets $X_{core}$ and $X_{noise}$;
Step 1: Let $X \equiv \{x_1, x_2, \ldots, x_n\}$ be the set of initial points, $X_{noise} = \emptyset$;
Step 2: For each element x ∈ X repeat Steps 3 and 4:
Step 3: Calculate $\operatorname{card} N(x, \varepsilon_1) = \sum_{y \in N(x, \varepsilon_1)} T(x, y)$;
Step 4: If $\operatorname{card} N(x, \varepsilon_1) < \varepsilon_2$, then mark x as a noise point: $X_{noise} = X_{noise} \cup \{x\}$;
Step 5: Let $X_{core} = X \setminus X_{noise}$;
Step 6: Return the sets $X_{core}$ and $X_{noise}$;
End.

The procedure Clusters(α) calculates a partition of the core set $X_{core}$ into fuzzy α-joint sets with a fixed level degree α.

Procedure Clusters(α):
Input parameter: α;
Output parameters: fuzzy α-joint sets $X_1, X_2, \ldots, X_k$; k, the number of these sets;
Step 1: $S := X_{core} = \{x_1, x_2, \ldots, x_{n_c}\}$; $k := 1$;
Step 2: Get the first element A ∈ S of the set S; create the set $X_k := \{B \in S \mid \hat{T}(A, B) \ge \alpha\}$; $S := S \setminus X_k$;
Step 3: If $S \ne \emptyset$, then let $k := k + 1$ and go to Step 2; otherwise go to Step 4;
Step 4: Return the sets $X_1, X_2, \ldots, X_k$ and the number k of these sets.
End.
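The core idea of the algorithm, choosing the level that opens the widest gap between consecutive merge levels, can be sketched in a simplified form. The following Python sketch is not the procedure above step by step: it omits the noise filter, drops the small offset ε, and takes the distinct off-diagonal values of the transitive closure as the levels instead of iterating Steps 5 to 7. It also assumes at least two distinct points. All function names are hypothetical.

```python
import math

def maxmin_closure(T):
    """Max-min transitive closure via a Floyd-Warshall style triple loop."""
    n = len(T)
    C = [row[:] for row in T]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] = max(C[i][j], min(C[i][k], C[k][j]))
    return C

def clusters_at(closure, alpha):
    """Partition indices into classes {B : closure[A][B] >= alpha}, as in procedure Clusters."""
    remaining = list(range(len(closure)))
    clusters = []
    while remaining:
        first = remaining[0]
        members = [b for b in remaining if closure[first][b] >= alpha]
        clusters.append(members)
        remaining = [b for b in remaining if b not in members]
    return clusters

def fjp_sketch(points):
    """Noise-free sketch: return the level that starts the widest gap, and its clusters."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    d_max = max(max(row) for row in d)          # assumes at least two distinct points
    T = [[1.0 - d[i][j] / d_max for j in range(n)] for i in range(n)]  # relation (3.1)
    closure = maxmin_closure(T)
    # alpha_0 = 1 followed by the distinct merge levels, in decreasing order
    levels = sorted({1.0} | {closure[i][j] for i in range(n) for j in range(i + 1, n)},
                    reverse=True)
    gaps = [levels[i] - levels[i + 1] for i in range(len(levels) - 1)]
    z = max(range(len(gaps)), key=gaps.__getitem__)
    alpha_bar = levels[z]
    return alpha_bar, clusters_at(closure, alpha_bar)

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
alpha_bar, clusters = fjp_sketch(points)
print(alpha_bar, clusters)
```

For these six points the neighboring points within each group are 1 apart and the two groups are 8 apart, so the widest gap lies between the levels 1 − 1/12 and 1 − 8/12, and the sketch returns the two groups of three points.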


We would like to note that different clustering results may be obtained if a nonlinear formula is used instead of the definition of the fuzzy point given in (2.2). An example of such a situation is given in Appendix A. The analysis of this problem could be the subject of future research.

4. Cluster validity analysis

Let $X_k$, $k = 1, \ldots, t$, be the cores of the homogeneous classes created by clustering. Let us introduce the following notation (Fig. 5):

$$d^{in}_k = d_{\max} \cdot \Bigl(1 - \min_{x, y \in X_k} \hat{T}(x, y)\Bigr),$$

$$d^{in}_{\max} = \max_k d^{in}_k,$$

$$d^{out}_{\min} = \min_{i,j}\{d(X_i, X_j) \mid i \ne j\},$$

$$d^{out}_{\max} = \max_{i,j} d(X_i, X_j).$$

As mentioned above, the cluster validity criterion used in the FJP algorithm is based on the largest α-level change interval that does not affect the number of clusters. Since the fuzzy point membership function considered in this paper is linear, the change interval of the α parameter can be evaluated as a change of distances between points, and we can use the following cluster validity function:

$$V_{FJP} = d^{out}_{\min} - d^{in}_{\max}. \tag{4.1}$$

In other words, the cluster structure which maximizes the functional $V_{FJP}$ is considered optimal. There are also various other cluster validity criteria in the literature. Some widely used criteria are given in Table 1.

As seen in Table 1, the partition coefficient (PC), classification entropy (CE), and Fukuyama–Sugeno (FS) criteria depend on the number of clusters. The separation index (SI), however, depends not on the cluster number but on the structure of the clusters. The FJP cluster validity index given in Table 1 also depends on the structure of the clusters.

Theorem 2. If a hidden partition structure $X = \bigcup_{i=1}^{k} X_i$ of the set X exists such that

$$\frac{d^{in}_{\max}}{d^{out}_{\min}} < \frac{1}{2} < \frac{d^{out}_{\min} - d^{in}_{\max}}{d^{out}_{\max}} \tag{4.2}$$

holds, then the FJP algorithm will recognize this partition.

Proof. Suppose that a hidden structure with a partition satisfying the inequalities (4.2) exists. The values of α calculated by cyclic use of Steps 5–7 of the FJP algorithm are denoted by $\alpha_0, \alpha_1, \ldots, \alpha_t$. By (3.2) and Lemma 1, each value $\alpha_i$, $i = 0, \ldots, t$, determines a circle which defines the corresponding α-level set. According to Lemma 2, the intersections of these sets define the set of fuzzy α-neighbors for the given degree. As shown in (3.2), the radii of these circles increase as the degrees α = T(A, B) decrease.

It is obvious that if a partition of the set X is recognized, then

$$d^{in}_{\max} < d^{out}_{\min} < d^{out}_{\max}$$

holds, and there is no pair of points x, y ∈ X such that

$$d^{in}_{\max} < d(x, y) < d^{out}_{\min}.$$

Consequently, some consecutive values $\alpha_z$ and $\alpha_{z+1}$ of the sequence $\alpha_i$, $i = 0, \ldots, t$, generated by Step 5 of the FJP algorithm, will correspond to the distances $d^{in}_{\max}$ and $d^{out}_{\min}$.


Fig. 5. Illustration of the cluster validity parameters $d^{in}_{\max}$, $d^{out}_{\min}$ and $d^{out}_{\max}$ of the FJP algorithm.

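Table 1
Description of cluster validity criteria

| Validity criteria | Functional description | Optimal cluster number |
|---|---|---|
| Partition coefficient | $V_{PC} = \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^2$ | $\max(V_{PC}, U, c)$ |
| Classification entropy | $V_{CE} = -\frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij} \log_a u_{ij}$ | $\min(V_{CE}, U, c)$ |
| Fukuyama–Sugeno criteria | $V_{FS_m} = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \,[d^2(x_j, v_i) - d^2(m_X, v_i)]$ | $\min(V_{FS}, U, c)$ |
| Separation index | $V_{SI} = \dfrac{\min_{i \ne j} d(u_i, u_j)}{\max_i \Delta(u_i)}$ | $\max(V_{SI}, U)$ |
| Fuzzy joint points criteria | $V_{FJP} = d^{out}_{\min} - d^{in}_{\max}$ | $\max(V_{FJP}, U)$ |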

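Let us denote by $1 = \beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ the values of the relation T corresponding to the distances $0 = d_0$, $d^{in}_{\max}$, $d^{out}_{\min}$ and $d^{out}_{\max}$, respectively. With respect to Steps 8–10, the FJP algorithm calculates the partition considered above when

$$\beta_1 - \beta_2 = \Delta\alpha_z = \alpha_z - \alpha_{z+1} = \max_{i = 0, \ldots, t-1} \Delta\alpha_i.$$

Assume that (4.2) holds. Let us first consider

$$\frac{d^{in}_{\max}}{d^{out}_{\min}} < \frac{1}{2}.$$

Then the following sequence of inequalities holds:

$$2 d^{in}_{\max} < d^{out}_{\min} \;\Rightarrow\; d^{in}_{\max} < d^{out}_{\min} - d^{in}_{\max} \;\Rightarrow\; d^{in}_{\max} - d_0 < d^{out}_{\min} - d^{in}_{\max}. \tag{4.3}$$

Consequently, by taking (3.1) into account, we get

$$\beta_0 - \beta_1 = \Bigl(1 - \frac{d_0}{d_{\max}}\Bigr) - \Bigl(1 - \frac{d^{in}_{\max}}{d_{\max}}\Bigr) = \frac{d^{in}_{\max} - d_0}{d_{\max}}$$

and

$$\beta_1 - \beta_2 = \Bigl(1 - \frac{d^{in}_{\max}}{d_{\max}}\Bigr) - \Bigl(1 - \frac{d^{out}_{\min}}{d_{\max}}\Bigr) = \frac{d^{out}_{\min} - d^{in}_{\max}}{d_{\max}}. \tag{4.4}$$

According to (4.3) we get

$$\beta_0 - \beta_1 < \beta_1 - \beta_2.$$

Then, for all $\alpha_i, \alpha_j \in [\beta_1, \beta_0]$,

$$|\alpha_i - \alpha_j| \le \beta_0 - \beta_1 < \beta_1 - \beta_2$$

holds, which implies that for all $\alpha_i, \alpha_{i+1} \in [\beta_1, \beta_0]$,

$$\alpha_i - \alpha_{i+1} < \beta_1 - \beta_2. \tag{4.5}$$

Now let us consider the second part of (4.2). Let

$$\frac{1}{2} < \frac{d^{out}_{\min} - d^{in}_{\max}}{d^{out}_{\max}}$$

hold, from which it follows that

$$d^{out}_{\max} < 2\,(d^{out}_{\min} - d^{in}_{\max}),$$

thus

$$d^{out}_{\max} - d^{out}_{\min} < d^{out}_{\min} - 2 d^{in}_{\max} < d^{out}_{\min} - d^{in}_{\max},$$

i.e.

$$d^{out}_{\max} - d^{out}_{\min} < d^{out}_{\min} - d^{in}_{\max}. \tag{4.6}$$

For the relation T, we may write

$$\beta_2 - \beta_3 = \Bigl(1 - \frac{d^{out}_{\min}}{d_{\max}}\Bigr) - \Bigl(1 - \frac{d^{out}_{\max}}{d_{\max}}\Bigr) = \frac{d^{out}_{\max} - d^{out}_{\min}}{d_{\max}}. \tag{4.7}$$

Taking (4.7) and (4.4) into account, it follows from (4.6) that

$$\beta_2 - \beta_3 < \beta_1 - \beta_2.$$

Then, for all $\alpha_i, \alpha_j \in [\beta_3, \beta_2]$,

$$|\alpha_i - \alpha_j| \le \beta_2 - \beta_3 < \beta_1 - \beta_2$$

holds, i.e. for all $\alpha_i, \alpha_{i+1} \in [\beta_3, \beta_2]$ it is true that

$$\alpha_i - \alpha_{i+1} < \beta_1 - \beta_2. \tag{4.8}$$

Thus, from (4.5) and (4.8),

$$\beta_1 - \beta_2 = \max_{i = 0, \ldots, t-1} \Delta\alpha_i$$

follows, which completes the proof. □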
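For a given candidate partition, the quantities of this section and the sufficient condition (4.2) can be evaluated directly. The sketch below is an illustration under stated assumptions: clusters are lists of point indices, there are at least two clusters and two distinct points, the closure is computed on the whole point set, and the function names are hypothetical.

```python
import math

def validity_quantities(points, clusters):
    """Return (d_in_max, d_out_min, d_out_max) for clusters given as lists of indices."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    d_max = max(max(row) for row in d)
    # Max-min transitive closure of T = 1 - d / d_max (Floyd-Warshall style)
    C = [[1.0 - d[i][j] / d_max for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] = max(C[i][j], min(C[i][k], C[k][j]))
    d_in_max = max(d_max * (1.0 - min(C[x][y] for x in c for y in c)) for c in clusters)
    # Single-linkage distance between every pair of distinct clusters
    between = [min(d[x][y] for x in ci for y in cj)
               for a, ci in enumerate(clusters) for cj in clusters[a + 1:]]
    return d_in_max, min(between), max(between)

def satisfies_condition_4_2(d_in_max, d_out_min, d_out_max):
    """Sufficient condition (4.2) of Theorem 2."""
    return d_in_max / d_out_min < 0.5 < (d_out_min - d_in_max) / d_out_max

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
clusters = [[0, 1, 2], [3, 4, 5]]
d_in_max, d_out_min, d_out_max = validity_quantities(points, clusters)
v_fjp = d_out_min - d_in_max                     # validity function (4.1)
print(v_fjp, satisfies_condition_4_2(d_in_max, d_out_min, d_out_max))
```

Here the largest link inside a cluster is about 1 and the clusters are 8 apart, so the ratios in (4.2) are roughly 0.125 and 0.875 and the condition holds.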

5. Examples

In this section, we aim to demonstrate the behavior of the FJP algorithm under different parameter settings and to confirm the correctness of the algorithm with respect to cluster validity. Cluster validity is a main part of clustering algorithms, used to check the correct number of clusters [18]. We therefore compared the performance of the FJP method with that of the FCM algorithm under various cluster validity criteria. These validity functionals and their optimal cluster number conditions are described in Table 1.

In Examples 5.1–5.4, we conducted experiments on four data sets, two of which are the widely used IRIS and Bensaid's data sets [1,2] (we added noise points to Bensaid's data set). The IRIS and original Bensaid's data sets can be found at [12]. The other two data sets are artificially generated. Calculations were made by using a C++ program [17], and the data sets are shown in Figs. 6 and 7.


Fig. 6. (a) Synthetic-1 data; (b) Synthetic-2 data.

Fig. 7. (a) IRIS data; (b) Bensaid-like data (Bensaid with noises).

Example 5.1. Synthetic-1 is an artificially generated data set with 1130 points distributed mainly over three clusters (Fig. 6a). Noise points are added to the data set to demonstrate the robustness of the FJP algorithm to noise.

The performance of the FJP algorithm with $\varepsilon_1 = 0.9$ and the values $\varepsilon_2 = 0.8$ and $\varepsilon_2 = 0.45$ is shown in Fig. 8. As seen in Fig. 8a and b, the number of clusters is found accurately as three. In these figures the cores of the clusters are shown as numbers 0, 1, 2, ..., respectively, and noise points are shown as red rectangles. In the last stage, the noise points are assigned to the nearest cores, as shown in Fig. 9a. For this data set, the results determined by the FCM algorithm are correct and are shown in Fig. 9b.

The Synthetic-1 data set is also classified by the FCM algorithm, using the KNN and Mountain methods to initialize clusters and the various cluster validity criteria given in Table 1. PC and CE detected the real structure and the optimal number of classes accurately, while the FS index determined four classes with KNN and found a non-realistic maximum cluster number with the Mountain method. We mark these values with "MAX".

Example 5.2. This example contains artificially generated synthetic data (Synthetic-2): 486 data points distributed over two clusters, as shown in Fig. 6b.

The FJP algorithm detects the real structure and the optimal number of clusters for Synthetic-2 with resulting degree $\bar{\alpha} = 0.9465$, as shown in Fig. 10a. Red rectangles are the points treated as noise for $\varepsilon_1 = 0.9$ and $\varepsilon_2 = 0.3$. Fig. 10b depicts a partition generated by FCM using the KNN algorithm for initial clusters. Although FCM found the optimal number of clusters, the partition is far from the optimal structure of the clusters.


Fig. 8. Graphic results of FJP algorithm for Synthetic-1 data with resulting degree $\bar{\alpha} = 0.9787$: (a) for $\varepsilon_1 = 0.9$, $\varepsilon_2 = 0.8$; (b) for $\varepsilon_1 = 0.9$, $\varepsilon_2 = 0.45$.

Fig. 9. Graphic results for Synthetic-1 data: (a) FJP algorithm with noises assigned to the nearest cores; (b) FCM algorithm where PC for cluster validity and KNN algorithm for initial clusters are used.

Fig. 10. Graphic results for Synthetic-2 data: (a) FJP algorithm with resulting degree $\bar{\alpha} = 0.9465$, for $\varepsilon_1 = 0.9$, $\varepsilon_2 = 0.3$; (b) FCM algorithm where PC and CE for cluster validity and KNN algorithm for initial clusters are used.


Fig. 11. Graphic result for IRIS data set by using: (a) FJP algorithm with resulting degree $\bar{\alpha} = 0.9448$, for $\varepsilon_1 = 0.9$, $\varepsilon_2 = 0.2$ and (b) FCM algorithm where partition coefficient for cluster validity and KNN algorithm for initial clusters are used.

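Table 2
Comparison of validity indexes for different validity criteria

| Data set | Data count | Expert | Initial clusters | PC | CE | SI | FS | FJP |
|---|---|---|---|---|---|---|---|---|
| Synthetic-1 | 1130 | 3 | Mountain | 3 | 3 | MAX | MAX | 3 |
| | | | KNN | 3 | 2 | MAX | 4 | |
| Synthetic-2 | 486 | 2 | Mountain | 3 | 2 | MAX | MAX | 2 |
| | | | KNN | 2 | 2 | MAX | MAX | |
| IRIS | 150 | 2 or 3 | Mountain | 2 | 2 | 2 | 3 | 2 |
| | | | KNN | 2 | 2 | 2 | 3 | |
| Bensaid-like | 200 | 3 | Mountain | 2 | 2 | MAX | MAX | 3 |
| | | | KNN | 2 | 2 | MAX | MAX | |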

Example 5.3. Anderson's [1] real data set IRIS has n = 150 data points in a four-dimensional measurement space (petal length, sepal length, petal width and sepal width) that represent three physical clusters. In their numerical representation, two of the three clusters are hardly discernible, while the third one is well separated from the other two. One can argue in favor of either c = 2 or c = 3 for IRIS because of the substantial overlap of two of the clusters. We take c = 2 as the optimal choice in view of the geometric structure of IRIS, as mentioned in [18]. In this paper, two dimensions (petal length and petal width) of the IRIS data set are considered (Fig. 11a). Since the data set is a classical one, we do not add extra noise points.

The IRIS data set has been classified by the FJP algorithm and by the FCM algorithm, the latter with PC for finding the optimal number of clusters and the KNN algorithm for initial clusters. In both cases, the optimal number of clusters is found to be two. However, the FCM algorithm failed to assign the element circled in Fig. 11b to the correct cluster.

According to Table 2, FJP correctly identifies c = 2 with resulting degree $\bar{\alpha} = 0.9448$, for $\varepsilon_1 = 0.9$, $\varepsilon_2 = 0.2$, as shown in Fig. 11a. The Fukuyama–Sugeno index detects overlapping clusters and finds c = 3.

Example 5.4. Bensaid's [2] two-dimensional real data set, composed of three clusters, is used in this example. We have preserved the structure of this set but have increased the number of core points and added noise points (the total number of points is 200).

The FJP algorithm detected the real structure of the classes and their optimal number accurately with $\bar{\alpha} = 0.9208$, for $\varepsilon_1 = 0.9$, $\varepsilon_2 = 0.2$ (Fig. 12a). On the other hand, the FCM algorithm with the PC and Fukuyama–Sugeno validity indexes failed to detect the correct cluster number and the natural cluster structure, as shown in Fig. 12b. The reason for FCM's bad partition in Fig. 12b is that low values of the least-squares criterion correspond to partitions with approximately equal cluster radii.


Fig. 12. Graphic results for Bensaid data (with noises): (a) FJP algorithm with resulting degree $\bar{\alpha} = 0.9208$, for $\varepsilon_1 = 0.9$, $\varepsilon_2 = 0.2$; (b) FCM algorithm where PC, CE for cluster validity and KNN algorithm for initial clusters are used.

6. Conclusions

In this presented work, definitions of fuzzy conical point, fuzzy �-neighbor points and fuzzy �-joint points aregiven and some of their properties are investigated. In order to realize a clustering process, we have proposed the FJPalgorithm. It is based on the heuristic which automatically determines the optimal level degree and recognizes theoptimal number of clusters. We have proved the sufficient condition to recognize the hidden optimal structure of thepossible classes by the FJP algorithm.

As shown in the paper, the FJP algorithm transforms a distance-based approach of clustering to the fuzzy level-sets-based approach. This different fuzzy approach makes more possible to maintain detailed research by using fuzzyrelation, fuzzy distance and etc. techniques of the fuzzy sets theory.

As it is mentioned, the FJP algorithm has some satisfactory advantages. The main advantage of the FJP algorithm isthat it combines determination of initial clusters, cluster validity and direct clustering, which are the fundamental stagesof a clustering process. We have observed in our experiments that as the calculations to find global optimal value ofthe other cluster validity functionals in the FCM method are a non-trivial problem, usage of the FJP method decreasesthe total calculation time of a clustering process. So, the FJP algorithm could be used as a supplementary algorithm tocalculate cluster validity and initial clusters in FCM algorithm.

In addition, the FJP method could be used as an independent clustering algorithm. With the FJP method it is possible to handle the fuzzy properties at various level-degrees of detail and to recognize individual outlier elements as independent classes. This could be important in biological, medical, etc. problems in order to recognize new forms of living objects. In our experiments, we found that as the number of points in the data set increases, the best value of the noise sensitivity parameter ε2 should be closer to 1. It would be interesting to investigate the adjustment of the sensitivity parameters ε1 and ε2 in more detail.

Acknowledgments

The authors are grateful to the referees for their valuable comments and suggestions, which improved the presentation of the paper.

Appendix A

Example A.1. If a different nonlinear monotonic function is used in the definition of a fuzzy point, the FJP algorithm would produce different results. For instance, let us examine the following data set with eight points. As seen from the figures, when the fuzzy point membership function is linear (Fig. 13a), the widest change interval of the α parameter in which the number of clusters remains unchanged is the one marked as 3 (of course, the lowest layer, which corresponds to a single cluster, is not taken into consideration), and according to the working principle of the algorithm, this situation corresponds to the formation of two clusters.

Fig. 13. The effect of different membership functions for a fuzzy point on the results of the FJP algorithm.

But when the fuzzy point membership function is nonlinear, as given in Fig. 13b, the widest change interval is the one marked as 2, and this corresponds to the formation of four clusters. Namely, the algorithm will produce two clusters if the fuzzy point membership function is as given in Fig. 13a, while it will produce four clusters with Fig. 13b.
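
The effect described in Example A.1 can be reproduced on a toy data set. The sketch below is not the FJP algorithm itself and does not use the data of Fig. 13. It assumes a neighborhood relation T(i, j) = g(1 - d_ij / d_max) for a monotonic shape function g, a max-min transitive closure, and the selection of the widest interval of α over which the cluster count is constant, with the single-cluster bottom layer and the all-singletons top layer ignored. All function names and the eight 1-D points are ours.

```python
import numpy as np

def neighbor_relation(points, shape):
    """Assumed relation T[i, j] = shape(1 - d_ij / d_max) for 1-D points."""
    x = np.asarray(points, dtype=float)
    d = np.abs(x[:, None] - x[None, :])
    return shape(1.0 - d / d.max())

def maxmin_closure(T):
    """Max-min transitive closure (Warshall scheme on the max-min semiring)."""
    C = T.copy()
    for k in range(len(C)):
        C = np.maximum(C, np.minimum(C[:, k:k + 1], C[k:k + 1, :]))
    return C

def clusters_at(C, alpha):
    """Number of classes of the alpha-cut of the closure (distinct rows)."""
    return len({tuple(row) for row in (C >= alpha)})

def widest_stable_interval(C):
    """Return (width, alpha, n_clusters) for the widest interval between
    consecutive distinct closure levels; None if there is only one level."""
    levels = np.unique(C[~np.eye(len(C), dtype=bool)])[::-1]  # descending
    best = None
    for hi, lo in zip(levels[:-1], levels[1:]):
        if best is None or hi - lo > best[0]:
            best = (hi - lo, hi, clusters_at(C, hi))
    return best

# Eight illustrative points: four tight pairs with gaps 3, 6 and 3 between them.
points = [0, 1, 4, 5, 11, 12, 15, 16]
linear = widest_stable_interval(
    maxmin_closure(neighbor_relation(points, lambda t: t)))
nonlinear = widest_stable_interval(
    maxmin_closure(neighbor_relation(points, lambda t: t ** 4)))
```

With the linear shape the widest interval belongs to the two-cluster layer, while with the shape t ** 4 it belongs to the four-cluster layer, which mirrors the two versus four clusters of Fig. 13a and Fig. 13b.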

References

[1] E. Anderson, The IRISes of the Gaspe peninsula, Bull. Amer. IRIS Soc. 59 (1935) 2–5.
[2] A.M. Bensaid, L.O. Hall, J.C. Bezdek, L.P. Clarke, M.L. Silbiger, J.A. Arrington, R.F. Murtagh, Validity-guided (re)clustering with applications to image segmentation, IEEE Trans. Fuzzy Systems 4 (2) (1996) 112–123.
[3] J.P. Bigus, Data Mining with Neural Networks, McGraw-Hill, New York, 1996.
[4] S. Brecheisen, H.P. Kriegel, M. Pfeifle, Efficient density-based clustering of complex objects, in: Proc. of the Fourth IEEE Internat. Conference on Data Mining, United Kingdom, 2003, pp. 43–50.
[5] M. Daszykowski, B. Walczak, D.L. Massart, Density-Based Clustering for Exploration of Analytical Data, Springer, Bioanal. Chem., 2004, 370–372.
[6] D. Dubois, H. Prade, Fuzzy Sets and Systems, Theory and Applications, Academic Press, New York, 1980.
[7] D. Dumitrescu, B. Lazzerini, L.C. Jain, Fuzzy Sets and Their Application to Clustering and Training, CRC Press LLC, Boca Raton, 2000.
[8] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernetics 3 (3) (1973) 32–57.
[9] M. Ester, H.P. Kriegel, J. Sander, X. Xu, A density based algorithm for discovering clusters in large spatial datasets with noise, in: Proc. second Internat. Conf. KDDD, Portland, Oregon, 1996, pp. 1232–1239.
[10] J. Grabmeier, A. Rudolph, Techniques of cluster algorithms in data mining, Data Mining and Knowledge Discovery 6 (2002) 303–360.
[11] R.E. Hammah, J.H. Curran, On distance measures for the Fuzzy K-Means algorithm for joint data, Rock Mech. Rock Eng. 32 (1) (1999) 1–27.
[12] 〈http://www.ics.uci.edu/~mlearn/databases〉.
[13] E. Januzaj, H.P. Kriegel, M. Pfeifle, Towards effective and efficient distributed clustering, in: Proc. of the Fourth IEEE Internat. Conf. on Data Mining, Melbourne, 2003, pp. 49–58.
[14] E.N. Nasibov, A problem of identification of states of a system from fuzzy values of informational features, Automat. Control Comput. Sci. 36 (2) (2002) 27–33.
[15] E.N. Nasibov, Identification of states of complex systems with estimation of admissible measurement errors on the basis of Fuzzy information, Cybernet. Systems Anal. 38 (1) (2002) 53–59.
[16] E.N. Nasibov, S. Senol, G. Ulutagay, A visual processing system for fuzzy clustering, IJSIT Lecture Notes of First Internat. Conf. on Informatics vol. 1 (2) (2004) 123–128.

[17] E.N. Nasibov, G. Ulutagay, A new approach to clustering problem using the fuzzy joint points method, Automat. Control Comput. Sci. 39 (6) (2005) 8–17.
[18] N.R. Pal, J.C. Bezdek, On cluster validity for the fuzzy c-means model, IEEE Trans. on Fuzzy Systems 3 (3) (1995) 370–379.
[19] Y. Shi-hong, L. Ping, G. Ji-dong, Z. Shui-geng, Using greedy Algorithm: DBSCAN revisited, J. Zhejiang Univ. Sci. China 5 (11) (2004) 1405–1412.
[20] R.P. Velthuizen, L.O. Hall, L.P. Clarke, M.L. Silbiger, An investigation of mountain method clustering for large data sets, Pattern Recognition 30 (7) (1997) 1121–1135.
[21] R.R. Yager, D.P. Filev, Approximate clustering via the mountain method, IEEE Trans. Systems Man Cybernet. 24 (8) (1994) 1279–1284.
[22] M.S. Yang, H.H. Liu, Fuzzy clustering procedures for conical fuzzy vector data, Fuzzy Sets and Systems 106 (1999) 189–200.
[23] N. Zahid, O. Abouelala, M. Limouri, A. Essaid, Fuzzy clustering based on K-nearest-neighbours rule, Fuzzy Sets and Systems 120 (2001) 239–247.
