Date post: 14-Dec-2015
Upload: jorge-hiller
Page 1: A New Algorithm of Fuzzy Clustering for Data with Uncertainties: Fuzzy c-Means for Data with Tolerance Defined as Hyper-rectangles ENDO Yasunori MIYAMOTO.

A New Algorithm of Fuzzy Clustering for Data with Uncertainties:

Fuzzy c-Means for Data with Tolerance Defined as Hyper-rectangles

ENDO Yasunori MIYAMOTO Sadaaki

Outline

• Background and goal of our study
• The concept of tolerance
• New clustering algorithms for data with tolerance
• Numerical examples
• Conclusion and future works

Introduction

Clustering is a form of unsupervised automatic classification: it divides a set of data into several groups.

Many clustering algorithms have been proposed, and fuzzy c-means (FCM) is the most typical method of fuzzy clustering.

In this presentation, I describe one way to handle uncertainty in data and present new clustering algorithms based on FCM.

Uncertainty

In clustering, each datum in a real space is regarded as a single point in a pattern space and classified.

However, data with uncertainty should often be represented not by a point but by a set.

Three examples of data with uncertainty

• Example 1: Data has errors

When a spring scale with a measurement accuracy of ±5 g shows 450 g, the actual value lies in the interval from 445 g to 455 g.

Three examples of data with uncertainty

• Example 2: Data has ranges

An apple has not one color but many, so the color of the apple cannot be represented as a single point in color space.

Three examples of data with uncertainty

• Example 3: Missing values exist in data

In a social survey, unanswered items in a questionnaire are handled as missing values.

Background

In the past, these uncertainties of data have been represented as interval data. Some algorithms for interval data have been proposed (e.g., Takata and Miyamoto[1]).

In those algorithms, dissimilarity is defined between interval data by using particular measures, e.g., nearest-neighbor, furthest-neighbor or Hausdorff distance.
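As a rough illustration of the three measures named above, the following sketch computes them for one-dimensional closed intervals. The function names and the `(low, high)` tuple representation are assumptions made for this example and are not taken from [1].

```python
def nearest_neighbor(a, b):
    """Smallest distance between a point of interval a and a point of interval b."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(0.0, b_lo - a_hi, a_lo - b_hi)


def furthest_neighbor(a, b):
    """Largest distance between a point of interval a and a point of interval b."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(abs(b_hi - a_lo), abs(a_hi - b_lo))


def hausdorff(a, b):
    """Hausdorff distance between two closed intervals on the real line."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(abs(a_lo - b_lo), abs(a_hi - b_hi))


# Two weights measured with an accuracy of +/- 5 g (hypothetical readings).
a = (445.0, 455.0)
b = (460.0, 470.0)
print(nearest_neighbor(a, b), furthest_neighbor(a, b), hausdorff(a, b))
```

All three functions look only at the end points of the intervals, which is the second disadvantage discussed for the interval methodology.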

Background

The interval-based methodology has the following disadvantages:

• We have to introduce a particular measure, but how do we select an adequate one?

• In practice, these measures handle only the boundary of the interval data.

Goal of our study

From the viewpoint of a strict optimization problem, we handle uncertainty as tolerance and consider a new type of optimization problem for data with tolerance.

Moreover, we construct new clustering algorithms in this optimization framework. In these algorithms, dissimilarity between target data is defined by using the L1 or squared L2 norm.

Features of proposed algorithms

The methodology of tolerance has the following advantages:

• Particular distances between intervals don't have to be defined.

• Not only the boundary but the whole region within the tolerance is handled.

• The discussion becomes mathematically simpler than with interval distances.

The concept of tolerance

The concept of tolerance

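We define $x_k = (x_{k1}, \dots, x_{kp})^T$ as the $k$-th data on a $p$-dimensional vector space $\mathbb{R}^p$, and $\varepsilon_k = (\varepsilon_{k1}, \dots, \varepsilon_{kp})^T$ as the tolerance vector of $x_k$. The constraint condition is given by the following expression:

$|\varepsilon_{kj}| \le \kappa_{kj} \quad (\kappa_{kj} \ge 0;\ j = 1, \dots, p)$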

An example of tolerance vector on R

$\varepsilon_k$: Tolerance vector. It is calculated in the algorithm.

$\kappa_k$: Tolerance. It is decided before the calculation.
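A minimal sketch of the hyper-rectangle idea follows, assuming the tolerance is given per attribute as a non-negative bound and the tolerance vector must stay within that bound in every coordinate. The names `within_tolerance` and `clip_to_tolerance`, and the sample values, are hypothetical.

```python
def within_tolerance(eps, kappa):
    """True if every component of the tolerance vector respects its bound."""
    return all(abs(e) <= k for e, k in zip(eps, kappa))


def clip_to_tolerance(eps, kappa):
    """Move a candidate vector back into the hyper-rectangle [-kappa, kappa]."""
    return [max(-k, min(k, e)) for e, k in zip(eps, kappa)]


x = [450.0, 12.0]      # observed data (hypothetical values)
kappa = [5.0, 0.5]     # tolerance, fixed before the calculation
candidate = [8.0, -0.25]

eps = clip_to_tolerance(candidate, kappa)      # [5.0, -0.25]
moved = [xi + ei for xi, ei in zip(x, eps)]    # [455.0, 11.75]
print(within_tolerance(candidate, kappa), eps, moved)
```

The point `x + eps` can lie anywhere inside the rectangle, not only on its boundary, which is what distinguishes the tolerance view from the interval distances.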

Comparison of Tolerance and Other Measures

Nearest-neighbor method

Furthest-neighbor method

Proposed method

Proposed algorithms

Conventional fuzzy c-means

sFCM: standard fuzzy c-means

$c$ ….. Number of clusters
$n$ ….. Number of data
$p$ ….. Number of dimensions of the pattern space
$u_{ki}$ ….. Membership grade
$x_k$ ….. Data
$v_i$ ….. Cluster center

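Conventional fuzzy c-means

Algorithm    Objective function

sFCM-L1    $J(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ki})^m \| x_k - v_i \|_1$

sFCM-L2    $J(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ki})^m \| x_k - v_i \|_2^2$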

Optimization problem: sFCM-L2

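• Objective function:

$J(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ki})^m \| x_k - v_i \|_2^2$, subject to $\sum_{i=1}^{c} u_{ki} = 1$, where $m > 1$ is the fuzzification parameter.

• Membership grade U:

$u_{ki} = \left( \sum_{l=1}^{c} \left( \frac{\| x_k - v_i \|_2^2}{\| x_k - v_l \|_2^2} \right)^{\frac{1}{m-1}} \right)^{-1}$

• Cluster center V:

$v_i = \frac{\sum_{k=1}^{n} (u_{ki})^m x_k}{\sum_{k=1}^{n} (u_{ki})^m}$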

Algorithm: sFCM-L2

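• Step 1: Set the initial value of V.

• Step 2: Update U by its optimal solution for the current V.

• Step 3: Update V by its optimal solution for the current U.

• Step 4: If the solution is convergent, stop. Otherwise, go back to Step 2.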
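The four steps can be sketched in plain Python as follows. This is an illustration of standard fuzzy c-means with the squared L2 norm, not the authors' code. The stopping rule (largest squared shift of a cluster center below `tol`), the defaults `m=2.0`, `tol=1e-6`, `max_iter=100`, and all function names are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def update_memberships(X, V, m):
    """Step 2: membership grades for fixed cluster centers."""
    U = []
    for x in X:
        d = [sq_dist(x, v) for v in V]
        if min(d) == 0.0:
            # The datum coincides with a center: assign it there crisply.
            hit = d.index(0.0)
            U.append([1.0 if i == hit else 0.0 for i in range(len(V))])
        else:
            inv = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
            total = sum(inv)
            U.append([w / total for w in inv])
    return U


def update_centers(X, U, m):
    """Step 3: cluster centers as membership-weighted means of the data."""
    n, p, c = len(X), len(X[0]), len(U[0])
    V = []
    for i in range(c):
        w = [U[k][i] ** m for k in range(n)]
        total = sum(w)
        V.append([sum(w[k] * X[k][j] for k in range(n)) / total for j in range(p)])
    return V


def sfcm_l2(X, V, m=2.0, tol=1e-6, max_iter=100):
    """Alternate Steps 2 and 3, starting from the initial centers V (Step 1)."""
    for _ in range(max_iter):
        U = update_memberships(X, V, m)
        V_new = update_centers(X, U, m)
        shift = max(sq_dist(a, b) for a, b in zip(V, V_new))
        V = V_new
        if shift < tol:  # Step 4 (assumed convergence test)
            break
    return U, V


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
U, V = sfcm_l2(X, [[0.0, 0.0], [10.0, 10.0]])
print(V)
```

The initial centers are passed in explicitly so that a run is reproducible; a random choice, as used in the numerical examples, would replace that argument.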

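Proposed algorithms

Algorithm    Objective function

sFCMT-L1    $J(U, V, E) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ki})^m \| x_k + \varepsilon_k - v_i \|_1$

sFCMT-L2    $J(U, V, E) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ki})^m \| x_k + \varepsilon_k - v_i \|_2^2$

The constraint condition: $|\varepsilon_{kj}| \le \kappa_{kj} \quad (\kappa_{kj} \ge 0)$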

An example of tolerance vector on R

$\varepsilon_k$: Tolerance vector. It is calculated in the algorithm.

$\kappa_k$: Tolerance. It is decided before the calculation.

Optimization problem: sFCMT-L2

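• Objective function:

$J(U, V, E) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ki})^m \| x_k + \varepsilon_k - v_i \|_2^2$, subject to $\sum_{i=1}^{c} u_{ki} = 1$ and $|\varepsilon_{kj}| \le \kappa_{kj}$

• Membership grade U:

$u_{ki} = \left( \sum_{l=1}^{c} \left( \frac{\| x_k + \varepsilon_k - v_i \|_2^2}{\| x_k + \varepsilon_k - v_l \|_2^2} \right)^{\frac{1}{m-1}} \right)^{-1}$

• Cluster center V:

$v_i = \frac{\sum_{k=1}^{n} (u_{ki})^m (x_k + \varepsilon_k)}{\sum_{k=1}^{n} (u_{ki})^m}$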

Optimization problem: sFCMT-L2

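• Tolerance vector E:

$\varepsilon_{kj} = \mathrm{sign}(\alpha_{kj}) \min\{ |\alpha_{kj}|, \kappa_{kj} \}$, where $\alpha_{kj} = -\frac{\sum_{i=1}^{c} (u_{ki})^m (x_{kj} - v_{ij})}{\sum_{i=1}^{c} (u_{ki})^m}$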
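The following short derivation indicates why the optimal tolerance vector for the squared L2 norm amounts to clipping. It is a sketch worked out from the objective function and the hyper-rectangle constraint, with $u_{ki}$, $x_{kj}$, $v_{ij}$, $\varepsilon_{kj}$, $\kappa_{kj}$ and the exponent $m$ as assumed notation; $\alpha_{kj}$ is an auxiliary symbol introduced here.

```latex
% For fixed U and V the objective separates over data k and coordinates j:
J_{kj}(\varepsilon_{kj}) = \sum_{i=1}^{c} (u_{ki})^m \,(x_{kj} + \varepsilon_{kj} - v_{ij})^2 .

% Setting the derivative to zero gives the unconstrained minimizer:
\frac{\partial J_{kj}}{\partial \varepsilon_{kj}}
  = 2 \sum_{i=1}^{c} (u_{ki})^m \,(x_{kj} + \varepsilon_{kj} - v_{ij}) = 0
\;\Longrightarrow\;
\alpha_{kj} = - \frac{\sum_{i=1}^{c} (u_{ki})^m \,(x_{kj} - v_{ij})}{\sum_{i=1}^{c} (u_{ki})^m} .

% J_{kj} is a convex quadratic in one variable, so under
% |\varepsilon_{kj}| \le \kappa_{kj} the minimizer is the point of
% [-\kappa_{kj}, \kappa_{kj}] closest to \alpha_{kj}:
\varepsilon_{kj} = \operatorname{sign}(\alpha_{kj}) \,\min\{ |\alpha_{kj}|, \kappa_{kj} \} .
```

In words, each datum is pulled toward the membership-weighted mean of the cluster centers, but never further than its tolerance allows.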

Algorithm: sFCMT-L2

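• Step 1: Set the initial values of V and E.

• Step 2: Update U by its optimal solution for the current V and E.

• Step 3: Update V by its optimal solution for the current U and E.

• Step 4: Update E by its optimal solution for the current U and V.

• Step 5: If the solution is convergent, stop. Otherwise, go back to Step 2.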
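The five steps can be sketched as below. This is an illustrative implementation under stated assumptions, not the authors' code: the tolerance vectors start at zero, the E update clips the unconstrained minimizer of the squared-L2 objective to the hyper-rectangle, and the stopping rule, the defaults, and the names `sfcmt_l2` and `K` (the per-attribute tolerances) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sfcmt_l2(X, K, V, m=2.0, tol=1e-6, max_iter=100):
    n, p, c = len(X), len(X[0]), len(V)
    E = [[0.0] * p for _ in range(n)]  # Step 1: V is given, E starts at zero
    for _ in range(max_iter):
        # Data shifted by their current tolerance vectors.
        Y = [[X[k][j] + E[k][j] for j in range(p)] for k in range(n)]

        # Step 2: membership grades for fixed V and E.
        U = []
        for y in Y:
            d = [sq_dist(y, v) for v in V]
            if min(d) == 0.0:
                hit = d.index(0.0)
                U.append([1.0 if i == hit else 0.0 for i in range(c)])
            else:
                inv = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
                total = sum(inv)
                U.append([w / total for w in inv])
        W = [[u ** m for u in row] for row in U]

        # Step 3: cluster centers as weighted means of the shifted data.
        V_new = []
        for i in range(c):
            total = sum(W[k][i] for k in range(n))
            V_new.append(
                [sum(W[k][i] * Y[k][j] for k in range(n)) / total for j in range(p)]
            )

        # Step 4: tolerance vectors, clipped to [-K, K] in each coordinate.
        for k in range(n):
            total = sum(W[k])
            for j in range(p):
                alpha = -sum(W[k][i] * (X[k][j] - V_new[i][j]) for i in range(c)) / total
                E[k][j] = max(-K[k][j], min(K[k][j], alpha))

        # Step 5: assumed convergence test on the shift of the centers.
        shift = max(sq_dist(a, b) for a, b in zip(V, V_new))
        V = V_new
        if shift < tol:
            break
    return U, V, E


X = [[0.0], [1.0], [9.0], [10.0]]
K = [[0.5], [0.5], [0.5], [0.5]]
U, V, E = sfcmt_l2(X, K, [[0.0], [10.0]])
print(V, E)
```

Setting every entry of `K` to zero forces `E` to stay at zero, in which case the loop reduces to the conventional sFCM-L2 iteration.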

Numerical examples

Test data: sFCMT-L2

Diagnosis of heart disease data

• The heart disease database has five attributes. The diagnosis result (presence or absence of heart disease) is known. There are 866 data, of which 560 contain missing values in some attributes.

Attribute    Number of missing values

Resting blood pressure    5
Maximum heart rate achieved    1
ST depression induced by exercise relative to rest    8
The slope of the peak exercise ST segment    255
Number of major vessels colored by fluoroscopy    557

Diagnosis of heart disease data

• In all algorithms, the convergence condition is

where is the previous optimal solution. In addition, in sFCM.

• To handle missing values as tolerance, we define it as follows.
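The definition on the slide is not recoverable from this transcript. One plausible way to express a missing value as tolerance, assumed here purely for illustration, is to place the datum at the midpoint of the attribute's observed range and to set its tolerance to half that range, so that the hyper-rectangle covers every value the attribute could have taken. Observed values get zero tolerance. The function name and sample rows are hypothetical.

```python
def to_tolerance_form(rows):
    """Turn rows with None for missing values into data X and tolerances K."""
    p = len(rows[0])
    lo = [min(r[j] for r in rows if r[j] is not None) for j in range(p)]
    hi = [max(r[j] for r in rows if r[j] is not None) for j in range(p)]
    X, K = [], []
    for r in rows:
        # Assumption: a missing value sits at the midpoint of the observed range.
        X.append([(lo[j] + hi[j]) / 2 if r[j] is None else r[j] for j in range(p)])
        # Assumption: its tolerance is half the observed range; known values get 0.
        K.append([(hi[j] - lo[j]) / 2 if r[j] is None else 0.0 for j in range(p)])
    return X, K


rows = [[120.0, 150.0], [140.0, None], [None, 170.0], [130.0, 160.0]]
X, K = to_tolerance_form(rows)
print(X)
print(K)
```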

Diagnosis of heart disease data

• We classify all 866 data, including those with missing values, by using the proposed algorithms, and only the 306 data without missing values by using the conventional algorithms.

• In each algorithm, we give initial cluster centers at random and classify the data set into two clusters. We run this trial 1000 times and show the average ratio of correctly classified data.
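The evaluation described above can be sketched as follows. How clusters are matched to the two diagnosis classes is not stated on the slide; this sketch assumes the more favorable of the two possible matchings is scored. `run_trial` stands for any hypothetical function that runs one randomly initialized clustering and returns the membership grades.

```python
def correct_ratio(U, labels):
    """Percentage of correctly classified data for a two-cluster result."""
    assigned = [0 if u[0] >= u[1] else 1 for u in U]
    hits = sum(a == t for a, t in zip(assigned, labels))
    # Cluster indices are arbitrary, so score the better of the two matchings
    # (an assumption, not stated in the source).
    return 100.0 * max(hits, len(labels) - hits) / len(labels)


def average_ratio(run_trial, labels, trials=1000):
    """Average the ratio over repeated trials; run_trial(t) returns memberships."""
    return sum(correct_ratio(run_trial(t), labels) for t in range(trials)) / trials


U = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [1, 0, 1, 1]
print(correct_ratio(U, labels))
```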

Diagnosis of heart disease data

• This table shows the results of classifying only the 306 data without missing values.

Algorithm    Average ratio (%)
sFCM-L1    70.0
sFCM-L2    75.2

• This table shows the results of classifying all 866 data.

Algorithm    Average ratio (%)
sFCMT-L1    68.6
sFCMT-L2    73.4

Diagnosis of heart disease data

• This table shows the results of classifying all 866 data by using the algorithms proposed in our research.

Algorithm    Average ratio (%)
sFCMT-L1    68.6
sFCMT-L2    73.4

• This table shows the results of classifying all 866 data by using an algorithm that handles missing values as interval data and uses the nearest-neighbor distance to calculate dissimilarity.

Algorithm    Average ratio (%)
sFCMT-L1    69.0
sFCMT-L2    67.2

Conclusion and future works

• Conclusion
▫ We considered optimization problems for data with tolerance and derived their optimal solutions. Using the results, we constructed six new algorithms.
▫ We have shown the effectiveness of the proposed algorithms through some numerical examples.

Conclusion and future works

• Future works
▫ We will apply the algorithms to other data sets with tolerance.
▫ We will apply the concept of tolerance to regression analysis, support vector machines, and other methods.

Thank you for your attention.

References

1. Osamu Takata, Sadaaki Miyamoto: “Fuzzy Clustering of Data with Interval Uncertainties”, Journal of Japan Society for Fuzzy Theory and Systems, Vol. 12, No. 5, pp. 686-695 (2000) (in Japanese)

