
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 21, NO. 5, OCTOBER 2013 855

Clustering Spatiotemporal Data: An Augmented Fuzzy C-Means

Hesam Izakian, Student Member, IEEE, Witold Pedrycz, Fellow, IEEE, and Iqbal Jamal

Abstract—In spatiotemporal data commonly encountered in geographical systems, biomedical signals, and the like, each datum is composed of features comprising a spatial component and a temporal part. Clustering data of this nature poses challenges, especially in terms of a suitable treatment of the spatial and temporal components of the data. In this study, proceeding with objective function-based clustering (e.g., fuzzy C-means), we revisit and augment the algorithm to make it applicable to spatiotemporal data. An augmented distance function is discussed, and the resulting clustering algorithm is provided. Two optimization criteria, namely, a reconstruction error and a prediction error, are introduced and used as a vehicle to optimize the performance of the clustering method. Experimental results obtained for synthetic and real-world data are reported.

Index Terms—Fuzzy clustering, reconstruction and prediction criteria, spatiotemporal data, weather data.

I. INTRODUCTION

GIVEN the unprecedented growth of spatiotemporal data encountered in different application domains such as geography, climatology, and health surveillance systems, their analysis has become increasingly important and practically relevant. In spatiotemporal data, each data point is composed of two parts, namely, a spatial component, typically denoting its location (say, x–y or latitude–longitude coordinates), and a temporal part, comprising one or more time series associated with the spatial coordinates. Daily average temperatures recorded at different weather stations, monthly numbers of disease cases reported in different cities, and hourly air pollution recordings are examples of this kind of data.

Clustering of spatiotemporal data reveals interesting structures that can be used in different applications. The fuzzy C-means (FCM) algorithm [9] is one of the commonly used clustering techniques and is inherently associated with an underlying objective function. To cope with the specificity of spatiotemporal data, the generic objective function of the FCM requires a thorough examination and revision of its formulation. In this paper, we introduce a concept and the ensuing algorithmic developments using the generic FCM algorithm (although the main line of thought is equally valid for any objective function-based clustering). The crux of the method is to effectively handle the spatial and temporal facets of the data so as to preserve the essence of the problem. For this purpose, we revisit the distance function and augment the “standard” Euclidean distance. Equally important, the augmented distance is endowed with a substantial level of flexibility, so that the contributions of the temporal and spatial parts of the data can be carefully balanced and optimized. This flexibility is exploited to minimize two performance indexes, namely, a reconstruction error and a prediction error. The reconstruction error is essential when assessing the quality of clusters (information granules) and quantifying the role they play in the processes of information granulation and degranulation. The prediction aspect is of interest when forecasting the temporal component of the data given their specific location (spatial information).

Manuscript received January 3, 2012; revised May 23, 2012 and September 8, 2012; accepted November 6, 2012. Date of publication December 11, 2012; date of current version October 2, 2013. This work was supported in part by the Alberta Innovates—Technology Futures and Alberta Advanced Education and Technology, the Natural Sciences and Engineering Research Council of Canada, and the Canada Research Chair Program.

H. Izakian is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, T6G 2V4 (e-mail: izakian@ualberta.ca).

W. Pedrycz is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, T6G 2V4, with the Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia, and with the System Research Institute, Polish Academy of Sciences, Warsaw 00-716, Poland (e-mail: [email protected]).

I. Jamal is with AQL Management Consulting Inc., Edmonton, AB, Canada, T6J 2R8 (e-mail: [email protected]).

Digital Object Identifier 10.1109/TFUZZ.2012.2233479

Interestingly enough, the objective function of the FCM algorithm has been subject to various modifications in order to cope with the specificity of the problem at hand. In [19], by adding a gain field, the FCM objective function was reformulated and optimized iteratively for segmentation and classification of M-FISH images to detect chromosomal abnormalities and support genetic disease diagnosis. In [62], a fuzzy clustering approach for data points comprising various object types was proposed by reformulating the FCM objective function and solving a constrained optimization problem. A membership matrix and a ranking matrix were employed in the optimization procedure: the membership matrix comprises the membership degrees of objects to clusters, while the ranking matrix measures how representative an object is in comparison with other objects in various clusters. In [61], a general definition of distance functions that preserve the applicability of centroid-based alternating optimization in FCM is provided. The authors showed that any distance function that can be used in the FCM algorithm is an instance of the generalized point-to-centroid distance and can be derived from a differentiable convex function. In addition, in [49], methods and guidelines were developed for designing collaborative fuzzy clustering algorithms that cluster data distributed among different data sites.

This study is organized as follows. We start with a brief review of the research reported so far. The two fundamental concepts of essential relevance in the context of this study, namely, the representation of time series and the quantification of distance between time series, are then discussed. In Section III, we introduce spatiotemporal clustering and formulate the ensuing optimization problem. In Section IV, two performance indexes (evaluation criteria) casting the clustering results in the setting of reconstruction and prediction problems are investigated. In Sections V and VI, experimental results dealing with synthetic data and real-world problems are reported. Conclusions are covered in Section VII.

1063-6706 © 2013 IEEE

II. CLUSTERING SPATIOTEMPORAL DATA: A FOCUSED LITERATURE REVIEW

In real-world applications, we encounter different kinds of spatiotemporal data. Kisilevich et al. [51] divided spatiotemporal data into five categories: spatiotemporal events, georeferenced variables, georeferenced time series, moving objects, and trajectories.

In spatiotemporal event data, there is a set of events, each occurring at a spatial location and carrying a timestamp. Clustering this type of data aims to find sets of events that are close to each other in both space and time. One of the commonly used methods for clustering these types of data is scan statistics [52], [53]. In this method, a cylindrical window of variable size and shape is moved across a geographical region to detect clusters of events with the highest likelihood ratios. In [54], an extended version of FCM was proposed to find circular clusters of hotspots in spatiotemporal geographical information system data. For each timestamp, the events are clustered based on their spatial location, and then the clusters obtained at consecutive timestamps are compared to draw interpretations about the events. Wang et al. [55] proposed two spatiotemporal clustering methods, called ST-GRID and ST-DBSCAN, to detect seismic events in China and neighboring countries. The ST-GRID method uses a multidimensional grid that covers the entire spatiotemporal feature space; spatiotemporal clusters are then formed by merging dense neighboring cells. ST-DBSCAN extends DBSCAN [56] by redefining density reachability using a spatial and a temporal radius. Both methods exploit an ordered k-dist graph [56] to determine their parameters.
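The two-radius neighborhood on which an ST-DBSCAN-style method builds density reachability can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [55]: the (x, y, t) event layout and the names `st_neighbors`, `is_core`, `eps_spatial`, `eps_temporal`, and `min_pts` are hypothetical, and cluster expansion and the k-dist parameter selection are omitted.

```python
import math
from typing import List, Tuple

# Hypothetical event layout: (x, y, t) = planar coordinates plus a timestamp.
Event = Tuple[float, float, float]


def st_neighbors(events: List[Event], idx: int,
                 eps_spatial: float, eps_temporal: float) -> List[int]:
    """Indices of the events (other than idx) that lie within eps_spatial
    in space AND within eps_temporal in time of events[idx]."""
    x0, y0, t0 = events[idx]
    neighbors = []
    for j, (x, y, t) in enumerate(events):
        if j == idx:
            continue
        close_in_space = math.hypot(x - x0, y - y0) <= eps_spatial
        close_in_time = abs(t - t0) <= eps_temporal
        if close_in_space and close_in_time:
            neighbors.append(j)
    return neighbors


def is_core(events: List[Event], idx: int, eps_spatial: float,
            eps_temporal: float, min_pts: int) -> bool:
    """DBSCAN-style core test; the point itself is counted, as in DBSCAN."""
    return len(st_neighbors(events, idx, eps_spatial, eps_temporal)) + 1 >= min_pts


events = [(0.0, 0.0, 0.0), (0.5, 0.0, 1.0), (0.0, 0.4, 10.0), (5.0, 5.0, 0.0)]
print(st_neighbors(events, 0, eps_spatial=1.0, eps_temporal=2.0))  # [1]
```

Event 2 is spatially close to event 0 but ten time units away, so it is excluded; this is exactly the effect of requiring both radii at once.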

Georeferenced time series are composed of a set of fixed geographical coordinates, each corresponding to one or more time series. Georeferenced variables form a special case of georeferenced time series in which only the most recent point of each time series is available. Clustering this type of data aims to group objects based on their spatial closeness and temporal similarity. In [57], FCM was used to cluster weather time series. The Pearson correlation coefficient was employed as the similarity measure expressing the closeness of two time series, and a method to determine the number of clusters was proposed. However, the method does not involve the spatial part of the data in the clustering process. Deng et al. [58] proposed a density-based spatiotemporal clustering method. In this method, a spatial proximity network is constructed using Delaunay triangulation, and a spatiotemporal autocorrelation analysis is employed to define the spatiotemporal neighborhood. In [44], an extended version of FCM was proposed for image segmentation by considering the spatial location of pixels. This method was adapted by Coppi et al. [59] for clustering spatiotemporal data. In this approach, a spatial penalty term calculated using a spatial contiguity matrix is added to the objective function to guarantee approximate spatial homogeneity of the clusters.

Trajectories capture the movement behavior of a set of spatial objects in the form of time series. When only the most recent position of the objects is available, the data are called moving objects data. Clustering this kind of data aims to discover the behavior of a collection of objects, e.g., in urban traffic or animal migration. In [15], the Euclidean distance between trajectories was used as a dissimilarity measure, and OPTICS [14] was extended to cluster trajectories. Two methods, trajectory-OPTICS and a time-focused version of it (called TF-OPTICS), were proposed. In [13], a probabilistic regression model for trajectory detection was proposed, and the expectation–maximization algorithm [12] was employed to model trajectories. Kalnis et al. [47] proposed algorithms to discover moving clusters in spatiotemporal data. In these methods, the set of objects in a moving cluster may change over time. At each time step, the locations of the objects are considered as a snapshot, and a spatial clustering method such as DBSCAN is used for clustering. Two snapshot clusters at consecutive time steps are considered a moving cluster if their Jaccard coefficient exceeds a certain threshold. A fuzzy clustering method for three-way data was proposed in [40]. In this structure, each data point is composed of objects, attributes, and situations. The data are clustered not only on the basis of individual time instances; the similarity between structures at different time steps is also taken into account. A survey of clustering spatiotemporal data is reported in [51].
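The snapshot-linking test used for moving clusters reduces to a Jaccard coefficient on sets of object IDs. A minimal sketch follows; the function names and the threshold name `theta` are hypothetical, and the strict inequality follows the wording "exceeds" above rather than any particular published implementation.

```python
from typing import Hashable, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two sets of object IDs."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty clusters
    return len(a & b) / len(union)


def continues_moving_cluster(prev: Set[Hashable], curr: Set[Hashable],
                             theta: float) -> bool:
    """Snapshot clusters from consecutive time steps are linked when their
    Jaccard coefficient exceeds the threshold theta (strict inequality)."""
    return jaccard(prev, curr) > theta


c_t = {"a", "b", "c"}       # objects in a cluster at time t
c_next = {"b", "c", "d"}    # objects in a cluster at time t + 1
print(jaccard(c_t, c_next))  # 0.5
```

Because membership is compared as sets, the cluster may gain or lose objects between snapshots and still be tracked, which matches the notion that the objects of a moving cluster change over time.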

A. Time-Series Representation Methods

Time series have been investigated in a variety of data mining problems such as clustering [36], [39], classification [8], [45], [46], forecasting [42], [43], [60], and modeling [38], [41]. Based on the type of data used, time-series clustering methods can be split into three categories [16], [27], namely, those using raw time-series data [32]–[34], model-based methods [24], [35], [37], and representation-based methods [16], [28]–[30].

There are a number of methods proposed in the literature to represent time series. In general, such representation methods are categorized into data-adaptive and non-data-adaptive methods [17], [18], [20]. Adaptive piecewise constant approximation [18], piecewise linear approximation [22], singular value decomposition [6], and symbolic aggregate approximation [17] are examples of data-adaptive methods. Discrete Fourier transform (DFT) [1], Chebyshev polynomials [21], discrete wavelet transform (DWT) [3], [4], and piecewise aggregate approximation (PAA) [2] are well-known methods belonging to the second category.


In this paper, we use three commonly studied methods to represent time series, namely, DFT, PAA, and DWT. They can be viewed as sound representatives of the large set of methods existing in the literature. In what follows, we review them very briefly.

1) Discrete Fourier Transform: The DFT models the time series using a set of sine and cosine waves; that is, it represents the time series in the frequency domain. For a time series y of length N, the DFT is composed of N complex numbers, each describing a sine/cosine wave given by

$$f_k = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} y_i \exp(-j 2\pi k i / N), \quad k = 0, 1, \ldots, N-1 \tag{1}$$

where $j = \sqrt{-1}$. The original time series can be reconstructed by running the inverse transform

$$y_i = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k \exp(j 2\pi k i / N), \quad i = 0, 1, \ldots, N-1. \tag{2}$$

Faloutsos et al. [1] employed DFT to index time series. They noted that the most important features of each sequence are the first k (real and imaginary) coefficients (k ≪ N) of the DFT, while the other coefficients take values close to zero. With these k coefficients, the original time series can be reconstructed with little loss of information.
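A direct O(N²) sketch of (1) and (2), assuming the orthonormal 1/√N scaling shown above, may help make the truncation idea concrete. The function names are illustrative, and a practical system would use an FFT. In `reconstruct_from_first`, keeping the conjugate-mirrored coefficients alongside the first k is our own detail, needed so that a real-valued series is reconstructed with correct amplitudes; it is not stated in the text.

```python
import cmath
import math
from typing import List


def dft(y: List[float]) -> List[complex]:
    """Eq. (1): f_k = (1/sqrt(N)) * sum_i y_i * exp(-j*2*pi*k*i/N)."""
    n = len(y)
    s = 1.0 / math.sqrt(n)
    return [s * sum(y[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


def idft(f: List[complex]) -> List[float]:
    """Eq. (2); the real part is kept because the original series is real."""
    n = len(f)
    s = 1.0 / math.sqrt(n)
    return [(s * sum(f[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n))).real
            for i in range(n)]


def reconstruct_from_first(y: List[float], k: int) -> List[float]:
    """Zero all but the first k coefficients and invert. For a real series the
    conjugate mirrors f_{N-k+1}, ..., f_{N-1} are kept as well, so the
    reconstruction stays real-valued with the right amplitudes."""
    f = dft(y)
    n = len(f)
    kept = [0j] * n
    for idx in list(range(k)) + list(range(n - k + 1, n)):
        kept[idx] = f[idx]
    return idft(kept)


y = [1.0, 2.0, 3.0, 4.0]
print([round(v, 6) for v in idft(dft(y))])                  # [1.0, 2.0, 3.0, 4.0]
print([round(v, 6) for v in reconstruct_from_first(y, 1)])  # [2.5, 2.5, 2.5, 2.5]
```

With k = 1 only the zero-frequency term survives, so the reconstruction collapses to the series mean; larger k progressively restores the shape.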

2) Piecewise Aggregate Approximation: This method provides a simple and efficient way of representing time series in the time domain while offering a substantial dimensionality reduction [2]. PAA divides the time series y into k (k ≪ N) segments of equal length and uses the mean value of the data points lying within each segment as the representatives of the original time series. More formally, the representation is a vector f whose coordinates are expressed as follows:

$$f_i = \frac{k}{N} \sum_{j=\frac{N}{k} i}^{\frac{N}{k}(i+1)-1} y_j, \quad i = 0, 1, \ldots, k-1. \tag{3}$$
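Equation (3) is simply a per-segment mean, as the following sketch shows. The function name `paa` is illustrative, and the sketch assumes that N is a multiple of k, which the equal-length wording implies.

```python
from typing import List


def paa(y: List[float], k: int) -> List[float]:
    """Eq. (3): split y into k equal-length segments and return the segment
    means, i.e. (k/N) * (sum of the N/k values in each segment)."""
    n = len(y)
    if k <= 0 or n % k != 0:
        raise ValueError("this sketch assumes len(y) is a multiple of k")
    seg = n // k
    return [sum(y[i * seg:(i + 1) * seg]) / seg for i in range(k)]


print(paa([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [1.5, 3.5, 5.5, 7.5]
```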

3) Discrete Wavelet Transform: Wavelets are basis functions that describe time series in a joint time–frequency representation. In [3] and [4], DWT is used as an efficient representation method to index time-series data. A suitable method to calculate the DWT coefficients is the pyramid algorithm [5]. In this method, the length N of the time series has to be a power of two; time series that do not satisfy this condition are zero padded. DWT converts the time series into two types of coefficients, resulting from low-pass filters (also called the scaling function) and high-pass filters (also called the wavelet function), each of length N/2, given by

$$a_i = \frac{1}{2} \sum_{j=0}^{N-1} c_{2i-j+1}\, y_j, \quad i = 0, 1, \ldots, \frac{N}{2} - 1 \tag{4}$$

$$f_i = \frac{1}{2} \sum_{j=0}^{N-1} (-1)^j c_{j-2i}\, y_j, \quad i = 0, 1, \ldots, \frac{N}{2} - 1 \tag{5}$$

where $a = [a_0, a_1, \ldots, a_{N/2-1}]^T$ are the scaling coefficients and $f = [f_0, f_1, \ldots, f_{N/2-1}]^T$ are the wavelet coefficients at the first level. To calculate the wavelet coefficients at the next level, the aforementioned calculations are performed on the scaling coefficients a. The procedure is applied recursively until the required number of iterations has been reached. Each wavelet function has a small number of nonzero coefficients $c_j$; for example, for the Haar function, the nonzero coefficients are $c_0 = c_1 = 1$.
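Substituting the Haar coefficients c₀ = c₁ = 1 into (4) and (5) leaves only j = 2i and j = 2i + 1 in each sum, which gives pairwise averages and half-differences. The following pyramid sketch rests on that reduction; the names are illustrative, and the zero padding follows the text.

```python
from typing import List, Tuple


def haar_level(y: List[float]) -> Tuple[List[float], List[float]]:
    """One level of Eqs. (4)-(5) with Haar coefficients c0 = c1 = 1:
    a_i = (y_2i + y_2i+1) / 2 and f_i = (y_2i - y_2i+1) / 2."""
    half = len(y) // 2
    a = [(y[2 * i] + y[2 * i + 1]) / 2 for i in range(half)]
    f = [(y[2 * i] - y[2 * i + 1]) / 2 for i in range(half)]
    return a, f


def haar_dwt(y: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Pyramid algorithm: zero-pad to a power of two, then keep splitting the
    scaling coefficients. Returns the last scaling coefficients and the
    wavelet coefficients of every level (first level first)."""
    size = 1
    while size < len(y):
        size *= 2
    a = [float(v) for v in y] + [0.0] * (size - len(y))
    details = []
    for _ in range(levels):
        if len(a) < 2:
            break  # nothing left to split
        a, f = haar_level(a)
        details.append(f)
    return a, details


print(haar_dwt([4, 2, 6, 8], levels=2))  # ([5.0], [[1.0, -1.0], [-2.0]])
```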

One has to stress that the choice of time-series representation is problem-dependent. For example, one may be interested in analyzing time series based on their frequency characteristics (using DFT), their time characteristics (where PAA could be of interest), or their joint time–frequency characteristics (DWT). In this paper, we use these three representation methods in clustering time-series data.

B. Distance Functions

Distance functions (distances, for short) used for time series can be divided into three general categories: Lp-norm distances, elastic measures, and statistical measures. The Euclidean distance L2 has been widely used as a dissimilarity measure [20] and is suitable for comparing equal-length time series. The dynamic time warping distance [7] is an elastic measure that determines an optimal match between two time series by stretching or compressing their segments, and it concentrates on the similarity of time series with respect to their shapes. The longest common subsequence [25] is another example of an elastic distance measure; it uses the length of the longest subsequence occurring in both time series to quantify their similarity. In addition, the edit distance on real sequences [48], another elastic measure, expresses similarity through the number of insert, delete, and replace operations required to convert one sequence into the other. The Pearson coefficient is a statistics-based measure used to quantify the correlation between two time series. The Kullback–Leibler distance [24] is another statistical measure, useful in expressing the dissimilarity between two time series represented by their Markov chains. A comparison of a number of representation methods and similarity measures for various types of time series was reported in [20] in the context of indexing time series. The suitability of each similarity measure is application-dependent; nevertheless, the Euclidean distance is in common usage.
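The difference between a lock-step measure and an elastic one is easiest to see on two series with the same shape shifted by one step. The sketch below uses the classic dynamic-programming form of DTW; the squared local cost, the absence of a warping window, and the function names are our choices for illustration, and many DTW variants exist.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    """L2 distance; only meaningful for equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: List[float], b: List[float]) -> float:
    """Unconstrained dynamic time warping with squared local cost;
    returns the square root of the minimal cumulative cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of: insertion, deletion, or match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return math.sqrt(d[n][m])


a = [0.0, 1.0, 2.0, 3.0, 3.0]
b = [0.0, 0.0, 1.0, 2.0, 3.0]   # same shape, shifted by one step
print(euclidean(a, b), dtw(a, b))  # 1.7320508075688772 0.0
```

The Euclidean distance penalizes the shift at three positions, while DTW warps the time axis and finds a zero-cost alignment, which is why elastic measures focus on shape.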

III. CONCEPT OF CLUSTERING OF SPATIOTEMPORAL DATA

In clustering spatiotemporal data, we assume that there are n data points $x_1, x_2, \ldots, x_n$, each comprising its spatial and temporal components. The ith data point $x_i$ is represented as a concatenation of its spatial and temporal parts, namely, $x_i = [x_i(s) \,|\, x_i(t)]^T$, where $x_i(s)$ is the spatial part of $x_i$, while $x_i(t)$ denotes the temporal part (or its representation) of the same data point. Considering r features in the spatial part and q features in the temporal one, we have

$$x_i = [x_i(s) \,|\, x_i(t)]^T = [x_{i1}(s), \ldots, x_{ir}(s) \,|\, x_{i1}(t), \ldots, x_{iq}(t)]^T. \tag{6}$$


As noted earlier, our interest is in augmenting the FCM algorithm so that the spatiotemporal nature of the data can be fully utilized in the clustering process. The aim of the FCM is to construct a collection of c information granules (clusters), with the structure of the data described by a collection of prototypes $v_1, v_2, \ldots, v_c$ and a fuzzy partition matrix $U = [u_{ik}]$, $i = 1, 2, \ldots, c$, $k = 1, 2, \ldots, n$, where $u_{ik} \in [0, 1]$, $\sum_{i=1}^{c} u_{ik} = 1$ for all k, and $0 < \sum_{k=1}^{n} u_{ik} < n$ for all i. This structure arises through the minimization of the following objective function:

$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \, d^2(v_i, x_k) \tag{7}$$

where m (m > 1) is a fuzzification coefficient. The distance d used in the objective function is usually the Euclidean distance or one of its relatives, such as the weighted Euclidean or the Mahalanobis distance [9]. When it comes to spatiotemporal data, the key point is to capture a notion of distance that clearly distinguishes between the spatial and the temporal components of the problem at hand. We would also like to be able to strike a sound tradeoff between the distances determined for the spatial and the temporal parts of the feature vector. This is accomplished by forming an additive distance function composed of two components:

$$d_\lambda^2(v_i, x_k) = \|v_i(s) - x_k(s)\|^2 + \lambda \|v_i(t) - x_k(t)\|^2, \quad \lambda \ge 0. \tag{8}$$

This augmented distance allows us to control the effect of each part of the data on the overall distance and helps strike a sound balance between the impact of the spatial and temporal components of the data. When λ = 0, only the spatial component is considered and the temporal part is completely ignored. The higher the value of λ, the more substantial the impact of the temporal part of the spatiotemporal data on the discovery of the structure. The augmented distance is then used in the objective function
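A minimal sketch of (8) operating on the concatenated layout of (6), where the first r coordinates are spatial and the remaining q are temporal features. The function name is hypothetical; the small sweep shows how λ scales only the temporal contribution.

```python
from typing import Sequence


def augmented_sq_distance(v: Sequence[float], x: Sequence[float],
                          r: int, lam: float) -> float:
    """Eq. (8) on concatenated vectors laid out as in Eq. (6): the first r
    coordinates are spatial, the remaining q are temporal features."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    spatial = sum((a - b) ** 2 for a, b in zip(v[:r], x[:r]))
    temporal = sum((a - b) ** 2 for a, b in zip(v[r:], x[r:]))
    return spatial + lam * temporal


v = [0.0, 0.0, 1.0, 1.0]   # r = 2 spatial coordinates, q = 2 temporal features
x = [3.0, 4.0, 1.0, 3.0]
for lam in (0.0, 0.5, 2.0):
    print(lam, augmented_sq_distance(v, x, r=2, lam=lam))
# 0.0 25.0
# 0.5 27.0
# 2.0 33.0
```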

$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \, d_\lambda^2(v_i, x_k). \tag{9}$$

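Carrying out the optimization of J, we arrive at the following expressions for the prototypes and the partition matrix:

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m} \tag{10}$$

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{d_\lambda(v_i, x_k)}{d_\lambda(v_j, x_k)} \right)^{2/(m-1)}}. \tag{11}$$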

As usual, these two formulas are used iteratively, with the partition matrix and the prototypes updated in turn. While the weight factor λ offers much-needed flexibility to the method and could help in its optimization, it becomes crucial to find a constructive way of selecting its optimal value. In what follows, we introduce two evaluation criteria with which the value of λ is optimized.
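The alternating scheme can be sketched in pure Python as follows: (11) is evaluated with the augmented distance (8), then the prototypes are recomputed with (10). This is a hedged sketch, not the authors' code. The data are assumed to be already concatenated as in (6), with the first r coordinates spatial. The deterministic farthest-point initialization and the stopping rule (largest prototype shift below `tol`) are our assumptions, since the text does not specify them, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sq_dist(v: List[float], x: List[float], r: int, lam: float) -> float:
    """Squared augmented distance, Eq. (8); first r coordinates are spatial."""
    s = sum((a - b) ** 2 for a, b in zip(v[:r], x[:r]))
    t = sum((a - b) ** 2 for a, b in zip(v[r:], x[r:]))
    return s + lam * t


def update_u(protos: Matrix, data: Matrix, r: int, lam: float, m: float) -> Matrix:
    """Eq. (11); U is returned as a c x n list of rows."""
    c, n = len(protos), len(data)
    u = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(data):
        d2 = [sq_dist(v, x, r, lam) for v in protos]
        hits = [i for i in range(c) if d2[i] == 0.0]
        if hits:  # x sits on a prototype: share membership among the hits
            for i in hits:
                u[i][k] = 1.0 / len(hits)
            continue
        for i in range(c):
            # (d_i / d_j)^(2/(m-1)) written with squared distances
            u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1)) for j in range(c))
    return u


def update_v(u: Matrix, data: Matrix, m: float) -> Matrix:
    """Eq. (10): prototypes are u^m-weighted means of the data."""
    protos = []
    for row in u:
        w = [uik ** m for uik in row]
        total = sum(w)
        protos.append([sum(wk * x[p] for wk, x in zip(w, data)) / total
                       for p in range(len(data[0]))])
    return protos


def augmented_fcm(data: Matrix, c: int, r: int, lam: float, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-6) -> Tuple[Matrix, Matrix]:
    # Deterministic farthest-point initialization (our choice, not the paper's).
    protos = [list(data[0])]
    while len(protos) < c:
        far = max(data, key=lambda x: min(sq_dist(v, x, r, lam) for v in protos))
        protos.append(list(far))
    for _ in range(max_iter):
        u = update_u(protos, data, r, lam, m)
        new = update_v(u, data, m)
        shift = max(abs(a - b) for p, q in zip(protos, new) for a, b in zip(p, q))
        protos = new
        if shift < tol:
            break
    return protos, update_u(protos, data, r, lam, m)


data = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0],
        [5.0, 5.0, 1.0], [5.1, 5.0, 1.1], [5.0, 5.1, 1.0]]
protos, u = augmented_fcm(data, c=2, r=2, lam=1.0)
labels = [max(range(2), key=lambda i: u[i][k]) for k in range(len(data))]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because (10) is still a weighted mean under the augmented distance, only the membership step needs to know about λ; the prototype step is the standard FCM update.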

Fig. 1. Overall scheme of evaluation of the clustering process completed with the aid of (a) RC and (b) PC.

IV. EVALUATION CRITERIA

The two criteria of interest concern the way in which the results of clustering are evaluated. They are the reconstruction criterion (RC) [10] and the prediction criterion (PC) [11]. Fig. 1 highlights the essence of these two criteria.

Our starting point is the result of clustering expressed in terms of the prototypes and the partition matrix, obtained for a certain value of λ.

A. Reconstruction Criterion

The essence of this evaluation process is to “reconstruct” the original data using the cluster prototypes and the partition matrix by minimizing the following sum of distances [10]:

$$F = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|v_i - \hat{x}_k\|^2 \tag{12}$$

where $\hat{x}_k$ is the reconstructed version of $x_k$. By zeroing the gradient of F with respect to $\hat{x}_k$, we have

$$\hat{x}_k = \frac{\sum_{i=1}^{c} u_{ik}^m v_i}{\sum_{i=1}^{c} u_{ik}^m}. \tag{13}$$

Once the reconstructed data $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n$ have been computed with (13), the quality of reconstruction, regarded as a function of λ, is expressed in the form

$$E(\lambda) = \sum_{k=1}^{n} \|x_k - \hat{x}_k\|^2 = \sum_{k=1}^{n} \|x_k(s) - \hat{x}_k(s)\|^2 + \sum_{k=1}^{n} \|x_k(t) - \hat{x}_k(t)\|^2 \tag{14}$$

where

$$\|x_k(s) - \hat{x}_k(s)\|^2 = \frac{1}{r} \sum_{j=1}^{r} \frac{\left(x_{kj}(s) - \hat{x}_{kj}(s)\right)^2}{\sigma_j^2} \tag{15}$$


and

$$\|x_k(t) - \hat{x}_k(t)\|^2 = \frac{1}{q} \sum_{j=1}^{q} \frac{\left(x_{kj}(t) - \hat{x}_{kj}(t)\right)^2}{\sigma_j^2} \tag{16}$$

where $\sigma_j^2$ is the variance of the jth feature. Since the spatial and temporal parts are commonly expressed in spaces of very different dimensionality (typically r ≪ q), we use normalized Euclidean distances for both of them to avoid any bias toward either component of the distance. The reconstruction error E(λ) is a function of λ, and its minimum is determined by systematically sweeping through a certain range of values of λ. This sweep is used instead of a more sophisticated 1-D search because the form of this index as a function of λ is also of interest.
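The reconstruction step (13) and the normalized error (14)–(16) can be sketched as below, given a partition matrix U (c × n) and prototypes from any run of the clustering. Two assumptions are ours: σⱼ² is taken as the population variance of each original feature, and no feature is constant (otherwise the normalization divides by zero). The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def reconstruct(u: Matrix, protos: Matrix, m: float) -> Matrix:
    """Eq. (13): x_hat_k = sum_i u_ik^m v_i / sum_i u_ik^m (u is c x n)."""
    c, n, dim = len(u), len(u[0]), len(protos[0])
    x_hat = []
    for k in range(n):
        w = [u[i][k] ** m for i in range(c)]
        total = sum(w)
        x_hat.append([sum(w[i] * protos[i][p] for i in range(c)) / total
                      for p in range(dim)])
    return x_hat


def feature_variances(data: Matrix) -> List[float]:
    """Population variance of each feature (sigma_j^2); assumed nonzero."""
    n, dim = len(data), len(data[0])
    means = [sum(x[p] for x in data) / n for p in range(dim)]
    return [sum((x[p] - means[p]) ** 2 for x in data) / n for p in range(dim)]


def reconstruction_error(data: Matrix, x_hat: Matrix, r: int) -> float:
    """Eq. (14) with the normalized distances (15) and (16): the spatial block
    is averaged over its r features, the temporal block over its q features."""
    var = feature_variances(data)
    dim = len(data[0])
    q = dim - r
    err = 0.0
    for x, xh in zip(data, x_hat):
        err += sum((x[j] - xh[j]) ** 2 / var[j] for j in range(r)) / r
        err += sum((x[j] - xh[j]) ** 2 / var[j] for j in range(r, dim)) / q
    return err


protos = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
u = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
data = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [1.0, 1.0, 4.0]]
x_hat = reconstruct(u, protos, m=2.0)
print(x_hat)  # [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
print(round(reconstruction_error(data, x_hat, r=2), 6))  # 3.375
```

Only the third point's temporal feature is misreconstructed (4 versus 1), so the error is 9 divided by that feature's variance 8/3.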

B. Prediction Criterion

The essence of the PC is to “predict” the temporal component of the data by using the available spatial structure. Since each data point is composed of spatial and temporal parts, the cluster centers (prototypes) are also composed of a spatial part v(s) and a temporal part v(t). Using the spatial part of the data along with the spatial part of the computed cluster centers, we form a new partition matrix, denoted by $\hat{U}$, as follows [11]:

$$\hat{u}_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{\|v_i(s) - x_k(s)\|}{\|v_j(s) - x_k(s)\|} \right)^{2/(m-1)}}. \tag{17}$$

With the use of this new partition matrix and the temporal part of the cluster centers v(t), we minimize the following sum of distances:

$$F = \sum_{i=1}^{c} \sum_{k=1}^{n} \hat{u}_{ik}^m \|v_i(t) - \hat{x}_k(t)\|^2 \tag{18}$$

where $\hat{x}_k(t)$ is the predicted temporal part of the kth data point. By zeroing the gradient of F with respect to $\hat{x}_k(t)$, we have

$$\hat{x}_k(t) = \frac{\sum_{i=1}^{c} \hat{u}_{ik}^m v_i(t)}{\sum_{i=1}^{c} \hat{u}_{ik}^m}. \tag{19}$$

The quality of prediction is evaluated using the following prediction error:

$$E(\lambda) = \sum_{k=1}^{n} \|x_k(t) - \hat{x}_k(t)\|^2 = \frac{1}{q} \sum_{k=1}^{n} \sum_{j=1}^{q} \frac{\left(x_{kj}(t) - \hat{x}_{kj}(t)\right)^2}{\sigma_j^2}. \tag{20}$$

It takes the form of a sum of normalized Euclidean distances between the temporal part of the data and the predicted temporal part. As with the previous criterion, the intent is to minimize E(λ) by adjusting the value of λ. Algorithm 1 shows the pseudocode of the proposed algorithm.
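The prediction pipeline (17), (19), and (20) can be sketched the same way: memberships come from spatial distances only, and the predicted temporal part is a membership-weighted average of the prototypes' temporal parts. As before, the names are hypothetical, σⱼ² is assumed to be the population variance of each temporal feature (and nonzero), and a location that coincides with a prototype's spatial part is assigned to it crisply to avoid division by zero.

```python
from typing import List

Matrix = List[List[float]]


def spatial_memberships(protos: Matrix, data: Matrix, r: int, m: float) -> Matrix:
    """Eq. (17): memberships from the spatial parts only (c x n)."""
    c, n = len(protos), len(data)
    u_hat = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(data):
        d2 = [sum((a - b) ** 2 for a, b in zip(v[:r], x[:r])) for v in protos]
        hits = [i for i in range(c) if d2[i] == 0.0]
        for i in range(c):
            if hits:  # location coincides with a prototype's spatial part
                u_hat[i][k] = 1.0 / len(hits) if i in hits else 0.0
            else:
                u_hat[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1))
                                        for j in range(c))
    return u_hat


def predict_temporal(u_hat: Matrix, protos: Matrix, r: int, m: float) -> Matrix:
    """Eq. (19): u_hat^m-weighted average of the prototypes' temporal parts."""
    c, n = len(u_hat), len(u_hat[0])
    q = len(protos[0]) - r
    preds = []
    for k in range(n):
        w = [u_hat[i][k] ** m for i in range(c)]
        total = sum(w)
        preds.append([sum(w[i] * protos[i][r + p] for i in range(c)) / total
                      for p in range(q)])
    return preds


def prediction_error(data: Matrix, preds: Matrix, r: int) -> float:
    """Eq. (20): variance-normalized squared error, averaged over q features."""
    n, q = len(data), len(data[0]) - r
    err = 0.0
    for p in range(q):
        col = [x[r + p] for x in data]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # population variance, assumed > 0
        err += sum((x[r + p] - pr[p]) ** 2 for x, pr in zip(data, preds)) / var
    return err / q


protos = [[0.0, 0.0, 10.0], [4.0, 0.0, 20.0]]   # r = 2 spatial coords, q = 1
data = [[0.0, 0.0, 10.0], [2.0, 0.0, 15.0], [4.0, 0.0, 20.0]]
u_hat = spatial_memberships(protos, data, r=2, m=2.0)
preds = predict_temporal(u_hat, protos, r=2, m=2.0)
print(preds)                              # [[10.0], [15.0], [20.0]]
print(prediction_error(data, preds, r=2))  # 0.0
```

The middle location is equidistant from both prototypes, so its prediction is the midpoint of their temporal parts.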

V. EXPERIMENTAL STUDIES: USE OF SYNTHETIC DATA

In this section, we investigate the behavior of the clustering results, quantified in terms of the reconstruction and prediction criteria, for two synthetic datasets. Fig. 2(a) shows the spatial component of these datasets, where P1, P2, P3, and P4 are groups associated with four categories of time series, each 256 samples long. We considered two scenarios. In the first one [Fig. 2(b)], the time series are clearly distinguishable, while those shown in Fig. 2(c) exhibit a significant level of overlap (less distinguishable data). The generated time series are increasing and decreasing time series of the kind encountered in control chart patterns [50].

Fig. 2. Synthetic spatiotemporal data. (a) Spatial component, (b) temporal component of the more distinguishable dataset, and (c) temporal component of the less distinguishable dataset.

Fig. 3. (a) Selected time series and its representations with the use of (b) DFT(32), (c) PAA(32), and (d) DWT(32).

In Fig. 3, we present one of the time series along with its corresponding representations, namely DFT(32), PAA(32), and DWT(32). The notation DFT(32) denotes the DFT representation of length 32, and likewise for PAA(32) and DWT(32).
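The three representations can be sketched with NumPy as follows. How exactly the 32 values of each representation are formed is not specified here, so the choices below are assumptions: DFT(32) keeps the real and imaginary parts of the first 16 coefficients, PAA(32) averages 32 equal-width segments, and DWT(32) keeps the orthonormal Haar approximation coefficients at the resolution that has 32 values (these span the same subspace as the first 32 coefficients of the full Haar decomposition). The series length is assumed to be a power-of-two multiple of 32, as with the 256-sample series here.

```python
import numpy as np

def dft_repr(x, length):
    """DFT(length): real and imaginary parts of the first length // 2 coefficients."""
    coeffs = np.fft.rfft(np.asarray(x, dtype=float))[: length // 2]
    return np.concatenate([coeffs.real, coeffs.imag])

def paa_repr(x, length):
    """PAA(length): mean of each of `length` equal-width segments."""
    return np.asarray(x, dtype=float).reshape(length, -1).mean(axis=1)

def haar_dwt_repr(x, length):
    """DWT(length): orthonormal Haar approximation coefficients with `length` values."""
    a = np.asarray(x, dtype=float)
    while a.size > length:
        # one Haar analysis step, keeping only the approximation (low-pass) part
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a

# Hypothetical increasing-trend series of 256 samples
rng = np.random.default_rng(0)
x = 0.05 * np.arange(256) + rng.normal(0.0, 1.0, 256)
reprs = {name: f(x, 32) for name, f in
         [("DFT", dft_repr), ("PAA", paa_repr), ("DWT", haar_dwt_repr)]}
# each representation holds 32 values
```

The value scales differ across the three representations (for instance, the first DFT coefficient is the sum of all samples, while PAA values stay on the scale of the samples), which is consistent with the optimal λ depending on the representation method.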

ALGORITHM 1
PSEUDOCODE OF THE CLUSTERING METHOD USING RC AND PC

We systematically sweep through a range of values of λ to find the value at which the reconstruction or prediction error (depending on the evaluation criterion) attains its minimum. Table I presents the optimal values of λ along with the corresponding reconstruction error for several numbers of clusters, i.e., c = 2, 3, and 4, and for different representation methods with lengths 8, 16, and 32. Notice that the reported reconstruction error is the sum of the squared Euclidean distances between the original extracted features and the reconstructed features [see (14)]. In all experiments, the value of the fuzzification coefficient m was set to 2.
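The λ sweep can be sketched end to end under stated assumptions. This is not a transcription of Algorithm 1, whose listing is not reproduced here: the augmented distance is assumed to take the single-λ form $\|v_i(s) - x_k(s)\|^2 + \lambda\|v_i(t) - x_k(t)\|^2$ (the M = 1 case of (24)), features are assumed to be pre-normalized, prototypes are initialized from randomly chosen data points, the PC of (17)–(20) serves as the selection criterion, and all function names are hypothetical.

```python
import numpy as np

def augmented_fcm(X_s, X_t, c, lam, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """FCM with d^2 = ||v(s) - x(s)||^2 + lam * ||v(t) - x(t)||^2."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_s), size=c, replace=False)
    V_s, V_t = X_s[idx].copy(), X_t[idx].copy()   # initial prototypes
    for _ in range(n_iter):
        # augmented squared distance between prototype i and data point k
        d2 = (((V_s[:, None, :] - X_s[None, :, :]) ** 2).sum(axis=2)
              + lam * ((V_t[:, None, :] - X_t[None, :, :]) ** 2).sum(axis=2))
        d2 = np.maximum(d2, 1e-12)
        # FCM membership update written with squared distances
        U = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        # prototype update: membership-weighted means of each part
        W = U ** m
        V_s_new = W @ X_s / W.sum(axis=1, keepdims=True)
        V_t_new = W @ X_t / W.sum(axis=1, keepdims=True)
        shift = max(np.abs(V_s_new - V_s).max(), np.abs(V_t_new - V_t).max())
        V_s, V_t = V_s_new, V_t_new
        if shift < tol:
            break
    return V_s, V_t

def pc_error(X_s, X_t, V_s, V_t, m=2.0):
    """Prediction criterion: (17) memberships, (19) prediction, (20) error."""
    d = np.maximum(np.linalg.norm(V_s[:, None, :] - X_s[None, :, :], axis=2), 1e-12)
    U_tilde = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    W = U_tilde ** m
    X_hat = W.T @ V_t / W.sum(axis=0)[:, None]
    return float(np.sum((X_t - X_hat) ** 2 / X_t.var(axis=0)) / X_t.shape[1])

def sweep_lambda(X_s, X_t, c, lambdas, m=2.0):
    """Return the lambda with the smallest PC value, plus all PC values."""
    errors = []
    for lam in lambdas:
        V_s, V_t = augmented_fcm(X_s, X_t, c, lam, m)
        errors.append(pc_error(X_s, X_t, V_s, V_t, m))
    return lambdas[int(np.argmin(errors))], errors

# Hypothetical data: two spatial groups with different temporal patterns
rng = np.random.default_rng(1)
X_s = np.vstack([rng.normal([0.0, 0.0], 0.5, (30, 2)), rng.normal([5.0, 5.0], 0.5, (30, 2))])
X_t = np.vstack([rng.normal(1.0, 0.2, (30, 8)), rng.normal(-1.0, 0.2, (30, 8))])
lambdas = [0.0, 0.1, 1.0, 10.0]
lam_opt, errors = sweep_lambda(X_s, X_t, c=2, lambdas=lambdas)
```

A grid search is the simplest way to realize the sweep; a finer or logarithmic grid around the best value would sharpen the estimate of λopt.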

Table I shows the effect of different parameters on the optimal value of λ and on the resulting reconstruction error. Among the representation methods, DFT yields the lowest optimal λ, while DWT yields the highest. This is because the magnitudes of the features differ depending on the representation method used.

As shown in this table, the higher the dimensionality of the representation space used for the temporal part of the data, the lower the optimal value of λ, which prevents the clustering process from being biased toward the temporal part. As the number of clusters increases, the reconstruction error decreases. Because the more distinguishable dataset [see Fig. 2(b)] has a more visible structure, its reconstruction error is usually lower than that of the less distinguishable dataset. Table II shows the results obtained when using the PC.

We can see that most of the conclusions obtained with the RC hold here. There is one exception, however: sometimes the error does not decrease as the number of clusters increases. For example, the error for c = 3 is higher than that for c = 2. For the generated datasets with c = 3, the “position” of the spatial parts of the prototypes and the “structure” of their temporal parts are not well suited to prediction, because each predicted time series is an average of the temporal parts of the prototypes, weighted by the memberships Ũ computed from the positions of the spatial parts of the prototypes.

TABLE I
OPTIMAL VALUES OF λ AND THE ASSOCIATED RECONSTRUCTION ERROR FOR THE SYNTHETIC DATASETS

TABLE II
OPTIMAL VALUES OF λ AND THE ASSOCIATED PREDICTION ERROR FOR THE SYNTHETIC DATASETS


Fig. 4. Contour plots of membership functions for selected values of λ and c = 2, PAA(16) representation, and less distinguishable dataset. (a) λ = 0. (b) λ = 1 and RC. (c) λ = 1 and PC. (d) λ = 3 and RC. (e) λ = 3 and PC. (f) λ = 10 000.

A. Effect of λ on the Performance and Results of Clustering

In this experiment, we show how λ governs the influence of the temporal and spatial components of the data. We use the less distinguishable dataset [see Fig. 2(c)], set the number of clusters to 2, and use PAA(16) as the representation method of the time series; both the RC and the PC are considered. Fig. 4 shows the results in the form of contour plots of the obtained membership functions. The values λ = 0 and λ = 10 000 are treated as the extreme cases: when λ = 0, only the spatial part is involved in clustering, while λ = 10 000 puts the emphasis on the temporal part of the data. Changing λ visibly shifts the contour plots, reflecting the growing impact of the temporal or spatial component of the data. In what follows, we investigate the impact of λ on the reconstruction and prediction errors. In this series of experiments, we set the number of clusters to c = 3 and use DFT(16) as the representation method. Fig. 5 displays the plots for (a) the RC and (b) the PC; the optimal value of λ is clearly visible.
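The role of the two extreme cases can be seen on a single data point. In this hypothetical example, prototype A is spatially close but temporally different, while prototype B is spatially far but temporally similar; the distance is assumed to take the single-λ form of the augmented distance (the M = 1 case of (24)). As λ grows, the nearer prototype switches from A to B.

```python
import numpy as np

def augmented_d2(v_s, v_t, x_s, x_t, lam):
    """Single-lambda augmented distance: ||v(s)-x(s)||^2 + lam * ||v(t)-x(t)||^2."""
    return float(np.sum((v_s - x_s) ** 2) + lam * np.sum((v_t - x_t) ** 2))

x_s, x_t = np.array([0.0, 0.0]), np.array([1.0, 1.0])          # hypothetical data point
A = (np.array([1.0, 0.0]), np.array([-1.0, -1.0]))             # close in space, far in time
B = (np.array([5.0, 0.0]), np.array([1.0, 1.1]))               # far in space, close in time

choices = []
for lam in [0.0, 1.0, 10.0, 10_000.0]:
    d_A = augmented_d2(*A, x_s, x_t, lam)    # 1 + 8 * lam
    d_B = augmented_d2(*B, x_s, x_t, lam)    # 25 + 0.01 * lam
    choices.append("A" if d_A < d_B else "B")
# choices == ["A", "A", "B", "B"]; the switch happens near lam = 24 / 7.99, about 3
```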

VI. EXPERIMENTAL STUDIES: USE OF REAL-WORLD DATA

In this section, we apply the proposed method to the Alberta temperature dataset, which contains daily average temperatures.

Fig. 5. Plots of reconstruction and prediction errors versus λ for c = 3 and DFT(16) representation. (a) Reconstruction error and (b) prediction error.

A. Analysis of Alberta Temperature Data

Alberta Agriculture and Rural Development provides up-to-date agriculture-related data, including daily temperature, humidity, precipitation, etc. The data are recorded by a number of stations located within the province of Alberta, Canada. For each station, the geographical coordinates are provided in the form of its latitude and longitude. These data are available online at www.agric.gov.ab.ca, where the end user can select the required stations and pertinent agriculture-related variables to download the data. Fig. 6(a) shows a snapshot of the system with three highlighted stations located in southeast, southwest, and northwest Alberta.

Fig. 6. (a) Snapshot of the Alberta Agriculture and Rural Development system and three highlighted stations (www.agric.gov.ab.ca). (b) Daily average temperature in 2009 for the highlighted stations.

Fig. 6(b) shows the average daily temperature recorded at these stations in 2009. The collected data are highly relevant to various groups of users. One such group is epidemiologists, who seek to better understand the relationships between measures of environmental health and measures of animal health, for example, the relationships between province-wide precipitation, temperature, and humidity and the dynamics of the prevalence of endemic diseases and possible outbreaks. Animal health information includes temporal and spatial field-level veterinary observations (e.g., preliminary syndromes and clinical diagnoses) and laboratory results from submitted field samples. This understanding not only enhances knowledge of the dynamics of the interacting environmental and health domains through ad hoc analyses, but also supports the development of near-real-time surveillance systems. These systems provide the operational insight into these dynamic relationships that is needed for ongoing monitoring, response, and improved control of biosafety and disease risk. As can be seen from Fig. 6(b), stations located in different parts of the province exhibit different temperature patterns. Therefore, grouping (clustering) these stations based on their locations and their daily average temperature (or any other variable, e.g., precipitation) generates useful insights with potential applicability to various domains. We consider the temperature data recorded during 2009–2011 at 246 stations located across Alberta. Note that, as a first step in the experiments, we project the latitude and longitude coordinates to Cartesian coordinates, which are then used to calculate Euclidean distances.
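The projection used is not specified here. One simple choice that is adequate over a province-sized region is a local equirectangular projection about a reference point, sketched below; the choice of projection, the mean-coordinate reference point, and the Earth radius of 6371 km are all assumptions of this sketch (a standard map projection such as UTM would be more accurate over larger extents).

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)

def project_to_plane(lat_deg, lon_deg, lat0_deg=None, lon0_deg=None):
    """Local equirectangular projection of latitude/longitude (degrees) to x, y in km.

    The reference point defaults to the mean of the given coordinates.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat0 = lat.mean() if lat0_deg is None else np.radians(lat0_deg)
    lon0 = lon.mean() if lon0_deg is None else np.radians(lon0_deg)
    x = EARTH_RADIUS_KM * (lon - lon0) * np.cos(lat0)   # east-west, shrunk by cos(lat0)
    y = EARTH_RADIUS_KM * (lat - lat0)                  # north-south
    return np.column_stack([x, y])

# Hypothetical stations one degree of latitude apart
xy = project_to_plane([53.0, 54.0], [-113.0, -113.0])
dist_km = np.linalg.norm(xy[0] - xy[1])   # about 111.2 km
```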

1) Alberta Temperature Data in Different Seasons (Reconstruction Criterion): We split the daily average temperature data recorded in 2009 into four seasons (Spring, Summer, Fall, and Winter) and run the experiments using the RC, with the number of clusters varying from 2 to 5. Each time series is about 90 samples long (depending on the season), and a representation length of 8 is chosen for each representation method. Table III summarizes the results.
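Because the seasonal series are about 90 samples long, their lengths are generally not multiples of 8, so a PAA(8) implementation has to tolerate unequal segments. A minimal sketch, assuming near-equal segments via `numpy.array_split` and, purely for illustration, a split of the year into four consecutive blocks (the actual season boundaries used in the experiments are not given here):

```python
import numpy as np

def paa(x, segments=8):
    """PAA with near-equal segments; works when len(x) is not a multiple of `segments`."""
    return np.array([seg.mean() for seg in np.array_split(np.asarray(x, dtype=float), segments)])

# Hypothetical daily series for one year, split into four consecutive blocks
daily = np.arange(365.0)
seasons = np.array_split(daily, 4)              # block lengths 92, 91, 91, 91
features = [paa(season, 8) for season in seasons]
```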

As expected, forming more clusters reduces the reconstruction error. Furthermore, the table shows that in some cases λopt = 0. This means that, in these cases, involving temporal information does not help the method reconstruct the data more accurately. Fig. 7 shows the contour plots of the membership degrees of the clusters obtained for the different seasons of the year.

TABLE III
OPTIMAL VALUE OF λ AND THE ASSOCIATED RECONSTRUCTION ERROR FOR 246 STATIONS IN THE ALBERTA TEMPERATURE DATASET IN DIFFERENT SEASONS OF 2009

Fig. 7. Clusters visualized in the form of contour plots of the membership degrees for successive seasons of 2009, c = 2, and PAA(8) representation: (a) Spring, (b) Summer, (c) Fall, and (d) Winter.


Fig. 8. Clusters of spatiotemporal data for Summer 2009, c = 3, and (a) DFT(8), (b) PAA(8), and (c) DWT(8). The optimal values of λ are 0.35, 45, and 125, respectively.

Different seasons exhibit different structures. This is reasonable because in some seasons several locations on the map have similar temperatures, while in other seasons they may differ considerably. Moreover, the Spring clusters are similar to the Winter clusters, while the Summer clusters are similar to the Fall clusters. The reason is that in Spring and Winter the temperature is low in most parts of Alberta, so there is no significant difference in temperature among most stations; as a result, the spatial part of the data has more effect on the resulting clusters. In Summer and Fall, on the other hand, the temperature in the Rocky Mountains area (southwest Alberta) differs significantly from that recorded in other areas [as can be seen from Fig. 6(b)], so the temporal part of the data has more effect. Fig. 8 shows the clusters obtained for the Summer 2009 data with λ = λopt and c = 3; the stars denote the spatial prototypes. There are clear differences between the clusters obtained with different representations of the time series. This is not surprising, as different representation methods capture different facets of the time series. In addition, the distinguishability of the features can differ from one representation method to another, and as a result, the structures revealed in the temporal part of the data can be more or less significant depending on the representation method.

TABLE IV
PC FOR ALBERTA TEMPERATURE DATASET DURING 2009–2011. EACH CELL COMPRISES TWO ENTRIES: THE OPTIMAL VALUE OF λ AND THE ASSOCIATED PREDICTION ERROR

2) Alberta Daily Average Temperature During 2009–2011 (Prediction Criterion): We considered the daily average temperature for 246 stations in Alberta in the period 2009–2011 and built the clusters to investigate the PC. Table IV shows the optimal value of λ and the corresponding prediction error for these 246 stations and numbers of clusters c = 2, 4, 6, 8, and 10. Each time series in each dataset has length 365, and the representation length is set to 32.

The plots in Fig. 9 illustrate the clusters obtained for c = 4. The clusters vary depending on the value of λ. Using the optimal value yields clusters that strike a sound balance between spatial and temporal resemblance. The results identify the region of the Rocky Mountains, the prairie region (composed of the southern and northern sections of the province), and the northern part of the province (the upper portion of the map).

B. Comparative Study

Pham [44] proposed a spatial model of FCM, called the robust fuzzy C-means algorithm (RFCM), for image segmentation. This method places a spatial penalty on the membership degrees. Its objective function is

$$V = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\|x_k - v_i\|^2 + \frac{\beta}{2}\sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\sum_{c'\in M_i}\sum_{j\in N_k} u_{c'j}^{m} \tag{21}$$

where $N_k$ denotes the neighbors of station k and $M_i = \{1, 2, \ldots, c\} \setminus \{i\}$. Equation (21) is composed of two parts: the FCM objective function for the temporal part of the data and a spatial regularization term. β is a weight that controls the effect of each part in clustering (similar to λ in our method). This objective function can be minimized by calculating the partition matrix and the prototypes in an iterative process. Suppose that the kth object has a high membership degree in the ith cluster. Minimizing (21) then reduces the membership degrees of the objects in $N_k$ in the clusters in $M_i$, i.e., it encourages the neighbors of object k to belong to cluster i as well. Coppi et al. [59] extended this method to cluster spatial time series.

Fig. 9. Plot of spatiotemporal clusters for 2009 for (a) λ = 0, (b) λ = 10 000, and (c) λ = λopt using PC. The number of clusters is c = 4, and DFT(32) is used as the representation method.

To compare our method (using the RC and PC) with the RFCM, we propose the following evaluation criterion:

$$Q = \frac{J(x(s)\mid U)}{J(x(s))} + \frac{J(x(t)\mid U)}{J(x(t))} \tag{22}$$

where U is the optimal partition matrix of the spatiotemporal clustering (resulting from the optimal λ in our methods and the optimal β in RFCM). J(x(s)|U) is the FCM objective function for the spatial part of the data, obtained by taking U as its partition matrix and calculating new prototypes. J(x(s)) is the FCM objective function resulting from clustering the spatial part of the data separately. Analogously, x(t) denotes the temporal part of the data, and J(x(t)|U) and J(x(t)) are defined in the same way. In fact, J(x(s)) and J(x(t)) act as two normalization terms. The intuition behind the proposed criterion is that we regard a clustering as “appropriate” if it is suitable for both the spatial and the temporal part of the data: the lower the value of Q, the more appropriate the spatiotemporal clusters. Notice that when the spatial (or temporal) part of the data is clustered separately, the other part is not considered, so the resulting partition matrix is the optimal one for that part. Hence J(x(s)|U) ≥ J(x(s)) and J(x(t)|U) ≥ J(x(t)), and as a result, we always have Q ≥ 2 in (22). We calculated Q for RC, PC, and RFCM. In RFCM, a heuristic can be used to find the optimal value of β. In [44] and [59], different values of β in a range are checked to optimize an objective function: a cross-validation error is minimized in [44], and spatial autocorrelation is maximized in [59]. Since the evaluation criterion in this comparison is Q in (22), we check different values of β and select the one that minimizes Q. Table V shows the comparison for different representations and different numbers of clusters for the Alberta temperature data in 2009.

TABLE V
COMPARISON OF RC, PC, AND RFCM OVER THE EVALUATION CRITERION (22) FOR DIFFERENT REPRESENTATIONS AND NUMBERS OF CLUSTERS
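Criterion (22) can be computed from two ingredients: the FCM objective J(x|U) for a fixed partition matrix (with prototypes recomputed as membership-weighted means) and a plain FCM run on each part alone. The sketch below assumes m = 2, Euclidean distances, and a maximin initialization of the plain FCM; the function names are hypothetical. Because FCM only reaches a local optimum, the bound Q ≥ 2 holds only insofar as the separate runs find the optimum.

```python
import numpy as np

def fcm_objective(X, U, m=2.0):
    """J(X | U): prototypes recomputed from U, then sum_ik u_ik^m ||x_k - v_i||^2."""
    W = U ** m
    V = W @ X / W.sum(axis=1, keepdims=True)
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return float((W * d2).sum())

def fcm(X, c, m=2.0, n_iter=200, tol=1e-8, seed=0):
    """Plain FCM on X alone; returns the (c, n) partition matrix."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(c - 1):   # maximin initialization: next prototype = farthest point
        gap = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(gap.argmax()))
    V = X[idx].copy()
    for _ in range(n_iter):
        d2 = np.maximum(((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2), 1e-12)
        U = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        W = U ** m
        V_new = W @ X / W.sum(axis=1, keepdims=True)
        if np.abs(V_new - V).max() < tol:
            break
        V = V_new
    return U

def q_criterion(X_s, X_t, U, c, m=2.0):
    """Q of (22): each part's objective under U over its own separate-FCM objective."""
    return (fcm_objective(X_s, U, m) / fcm_objective(X_s, fcm(X_s, c, m), m)
            + fcm_objective(X_t, U, m) / fcm_objective(X_t, fcm(X_t, c, m), m))

# Hypothetical data whose spatial and temporal groupings agree
rng = np.random.default_rng(1)
X_s = np.vstack([rng.normal([0.0, 0.0], 0.1, (30, 2)), rng.normal([5.0, 5.0], 0.1, (30, 2))])
X_t = np.vstack([rng.normal(1.0, 0.2, (30, 8)), rng.normal(-1.0, 0.2, (30, 8))])
U = fcm(X_s, 2)                      # stand-in for a spatiotemporal partition matrix
Q = q_criterion(X_s, X_t, U, c=2)    # close to the lower bound of 2 here
```

When the spatial and temporal groupings agree, a single partition serves both parts and Q stays near 2; conflicting groupings push at least one ratio above 1.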

As can be seen from Table V, in most cases RC and PC yield a lower value of Q, because these methods give the same importance to each part of the data in clustering, while RFCM pays less attention to the spatial part. In fact, in RFCM the spatial part of the data is used to smooth the temporal clusters (like spatial smoothing of pixels in image processing). In addition, Q differs across representation methods, because each representation method captures specific kinds of features, and the temporal structures differ depending on these features.

C. Prediction Abilities

In this experiment, we take part of the 2009 Alberta temperature dataset as training samples $x_{\mathrm{train}}$ and the rest as testing samples $x_{\mathrm{test}}$, and predict the temporal part of the testing samples based on their spatial coordinates. The procedure is as follows.

1) Cluster the training samples using the augmented FCM and the PC to find the optimal clusters (using the optimal λ). The result is a set of spatiotemporal prototypes of the form $\{v_{\mathrm{train}}(s) \mid v_{\mathrm{train}}(t)\}$.

2) Using the spatial part of the testing samples, $x_{\mathrm{test}}(s)$, and the spatial part of the calculated prototypes, $v_{\mathrm{train}}(s)$, calculate the new partition matrix Ũ using (17).

3) Predict the temporal part of the testing samples from Ũ and the temporal part of the calculated prototypes, $v_{\mathrm{train}}(t)$, using (19).

In this experiment, we take $n_{\mathrm{test}} = 74$ stations (around 30%) of the 2009 Alberta temperature dataset as testing samples and the remaining stations as training samples.
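Steps 2) and 3), together with a random 30% test split, can be sketched as follows. Step 1) is assumed to have already produced the training prototypes (the values below are hypothetical), and the function names and the use of `numpy.random.Generator.permutation` for the split are assumptions of this sketch.

```python
import numpy as np

def split_indices(n, test_fraction=0.3, seed=0):
    """Random train/test split of n stations; returns (train_idx, test_idx)."""
    perm = np.random.default_rng(seed).permutation(n)
    n_test = round(test_fraction * n)
    return perm[n_test:], perm[:n_test]

def predict_from_location(X_s_new, V_s, V_t, m=2.0):
    """Step 2: memberships from spatial parts via (17). Step 3: prediction via (19)."""
    d = np.maximum(np.linalg.norm(V_s[:, None, :] - X_s_new[None, :, :], axis=2), 1e-12)
    U_tilde = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    W = U_tilde ** m
    return W.T @ V_t / W.sum(axis=0)[:, None]

train_idx, test_idx = split_indices(246)        # 74 test and 172 training stations

# Hypothetical prototypes from Step 1 (spatial part in km, temporal part as 4 features)
V_s = np.array([[0.0, 0.0], [300.0, 0.0]])
V_t = np.array([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0]])
X_s_test = np.array([[0.0, 0.0], [150.0, 0.0]])
X_hat_test = predict_from_location(X_s_test, V_s, V_t)
# first row matches V_t[0]; second row (equidistant point) is the average, all zeros
```

A test station that sits midway between two prototypes receives their plain average, which mirrors the behavior discussed below for a station lying between two clusters.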

Table VI shows the average prediction error for the testing set (called the testing error), the average prediction error for the training set (the training error), and the average error rate for different representations and different numbers of clusters over 100 independent runs. The error rate is defined as

$$E = \frac{\text{testing error}}{\text{training error}}. \tag{23}$$

TABLE VI
AVERAGE AND STANDARD DEVIATION OF TESTING ERROR, TRAINING ERROR, AND ERROR RATE REPORTED OVER 100 INDEPENDENT RUNS

Fig. 10. (a) Selected testing samples with three labeled stations a, b, and c for prediction. (b) Clusters of training samples with two labeled prototypes P1 and P2.

In Table VI, both the testing and the training errors decrease as the number of clusters increases. This is reasonable, since more clusters mean more prototypes and more information about the data, and as a result, the prediction can be more accurate. Moreover, because the clustering is performed on the training samples, the error rate (23) is always higher than 1. As the number of clusters increases, the training error decreases faster than the testing error, so the ratio of testing error to training error increases.
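Because (23) is reported as an average over 100 runs, one natural reading is that the ratio is computed per run and then averaged; this is an assumption, and it differs in general from the ratio of the average errors. A minimal sketch with hypothetical per-run errors:

```python
import numpy as np

def error_rate_summary(test_errors, train_errors):
    """Mean and standard deviation of the per-run error rate (23)."""
    rates = np.asarray(test_errors, dtype=float) / np.asarray(train_errors, dtype=float)
    return rates.mean(), rates.std()

test_errors, train_errors = [2.0, 3.0], [1.0, 2.0]     # hypothetical per-run errors
mean_rate, std_rate = error_rate_summary(test_errors, train_errors)   # 1.75, 0.25
ratio_of_means = np.mean(test_errors) / np.mean(train_errors)          # 2.5 / 1.5, about 1.667
```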

Fig. 10(a) shows an example of stations selected as testing samples (star symbols), with the others used as training samples. Three testing stations, a, b, and c, are labeled in this figure. Fig. 10(b) shows the optimal clustering (λopt = 0.65) of the training samples, in which two prototypes, P1 and P2, are labeled. The number of clusters was set to 5, and the DFT(32) representation of the time series was used.

Fig. 11. Original and predicted time series for (a) station a, (b) station b, and (c) station c.

Fig. 11 shows the time series reconstructed from the original features (32 DFT features) and from the predicted features. Using the PC, the temporal parts of stations a and b are predicted with high accuracy. However, the prediction for station c is not accurate, because this station lies between two clusters, P1 and P2 (see Fig. 10), which have very different temporal patterns. In fact, the spatial part of c is close to P1, but its temporal part is close to P2.

Fig. 12 shows the original and predicted time series of station c along with the time series corresponding to the prototypes P1 and P2. Both the predicted and the original time series of station c lie roughly between the time series corresponding to P1 and P2. P1 has more effect on the prediction because the spatial part of station c is closer to the spatial part of P1; as a result, P1 receives a higher weight (in the form of its membership degree in Ũ) in the prediction. Using more clusters can yield a more accurate prediction. For example, the prediction error for station c with 2, 5, 8, and 12 clusters is 1.283, 1.240, 0.684, and 0.511, respectively.

Fig. 12. Original and predicted time series for station c [in Fig. 10(a)] and the time series corresponding to the prototypes P1 and P2.

Fig. 13. Two generated unseen spatial points a and b and their neighbors on the map.

In the next step, we use the entire dataset as training samples and predict the temporal part at some unseen spatial coordinates on the map. The procedure is the same as in the previous experiment. Fig. 13 shows two generated spatial points, a and b, on the map. In addition, for each point, a number of stations are selected as its neighbors. Fig. 14(a) and (b) show the predicted time series for a and b along with the time series corresponding to their neighbors. As these figures show, the predicted time series for points a and b are similar to the time series of their neighbors.

Fig. 14. Predicted time series and the time series corresponding to the neighbors of (a) station a and (b) station b highlighted in Fig. 13.

The PC used in this paper differs from the time-series forecasting methods proposed in the literature in both methodology and purpose. Our PC predicts time series based on their spatial location and the time series formed in the cluster centers. Moreover, its objective is to find an optimal tradeoff regulating the interaction between spatial and temporal patterns in the clustering process, not to forecast the time series at future time steps. Time-series forecasting methods proposed in the literature (e.g., [23], [26], and [31]) usually assume that the time series follow a linear or nonlinear model and estimate the parameters of that model from historical data. The fitted model is then used to forecast future values of the time series.

VII. CONCLUSION

We have introduced the concept and algorithmic framework of fuzzy clustering for spatiotemporal data. Given the different nature of the spatial and temporal components of the data, we showed that they can be treated differently through a flexible distance function in which the parameter λ, which controls the influence of the temporal and spatial components, is optimized by minimizing the RC or PC.

In this research, we confined ourselves to univariate time series. An interesting extension would be to consider multivariate time series. Here, the data come in the form $x_k = [x_k(s) \mid x_{k1}(t), x_{k2}(t), \ldots, x_{kM}(t)]^T$, where $x_{kl}(t)$ is the lth variable (e.g., temperature) and M is the number of variables present in the temporal part of the data. As each time series might come with its own specificity, this could be reflected in the augmented additive distance function

$$d_\lambda^2(v_i, x_k) = \|v_i(s) - x_k(s)\|^2 + \lambda_1\|v_{i1}(t) - x_{k1}(t)\|^2 + \cdots + \lambda_M\|v_{iM}(t) - x_{kM}(t)\|^2 \tag{24}$$

where the M weight coefficients $\lambda_1, \lambda_2, \ldots, \lambda_M$ offer the required flexibility, and their values could again be optimized by taking advantage of the RC or PC. Another development worth pursuing would be to investigate other distance measures, e.g., the dynamic time warping distance, the longest common subsequence distance, etc. One has to be aware that using such distance functions may pose challenges for fuzzy clustering and may require further refinements of the generic FCM method to cope with distance measures other than the Euclidean one.
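Distance (24) is straightforward to evaluate once the temporal part is stored as one array per variable. A minimal sketch; the function name, the list-of-arrays layout, and the example values (M = 2 hypothetical variables) are assumptions. With M = 1 it reduces to the single-λ augmented distance.

```python
import numpy as np

def multivariate_augmented_d2(v_s, v_t_list, x_s, x_t_list, lambdas):
    """Squared distance (24): spatial term plus one lambda-weighted term per temporal variable."""
    if not (len(lambdas) == len(v_t_list) == len(x_t_list)):
        raise ValueError("need one lambda and one series per temporal variable")
    d2 = np.sum((np.asarray(v_s, dtype=float) - np.asarray(x_s, dtype=float)) ** 2)
    for lam, v_t, x_t in zip(lambdas, v_t_list, x_t_list):
        d2 += lam * np.sum((np.asarray(v_t, dtype=float) - np.asarray(x_t, dtype=float)) ** 2)
    return float(d2)

# Hypothetical prototype and data point with M = 2 temporal variables
d2 = multivariate_augmented_d2(
    v_s=[0.0, 0.0], v_t_list=[[1.0, 2.0], [0.0, 0.0]],
    x_s=[3.0, 4.0], x_t_list=[[1.0, 0.0], [1.0, 1.0]],
    lambdas=[0.5, 2.0],
)   # 25 + 0.5 * 4 + 2.0 * 2 = 31.0
```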

REFERENCES

[1] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast subsequence matching in time-series databases,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 1994, pp. 419–429.

[2] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality reduction for fast similarity search in large time series databases,” J. Knowl. Inf. Syst., vol. 3, no. 3, pp. 263–286, Aug. 2001.

[3] K.-P. Chan and A. W.-C. Fu, “Efficient time series matching by wavelets,” in Proc. Int. Conf. Data Eng., 1999, pp. 126–133.

[4] K.-P. Chan, A. W.-C. Fu, and C. Yu, “Haar wavelets for efficient similarity search of time-series: With and without time warping,” IEEE Trans. Knowl. Data Eng., vol. 15, no. 3, pp. 686–705, May/Jun. 2003.

[5] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 2, pp. 674–693, Jul. 1989.

[6] F. Korn, H. V. Jagadish, and C. Faloutsos, “Efficiently supporting ad-hoc queries in large datasets of time sequences,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, New York, 1997, pp. 289–300.

[7] D. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Proc. Workshop Knowledge Discovery Databases, 1994, pp. 359–370.

[8] J. Caiado, N. Crato, and D. Pena, “A periodogram-based metric for time series classification,” Comput. Statist. Data Anal., vol. 50, no. 10, pp. 2668–2684, Jun. 2006.

[9] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.

[10] W. Pedrycz and J. V. de Oliveira, “A development of fuzzy encoding and decoding through fuzzy clustering,” IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 829–837, Apr. 2008.

[11] W. Pedrycz and A. Bargiela, “Fuzzy clustering with semantically distinct families of variables: Descriptive and predictive aspects,” Pattern Recognit. Lett., vol. 31, no. 13, pp. 1952–1958, Oct. 2010.

[12] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Statist. Soc., Series B, vol. 39, no. 1, pp. 1–38, 1977.

[13] S. Gaffney and P. Smyth, “Trajectory clustering with mixtures of regression models,” in Proc. 5th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 1999, pp. 63–72.

[14] M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the clustering structure,” in Proc. ACM SIGMOD Int. Conf. Manag. Data, Philadelphia, PA, 1999, pp. 49–60.

[15] M. Nanni and D. Pedreschi, “Time-focused clustering of trajectories of moving objects,” J. Intell. Inf. Syst., vol. 27, no. 3, pp. 267–289, Nov. 2006.

[16] Y. Yang and K. Chen, “Time series clustering via RPCL network ensemble with different representations,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 41, no. 2, pp. 190–199, Mar. 2011.

[17] J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing SAX: A novel symbolic representation of time series,” Data Mining Knowl. Discovery, vol. 15, no. 2, pp. 107–144, Aug. 2007.

[18] K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani, “Locally adaptive dimensionality reduction for indexing large time series databases,” ACM Trans. Database Syst., vol. 27, no. 2, pp. 188–228, Jun. 2002.

[19] H. Cao, H. W. Deng, and Y. P. Wang, “Segmentation of M-FISH images for improved classification of chromosomes with an adaptive fuzzy C-means clustering algorithm,” IEEE Trans. Fuzzy Syst., vol. 20, no. 1, pp. 1–9, Feb. 2012.

[20] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, “Querying and mining of time series data: Experimental comparison of representations and distance measures,” in Proc. VLDB Endowment, Auckland, New Zealand, 2008, pp. 1542–1552.

[21] Y. Cai and R. Ng, “Indexing spatio-temporal trajectories with Chebyshev polynomials,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2004, pp. 599–610.

[22] E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time series,” in Proc. IEEE Int. Conf. Data Mining, 2001, pp. 289–296.

[23] G. E. P. Box and G. Jenkins, Time Series Analysis: Forecasting and Control. San Francisco, CA: Holden-Day, 1976.

[24] M. Ramoni, P. Sebastiani, and P. Cohen, “Bayesian clustering by dynamics,” Mach. Learn., vol. 47, no. 1, pp. 91–121, 2002.

[25] M. Vlachos, D. Gunopulos, and G. Kollios, “Discovering similar multidimensional trajectories,” in Proc. Int. Conf. Data Eng., 2002, pp. 673–684.

[26] M. H. Magalhaes, R. Ballini, and F. A. C. Gomide, “Granular models for time-series forecasting,” in Handbook of Granular Computing, W. Pedrycz, A. Skowron, and V. Kreinovich, Eds. New York: Wiley-Interscience, 2008.

[27] T. W. Liao, “Clustering of time series data—a survey,” Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, Nov. 2005.

[28] E. A. Maharaj, P. D’Urso, and D. U. A. Galagedera, “Wavelet-based fuzzy clustering of time series,” J. Classif., vol. 27, no. 2, pp. 231–275, 2010.

[29] P. D’Urso and E. A. Maharaj, “Autocorrelation-based fuzzy clustering of time series,” Fuzzy Sets Syst., vol. 160, no. 24, pp. 3565–3589, Dec. 2009.

[30] E. A. Maharaj and P. D’Urso, “Fuzzy clustering of time series in the frequency domain,” Inf. Sci., vol. 181, no. 7, pp. 1187–1211, Apr. 2011.

[31] H. G. Seedig, R. Grothmann, and T. A. Runkler, “Forecasting of clustered time series with recurrent neural networks and a fuzzy clustering scheme,” in Proc. Int. Joint Conf. Neural Netw., Atlanta, GA, 2009, pp. 2846–2853.

[32] C. S. Moller-Levet, F. Klawonn, K.-H. Cho, and O. Wolkenhauer, “Fuzzy clustering of short time series and unevenly distributed sampling points,” in Proc. 5th Int. Symp. Intell. Data Anal., 2003, pp. 28–30.

[33] X. Zhang, J. Liu, Y. Du, and T. Lv, “A novel clustering method on time series data,” Expert Syst. Appl., vol. 38, no. 9, pp. 11891–11900, Sep. 2011.

[34] F. Petitjean, A. Ketterlin, and P. Gancarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognit., vol. 44, no. 3, pp. 678–693, Mar. 2011.

[35] K. Kalpakis, D. Gada, and V. Puttagunta, “Distance measures for effective clustering of ARIMA time-series,” in Proc. IEEE Int. Conf. Data Mining, 2001, pp. 273–280.

[36] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.

[37] Y. Xiong and D. Yeung, “Time series clustering with ARMA mixtures,” Pattern Recognit., vol. 37, no. 8, pp. 1675–1689, Aug. 2004.

[38] G. Schwarz, “Estimating the dimension of a model,” Ann. Statist., vol. 6, no. 2, pp. 461–464, 1978.

[39] P. D’Urso, “Fuzzy clustering for data time arrays with inlier and outlier time trajectories,” IEEE Trans. Fuzzy Syst., vol. 13, no. 5, pp. 583–604, Oct. 2005.

[40] M. Sato and Y. Sato, “On a multicriteria fuzzy clustering method for 3-way data,” Int. J. Uncertainty Fuzziness Knowl.-Based Syst., vol. 2, no. 2, pp. 127–142, Jun. 1994.

[41] A. Lemos, W. Caminhas, and F. Gomide, “Multivariable Gaussian evolving fuzzy modeling system,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 91–104, Feb. 2011.

[42] Z. Chen, S. Aghakhani, J. Man, and S. Dick, “ANCFIS: A neuro fuzzy architecture employing complex fuzzy sets,” IEEE Trans. Fuzzy Syst., vol. 19, no. 2, pp. 305–322, Apr. 2011.

[43] S. Chen and C. Chen, “TAIEX forecasting based on fuzzy time series and fuzzy variation groups,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 1–12, Feb. 2011.

[44] D. L. Pham, “Spatial models for fuzzy clustering,” Comput. Vis. Image Understand., vol. 84, no. 2, pp. 285–297, 2001.

[45] V. Petridis and A. Kehagias, “Predictive modular fuzzy systems for time-series classification,” IEEE Trans. Fuzzy Syst., vol. 5, no. 3, pp. 381–397, Aug. 1997.

[46] S. M. Arafat and M. Skubic, “Modeling fuzziness measures for best wavelet selection,” IEEE Trans. Fuzzy Syst., vol. 16, no. 5, pp. 1259–1270, Oct. 2008.

[47] P. Kalnis, N. Mamoulis, and S. Bakiras, “On discovering moving clusters in spatio-temporal data,” in Proc. Int. Symp. Spatial Temporal Databases, 2005, pp. 364–381.

[48] L. Chen, M. T. Ozsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2005, pp. 491–502.

[49] L. F. S. Coletta, L. Vendramin, E. R. Hruschka, R. J. G. B. Campello, and W. Pedrycz, “Collaborative fuzzy clustering algorithms: Some refinements and design guidelines,” IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 444–462, Jun. 2012.


[50] D. T. Pham and A. B. Chan, “Control chart pattern recognition using anew type of self organizing neural network,” Proc. Inst. Mech. Eng., PartI: J. Syst. Control Eng., vol. 212, no. 2, pp. 115–127, 1998.

[51] S. Kisilevich, F. Mansmann, M. Nanni, and S. Rinzivillo, “Spatio-temporal clustering,” in Data mining and Knowledge Discovery Hand-book. New York: Springer, 2010, pp. 855–874.

[52] M. Kulldorff, “Prospective time periodic geographical disease surveillanceusing a scan statistic,” J. Roy. Statist. Soc. A, vol. 164, no. 1, pp. 61–72,2001.

[53] H. Izakian and W. Pedrycz, “A new PSO-optimized geometry of spatial and spatio-temporal scan statistics for disease outbreak detection,” Swarm Evol. Comput., vol. 4, pp. 1–11, Jun. 2012.

[54] F. Di Martino and S. Sessa, “The extended fuzzy C-means algorithm for hotspots in spatio-temporal GIS,” Expert Syst. Appl., vol. 38, no. 9, pp. 11829–11836, Sep. 2011.

[55] M. Wang, A. Wang, and A. Li, “Mining spatial-temporal clusters from geo-databases,” in Proc. 2nd Int. Conf. Adv. Data Mining Appl., 2006, pp. 63–270.

[56] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” Data Mining Knowl. Discovery, pp. 226–231, 1996.

[57] Z. Liu and R. George, “Fuzzy cluster analysis of spatio-temporal data,” in Proc. 18th Int. Symp. Comput. Inf. Sci., Antalya, Turkey, 2003, pp. 984–991.

[58] M. Deng, Q. Liu, J. Wang, and Y. Shi, “A general method of spatio-temporal clustering analysis,” Sci. Chin. Inf. Sci., pp. 1–14, 2011.

[59] R. Coppi, P. D’Urso, and P. Giordani, “A fuzzy clustering model for multivariate spatial time series,” J. Classif., vol. 27, no. 1, pp. 54–88, Mar. 2010.

[60] Y. C. Cheng and S. T. Li, “Fuzzy time series forecasting with a probabilistic smoothing hidden Markov model,” IEEE Trans. Fuzzy Syst., vol. 20, no. 2, pp. 291–304, Apr. 2012.

[61] J. Wu, H. Xiong, C. Liu, and J. Chen, “A generalization of distance functions for fuzzy C-means clustering with centroids of arithmetic means,” IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 557–571, Jun. 2012.

[62] J. P. Mei and L. Chen, “A fuzzy approach for multitype relational data clustering,” IEEE Trans. Fuzzy Syst., vol. 20, no. 2, pp. 358–371, Apr. 2012.

Hesam Izakian (S’12) received the M.S. degree in computer engineering (artificial intelligence) from the University of Isfahan, Isfahan, Iran. He is currently working toward the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada.

He is working under the supervision of Prof. W. Pedrycz. His research interests include computational intelligence, knowledge discovery and data mining, pattern recognition, and software engineering.

Witold Pedrycz (M’88–SM’94–F’99) received the M.Sc., Ph.D., and D.Sci. degrees from the Silesian University of Technology, Gliwice, Poland.

He is currently a Professor and Canada Research Chair (CRC, computational intelligence) with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. In 2009, he was elected as a foreign member of the Polish Academy of Sciences, Warsaw, Poland. He is the author of 14 research monographs covering various aspects of computational intelligence and software engineering. He is also with the Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia. His main research interests include computational intelligence, fuzzy modeling and granular computing, knowledge discovery and data mining, fuzzy control, pattern recognition, knowledge-based neural networks, relational computing, and software engineering. He has published numerous papers in these areas.

Prof. Pedrycz was elected as a Fellow of the Royal Society of Canada in 2012. He has been a member of numerous program committees of IEEE conferences in the area of fuzzy sets and neurocomputing. He is intensively involved in editorial activities. He is Editor-in-Chief of Information Sciences and Editor-in-Chief of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS. He currently serves as an Associate Editor of the IEEE TRANSACTIONS ON FUZZY SYSTEMS and is a member of a number of editorial boards of other international journals. In 2007, he received the prestigious Norbert Wiener Award from the IEEE Systems, Man, and Cybernetics Council. He received the IEEE Canada Computer Engineering Medal in 2008. In 2009, he received a Cajastur Prize for soft computing from the European Centre for Soft Computing for “pioneering and multifaceted contributions to granular computing.”

Iqbal Jamal received the M.S. degree in management science and the M.A.Sc.Eng. degree from the University of British Columbia, Vancouver, BC, Canada.

He is currently a Principal of AQL Management Consulting (AQLMC) Inc., Edmonton, AB, Canada, a data mining/analytics-based company. AQLMC specializes in developing and implementing data analytics in support of anomaly detection for animal, human, and environmental health. AQLMC also conducts operations analysis for public sector services.