Automatic partially supervised classification of multitemporal remotely sensed images

Gabriele Moser, Sebastiano B. Serpico, Michaela De Martino, and Daniele Coppolino

Dept. of Biophysical and Electronic Engineering (DIBE), University of Genoa, Via Opera Pia 11a, I-16145 Genova, Italy

ABSTRACT

The use of remotely sensed imagery for environmental monitoring naturally leads to working with multitemporal images of the geographical area of interest. In order to generate thematic maps for all acquisition dates, an unsupervised classification algorithm is not effective, due to the lack of knowledge about the thematic classes. On the other hand, a detailed analysis of all the land-cover transitions is naturally accomplished in a completely supervised context, but the ground-data requirement of this approach is not realistic in the case of a short revisit time. An interesting trade-off is represented by the partially supervised approach, which exploits ground truth only for a subset of the acquisition dates. In this context, the authors previously proposed a multitemporal classification scheme that deals with a pair of images of the same area, assuming ground truth to be available only at the first date. In the present paper, several modifications are proposed to this system in order to automate it and to improve its detection performance. Specifically, a preprocessing algorithm is developed that addresses the problem of mismatches in the dynamics of images acquired at different times over the same area, by both automatically correcting strong dynamics differences and detecting cloud areas. In addition, the clustering procedures integrated in the system are fully automated by optimizing the selection of the numbers of clusters according to Bayesian estimates of the probability of correct classification. Experimental results on multitemporal Landsat-5 TM and ERS-1 SAR data are presented.

Keywords: Multitemporal classification, partially supervised classification, clustering, change detection.

1. INTRODUCTION

The use of remotely sensed imagery for environmental monitoring naturally leads to working with multitemporal images of the geographical area of interest (i.e., with sequences of remote sensing images acquired over the given area at different dates), for example, to study the evolution of land covers in applications like natural resource exploitation or the management of natural disasters. The resulting multitemporal data require specific analysis techniques to extract the required information. The development of such techniques also plays a strategic role in view of future missions designed to provide data with a short revisit time (e.g., 24 or 12 hours). However, this short revisit time and the need for accurate land-cover mapping at each acquisition date jointly render the usual supervised and unsupervised multitemporal analysis techniques inadequate. Specifically, several unsupervised methodologies that identify the changes occurring between consecutive acquisition dates without prior information for any date have been proposed in the literature [1–4], but these approaches do not provide information about the different types of land-cover transitions, due to the lack of knowledge about the land-cover classes. Such an analysis is naturally accomplished in a completely supervised context [5–7], i.e., when ground truth data are available for all observation dates, but this ground truth availability constraint involves high costs and is not even realistic in the case of a short revisit time.

An interesting trade-off is represented by a partially supervised approach, which exploits ground truth information only for a subset of the acquisition dates [8]. In particular, in [9] a contextual classification technique was developed that requires training data only for the land-cover transitions of interest to the end user. In [10], a partially supervised methodology was proposed to update the parameters of an already trained parametric maximum-likelihood (ML) classifier whenever a new image lacking the corresponding ground truth has to be analyzed, under the assumption that no new class is included in the new image. Radial Basis Function (RBF) and ML classifiers are combined in [11] in a multitemporal partially supervised multiple-classifier architecture. A Markov Random Field (MRF) multitemporal classification and change-detection algorithm dealing with remotely sensed sequences with ground truth information available at only one acquisition date is developed in [12]. Finally, the authors proposed in [8] a multitemporal classification scheme that deals with a pair of images of the same area, assuming ground truth to be available only at the first date. The method integrates this prior information with the “K-means” clustering algorithm [13], applied to each single-date image, and with the Kittler and Illingworth’s (K&I) thresholding technique [14], applied to change detection [2,3], in order to generate a supervised classification map for the first date, together with a partially supervised classification map for the second one.

Further author information: E-mail: [email protected], Telephone: +39 010 353 2752.

Image and Signal Processing for Remote Sensing X, edited by Lorenzo Bruzzone, Proc. of SPIE Vol. 5573 (SPIE, Bellingham, WA, 2004)

0277-786X/04/$15 · doi: 10.1117/12.567968

In the present paper, several modifications are proposed to this system in order to fully automate it and to improve its detection performance. Specifically, all change-detection approaches based on image differencing [1], such as the above-mentioned method based on K&I, critically depend on a correct correspondence between the dynamics of the two images. When dealing with optical data, this condition is often violated, for instance, due to different atmospheric conditions at the two dates. In the present paper, a preprocessing algorithm is developed that addresses this operational issue by both automatically correcting strong dynamics differences and detecting cloud areas. Furthermore, an open issue in the use of “K-means” is the selection of the number of clusters. We propose here to generate a set of distinct “K-means” clustering solutions with different numbers of clusters and to select an optimal solution by optimizing a functional related to the classification accuracy. The resulting system is experimentally validated on real multitemporal Landsat-5 TM and ERS-1 SAR data, both by assessing the visual quality of the resulting partially supervised maps through photo-interpretation and by quantitatively computing the corresponding classification accuracies.

2. METHODOLOGY

2.1. System architecture

The adopted multitemporal partially supervised classification scheme deals with two equal-sized images $I_0$ and $I_1$ acquired over the same geographical region at dates $t_0$ and $t_1$, respectively ($t_1 > t_0$). Specifically, $I_0$ and $I_1$ are assumed to be $N_1 \times N_2$ images, and all the maps introduced in the present section and in the following ones are assumed to be defined over a two-dimensional $N_1 \times N_2$ lattice $X = \{(m, n) : m = 0, 1, \ldots, N_1 - 1;\ n = 0, 1, \ldots, N_2 - 1\}$ containing $N = N_1 N_2$ pixels. Ground truth information (expressed by a training map $G$ with $M$ classes $\omega_1, \omega_2, \ldots, \omega_M$) is assumed to be available for the first date $t_0$, whereas no prior information is available at the second date $t_1$. The same classes can also be present in $I_1$, together with possible further classes (“new” classes) that are not present in $I_0$. We explicitly denote by $\Omega = \{\omega_1, \omega_2, \ldots, \omega_M\}$ the set of “old” classes and by $\Theta = \{\theta_1, \theta_2, \ldots, \theta_{M'}\}$ the set of “new” classes. In order to generate a classification map both for $I_0$ (assigning each pixel a class label in $\Omega$) and for $I_1$ (assigning each pixel a class label in $\Omega \cup \Theta$), the adopted multitemporal partially supervised classifier performs the following five stages:

• Clustering phase: an unsupervised classification (i.e., clustering) [13] is applied to each image $I_r$ in order to generate a clustering map $M_r$ ($r = 0, 1$); the number of clusters in $M_r$ is denoted by $K_r$ and the $k$-th cluster in $M_r$ by $\sigma_{rk}$ ($k = 1, 2, \ldots, K_r$; $r = 0, 1$);

• Hybrid classification phase: the clustering map $M_0$ and the training map $G$ are jointly processed to generate a hybrid supervised/unsupervised classification map $H$ for $I_0$ by applying the ML decision rule [15] to each cluster $\sigma_{0k}$ ($k = 1, 2, \ldots, K_0$; see Sec. 2.3);

• Change detection phase: an unsupervised change-detection method is applied inside each cluster $\sigma_{1k}$ in $M_1$ ($k = 1, 2, \ldots, K_1$) in order to identify the changes that occurred in the corresponding ground area during the time interval $[t_0, t_1]$, thus generating a binary change map $C$;

• New classes detection phase: the ML rule is applied to each cluster $\sigma_{1k}$ in $M_1$ ($k = 1, 2, \ldots, K_1$) in order to decide whether it is a sub-class of one of the classes in $\Omega$ or represents a “new” class, which is present in $I_1$ but was not present in $I_0$ (see Sec. 2.4);

• Label propagation phase: the ML rule is applied to each cluster $\sigma_{1k}$ labelled as “old” during the previous phase, in order to propagate a specific class label from date $t_0$ to date $t_1$ ($k = 1, 2, \ldots, K_1$), thus generating a partially supervised classification map $P$ for $I_1$ (see Sec. 2.5).

[Figure 1: block diagram linking the 1st-date image I0, the 2nd-date image I1 and the ground truth map G, through dynamics tuning and cloud removal (corrected images at dates t0 and t1), “K-means” clustering (clustering maps M0, M1), image differencing and unsupervised K&I change detection (change map C), hybrid supervised/unsupervised classification (map H), and label propagation and detection of new classes, to the partially supervised classification map P.]

Figure 1. Architecture of the considered partially supervised multitemporal classification method.

In order to identify the possible classes present in $I_1$, applying a clustering algorithm is mandatory, since no prior information is available at time $t_1$. On the other hand, despite the availability of ground truth information at $t_0$, we choose to perform an unsupervised classification also for the corresponding image $I_0$. In fact, although a supervised classification map is usually more accurate than an unsupervised one, the information classes provided by the ground truth are typically multimodal [16]. By contrast, the clusters provided by an unsupervised classifier are, in general, monomodal; thus, they may be considered as sub-classes of the information classes provided by the ground truth map [16]. As far as the change-detection phase is concerned, according to the results of the comparative analyses reported in [2] and [3], we apply the Kittler and Illingworth’s unsupervised thresholding method [14] (hereafter simply denoted by K&I) to the modulus of the difference image between $I_0$ and $I_1$, in order to split the “change” from the “no-change” areas inside $\sigma_{1k}$ ($k = 1, 2, \ldots, K_1$) and to generate the corresponding change map $C$. In fact, the change-detection algorithm proposed in [2] and based on the K&I thresholding technique has proved to separate the “change” and “no-change” classes effectively, while involving a small computational burden. However, all change-detection approaches based on image differencing, such as the above-mentioned method based on K&I, critically depend on a correct correspondence between the dynamics of the two images. When dealing with optical data, this condition is often violated, for instance, due to different atmospheric conditions at the two dates. The preprocessing algorithm developed in the present paper addresses this operational issue by both automatically correcting strong dynamics differences and detecting cloud areas; it is integrated into the system as a preliminary “Dynamics tuning and cloud removal” phase (see Sec. 2.2).
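The K&I thresholding step used above can be sketched as follows. This is a minimal pure-Python illustration of the standard Kittler and Illingworth minimum-error criterion, which models each side of a candidate threshold $T$ as a Gaussian and minimizes $J(T) = 1 + 2[P_1 \ln\sigma_1 + P_2 \ln\sigma_2] - 2[P_1 \ln P_1 + P_2 \ln P_2]$. It assumes that the modulus of the difference image has been quantized to integer grey levels and flattened into a list; the function name and the toy data are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def kittler_illingworth_threshold(values):
    """Return the grey level T minimizing the Kittler-Illingworth criterion.

    Pixels with value <= T are treated as "no-change", the others as "change".
    Candidate thresholds that leave a side with zero variance are skipped;
    None is returned if no candidate is valid.
    """
    counts = Counter(values)
    levels = sorted(counts)
    n = len(values)
    best_t, best_j = None, math.inf
    for idx in range(len(levels) - 1):
        j = 1.0
        for side in (levels[: idx + 1], levels[idx + 1 :]):
            w = sum(counts[g] for g in side)
            p = w / n                                          # prior P_c(T)
            mu = sum(g * counts[g] for g in side) / w          # mean of side c
            var = sum(counts[g] * (g - mu) ** 2 for g in side) / w
            if var <= 0:
                break
            # 2*P*ln(sigma) - 2*P*ln(P), using 2*ln(sigma) = ln(var)
            j += p * math.log(var) - 2 * p * math.log(p)
        else:  # both sides valid
            if j < best_j:
                best_t, best_j = levels[idx], j
    return best_t


# Toy modulus-of-difference values: a "no-change" mode near 3, a "change" mode near 20.
diff = [1, 2, 2, 3, 3, 3, 4, 4, 5, 18, 19, 19, 20, 20, 20, 21, 21, 22]
T = kittler_illingworth_threshold(diff)
change_map = [v > T for v in diff]
```

On this toy input the selected threshold is 5, which separates the two modes; in the system described above the same test would be run on the pixels of one cluster $\sigma_{1k}$ at a time.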

As a clustering method, we choose to employ “K-means” [13], thanks to its computational simplicity and to its reduced number of input parameters as compared with other popular methods (e.g., ISODATA [13]). Specifically, the only crucial parameters to be set are the numbers $K_0$ and $K_1$ of clusters to be searched for in $I_0$ and $I_1$, respectively. In [8], we made this choice through a manual trial-and-error procedure. In the present paper, we propose a fully automated version of this multitemporal scheme, by expressing the selection of a suitable number $K_r$ of clusters in $I_r$ as the maximization of a functional $J_r$, which is an estimate of the probability of correct classification at date $t_r$ ($r = 0, 1$). Specifically, at date $t_0$ we adopt the supervised estimate developed in [17]. At date $t_1$, we develop here a novel partially supervised estimate of the probability of correct classification for $I_1$ and use it as a criterion to set $K_1$. The computation of this functional is integrated into the “new classes detection” and “label propagation” phases (see Sec. 2.5). A complete block diagram of the resulting automated classification architecture is shown in Fig. 1. The following subsections focus on the specific modules integrated into the system in order to automate it.

2.2. Dynamic tuning and cloud removal

Assuming $I_0$ and $I_1$ to be possibly multisource images, acquired by the same optical and/or Synthetic Aperture Radar (SAR) sensors, the proposed dynamic tuning algorithm aims at reducing possible dynamic mismatches between corresponding optical bands acquired at $t_0$ and $t_1$, while also reducing the impact of cloud areas on the change-detection and classification results. Assuming $I_0$ and $I_1$ to include $m$ optical bands and denoting by $O_r^j$ the $j$-th optical band in $I_r$ ($j = 1, 2, \ldots, m$; $r = 0, 1$), the key idea of the method is that radiance differences in “no-change” areas should be interpreted as due to dynamic mismatches. Therefore, in order to correct such mismatches, the method iteratively integrates the detection of changes with a linear dynamic correction inside the “no-change” areas. Specifically, the method is initialized by generating a preliminary change map through the application of K&I to the modulus of the difference image between $I_0$ and $I_1$ (operating on the whole image area, without restricting to any cluster). Then the following operations are iterated until convergence:

• linear rescaling of each optical band $O_1^j$, with slope and offset computed by matching the local first- and second-order statistics over the “no-change” areas ($j = 1, 2, \ldots, m$). Specifically, we adopt a mosaic subdivision of this area into square windows of fixed size: denoting by $L$, $m_r^{j\ell}$, and $\sigma_r^{j\ell}$ the total number of windows, the sample mean of the grey levels inside the $\ell$-th window in $O_r^j$, and the sample standard deviation of the same grey levels ($\ell = 1, 2, \ldots, L$; $r = 0, 1$), respectively, the slope $a_j^\ell = \sigma_0^{j\ell}/\sigma_1^{j\ell}$ and the offset $b_j^\ell = m_0^{j\ell} - a_j^\ell m_1^{j\ell}$ matching the first- and second-order moments inside each window are computed [1]. In addition, in order to avoid undesirable discontinuity artifacts at the boundaries between neighboring windows, we do not rescale the grey levels in the $\ell$-th window of $O_1^j$ directly with the coefficients $(a_j^\ell, b_j^\ell)$; instead, we define a single coefficient pair $(a_j, b_j)$ by averaging all the window-specific coefficients. Then we rescale the whole band $O_1^j$ with parameters $(a_j, b_j)$ ($j = 1, 2, \ldots, m$);

• application of K&I to the modulus of the difference image between $I_0$ and the current rescaled version of $I_1$ inside the current “no-change” area.
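The linear rescaling step of the iteration above can be illustrated with a minimal sketch for a single optical band. It assumes non-overlapping $w \times w$ windows, skips windows with fewer than two “no-change” pixels or a constant $O_1^j$, and uses hypothetical helper names (`window_coefficients`, `rescale`); the K&I re-detection step between iterations is omitted, and this is not the authors' code. The toy data mimic the synthetic mapping $\zeta = 0.25z + 20$ used in Sec. 3.2, so the averaged coefficients should come out close to $(4, -80)$.

```python
import statistics


def window_coefficients(band0, band1, no_change, w):
    """Average the window-wise moment-matching coefficients (a, b) mapping band1 onto band0.

    band0, band1: 2-D lists of grey levels at t0 and t1; no_change: 2-D list of bools.
    Raises statistics.StatisticsError if no window is usable.
    """
    rows, cols = len(band0), len(band0[0])
    slopes, offsets = [], []
    for r0 in range(0, rows, w):
        for c0 in range(0, cols, w):
            x0, x1 = [], []
            for r in range(r0, min(r0 + w, rows)):
                for c in range(c0, min(c0 + w, cols)):
                    if no_change[r][c]:  # only "no-change" pixels enter the statistics
                        x0.append(band0[r][c])
                        x1.append(band1[r][c])
            if len(x0) < 2 or statistics.stdev(x1) == 0:
                continue
            a = statistics.stdev(x0) / statistics.stdev(x1)       # slope: sigma_0 / sigma_1
            b = statistics.mean(x0) - a * statistics.mean(x1)     # offset: m_0 - a * m_1
            slopes.append(a)
            offsets.append(b)
    # A single pair for the whole band avoids discontinuities between windows.
    return statistics.mean(slopes), statistics.mean(offsets)


def rescale(band, a, b):
    """Apply the global linear mapping z -> a*z + b to every pixel."""
    return [[a * z + b for z in row] for row in band]


# band1 = 0.25*band0 + 20, plus one "changed" pixel excluded by the no-change mask.
band0 = [[10 * (4 * r + c + 1) for c in range(4)] for r in range(4)]
band1 = [[0.25 * z + 20 for z in row] for row in band0]
band1[0][0] = 255.0
no_change = [[True] * 4 for _ in range(4)]
no_change[0][0] = False

a, b = window_coefficients(band0, band1, no_change, w=2)
restored = rescale(band1, a, b)
```

Because every window follows the same linear mapping here, each window yields the same pair and the average inverts the synthetic rescale exactly (up to rounding); on real data the per-window pairs differ and the averaging acts as a smoothing choice.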

We note that K&I always deals with a two-hypothesis problem, i.e., it splits “change” from “no-change” without distinguishing different types of change. Here, radiance changes are assumed to be due either to true land-cover transitions or to differences in cloud coverage. The iterative application of K&I adopted by the proposed dynamic-tuning method does not aim at detecting land-cover changes, but rather the changes due to cloud presence, which are expected to be strongly evident in the difference image. In order to deal with different land-cover transitions, the binary K&I approach is applied in a cluster-oriented way, as described in Sec. 2.1.

2.3. Hybrid classification for the first date image

We denote by $T_{0i}$ and $S_{rk}$ the set of training pixels for $\omega_i$ in $G$ and the set of pixels assigned in $I_r$ to $\sigma_{rk}$ ($k = 1, 2, \ldots, K_r$; $r = 0, 1$), respectively. In order to generate a classification map $H$ for $I_0$, we apply the ML classification rule at the cluster level, by assigning each cluster $\sigma_{0k}$ to the class $\omega_i$ that exhibits the highest value of the cluster probability mass function (pmf) $P(\sigma_{0k} \mid \omega_i)$ ($k = 1, 2, \ldots, K_0$). Specifically, the following relative-frequency estimate is introduced for this pmf∗:

$$P(\sigma_{0k} \mid \omega_i) = \frac{|T_{0i} \cap S_{0k}|}{\sum_{h=1}^{K_0} |T_{0i} \cap S_{0h}|} = \frac{|T_{0i} \cap S_{0k}|}{|T_{0i}|}, \tag{1}$$

since $S_{01} \cup S_{02} \cup \ldots \cup S_{0K_0} = X$, i.e., the clustering map $M_0$ is assumed to be exhaustive. In particular, we note that $\sigma_{0k}$ is not necessarily assigned to the class $\omega_i$ exhibiting the highest spatial overlap $|T_{0i} \cap S_{0k}|$, due to the normalization factor $|T_{0i}|$ in the denominator of Eq. (1). This helps reduce the impact on the classification result of large differences in the number of training samples available for each class.

∗Given a finite set A, we denote by |A| the cardinality (i.e., the number of elements) of A.
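The cluster-level ML rule of Eq. (1) reduces to counting overlaps between clusters and training sets. The following pure-Python sketch assumes that the clustering map $M_0$ and the training map $G$ are given as per-pixel lists (cluster index, and class index or `None` for pixels outside the training set); the function name and the toy data are hypothetical. The data illustrate the remark on the normalization factor: cluster 0 overlaps class 0 more than class 1 (6 versus 4 pixels), yet it is assigned to class 1, because class 1 has only 5 training pixels in total.

```python
from collections import Counter


def label_clusters_ml(clusters, training):
    """Cluster-level ML labelling with the relative-frequency estimate of Eq. (1).

    clusters[p]: cluster index of pixel p in M0
    training[p]: class index of pixel p in G, or None if p is not a training pixel
    Returns {cluster index: class index, or None if the cluster is unlabelled}.
    """
    overlap = Counter((k, i) for k, i in zip(clusters, training) if i is not None)  # |T0i ∩ S0k|
    class_size = Counter(i for i in training if i is not None)                     # |T0i|
    labels = {}
    for k in sorted(set(clusters)):
        # P(sigma_0k | omega_i) = |T0i ∩ S0k| / |T0i| for the classes overlapping the cluster
        pmf = {i: overlap[(k, i)] / n for i, n in class_size.items() if overlap[(k, i)] > 0}
        labels[k] = max(pmf, key=pmf.get) if pmf else None  # no training pixels: unlabelled
    return labels


# |T00| = 30 and |T01| = 5; cluster 2 contains no training pixels.
clusters = [0] * 10 + [1] * 25 + [2] * 3
training = [0] * 6 + [1] * 4 + [0] * 24 + [1] + [None] * 3
labels = label_clusters_ml(clusters, training)
```

Here cluster 0 gets $4/5 = 0.8$ for class 1 against $6/30 = 0.2$ for class 0, cluster 1 is assigned to class 0, and cluster 2 remains unlabelled, which is exactly the situation discussed next.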


However, a cluster $\sigma_{0k}$ including no training samples (i.e., $T_{0i} \cap S_{0k} = \emptyset$ for all $i = 1, 2, \ldots, M$) remains unlabelled. A large number of unlabelled samples in the hybrid map $H$ is undesirable; hence, we aim at selecting the number $K_0$ of clusters so as to minimize both the number of misclassified pixels and the number of unlabelled samples. The hybrid approach developed in [17] generated a collection of clustering maps by applying the hierarchical clustering-by-melting algorithm proposed in [18] and selected the optimal map according to an estimate of the probability of correct classification. In the present paper, we integrate this approach into the proposed multitemporal classification system by generating through “K-means” a sequence of clustering maps with increasing numbers of clusters and by choosing the best map according to the above-mentioned functional. In particular, denoting by $A_{0i}$ the index set of the clusters assigned to $\omega_i$ ($i = 1, 2, \ldots, M$) by the above-mentioned ML rule, the probability $P_C$ of correct classification can be expressed as:

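$$P_C = \sum_{k=1}^{K_0} P\{\text{correct} \mid \sigma_{0k}\}\, P(\sigma_{0k}) = \sum_{i=1}^{M} \sum_{k \in A_{0i}} P(\omega_i \mid \sigma_{0k})\, P(\sigma_{0k}), \tag{2}$$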

since $P\{\text{correct} \mid \sigma_{0k}\} = 0$ (i.e., $P\{\text{error} \mid \sigma_{0k}\} = 1$) for each unlabelled cluster $\sigma_{0k}$. Estimating the class posterior probabilities $P(\omega_i \mid \sigma_{0k})$ as relative frequencies over the training set and the cluster prior probabilities $P(\sigma_{0k})$ as relative frequencies over the whole image, and plugging the corresponding estimates into Eq. (2), we obtain the following estimate of the probability of correct classification at date $t_0$:

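$$J_0 = \sum_{i=1}^{M} \sum_{k \in A_{0i}} \frac{|T_{0i} \cap S_{0k}|}{\sum_{j=1}^{M} |T_{0j} \cap S_{0k}|} \cdot \frac{|S_{0k}|}{N} = \frac{1}{N} \sum_{i=1}^{M} \sum_{k \in A_{0i}} \frac{|S_{0k}|}{|T_0 \cap S_{0k}|}\, |T_{0i} \cap S_{0k}|, \tag{3}$$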

where $T_0 = T_{01} \cup T_{02} \cup \ldots \cup T_{0M}$ is the total set of training samples. The proposed multitemporal classification system applies “K-means” with several different input numbers of clusters, computes this functional for each clustering map, and selects the map with the highest value of $J_0$, i.e., the one with the lowest estimated error probability.
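A minimal sketch of the estimate $J_0$ in Eq. (3), and of the resulting selection of $K_0$, is given below. It assumes that each candidate clustering map (for instance, one per “K-means” run) is supplied as a per-pixel list of cluster indices, with the training map as a per-pixel list of class indices or `None`; the names `j0` and `candidates` and the toy maps are hypothetical, and the clustering itself is not reproduced.

```python
from collections import Counter


def j0(clusters, training):
    """Estimate J0 of Eq. (3) for one clustering map M0.

    clusters[p]: cluster index of pixel p; training[p]: class index in G, or None.
    """
    n_pix = len(clusters)                                                           # N
    size = Counter(clusters)                                                        # |S0k|
    overlap = Counter((k, i) for k, i in zip(clusters, training) if i is not None)  # |T0i ∩ S0k|
    train_in = Counter(k for k, i in zip(clusters, training) if i is not None)      # |T0 ∩ S0k|
    class_size = Counter(i for i in training if i is not None)                      # |T0i|
    total = 0.0
    for k in size:
        if train_in[k] == 0:
            continue  # unlabelled cluster: P(correct | sigma_0k) = 0
        # ML label of Eq. (1): argmax_i |T0i ∩ S0k| / |T0i|
        best = max(class_size, key=lambda i: overlap[(k, i)] / class_size[i])
        total += size[k] * overlap[(k, best)] / train_in[k]
    return total / n_pix


training = [0] * 6 + [1] * 4 + [0] * 24 + [1] + [None] * 3
candidates = {  # hypothetical clustering maps, keyed by their number of clusters K0
    3: [0] * 10 + [1] * 25 + [2] * 3,           # cluster 0 mixes classes 0 and 1
    4: [0] * 6 + [3] * 4 + [1] * 25 + [2] * 3,  # the mixed cluster is split
}
scores = {k0: j0(m0, training) for k0, m0 in candidates.items()}
best_k0 = max(scores, key=scores.get)
```

On these toy maps the 3-cluster solution scores $28/38$ and the 4-cluster one $34/38$, so splitting the mixed cluster is preferred; in the system described above the candidate range would be the sequence of “K-means” solutions.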

2.4. New classes detection

The approach adopted in Sec. 2.3, based on relative-frequency probability estimates at the cluster level, is also generalized to the problems of new-class detection and cluster labelling at date $t_1$. We denote by $D_{0i}$, $D_{ch}$, and $D_{nc}$ the sets of pixels assigned to $\omega_i$ in $H$, labelled as “change” in $C$, and labelled as “no-change” in $C$, respectively, and we assume, as a working hypothesis, that only a negligible number of pixels turns out to be unlabelled in the hybrid map $H$. This implicitly assumes that the hybrid classification step effectively generated a hybrid map including only a few unlabelled pixels. In addition, the clustering map $M_1$ at date $t_1$ and the change map $C$ are also assumed to be exhaustive (i.e., $S_{11} \cup S_{12} \cup \ldots \cup S_{1K_1} = D_{ch} \cup D_{nc} = X$). Under these assumptions, we prove in the Appendix that relative-frequency estimates of the joint probabilities $P(\Omega, \sigma_{1k})$ and $P(\Theta, \sigma_{1k})$ of the cluster $\sigma_{1k}$ together with the sets of “old” and “new” classes, respectively ($k = 1, 2, \ldots, K_1$), can be expressed as follows:

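$$P(\Theta, \sigma_{1k}) = \lambda\, \frac{|S_{1k} \cap D_{ch}|}{N}, \qquad P(\Omega, \sigma_{1k}) = \frac{|S_{1k} \cap D_{nc}|}{N} + (1 - \lambda)\, \frac{|S_{1k} \cap D_{ch}|}{N}, \tag{4}$$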

where $\lambda$ is an estimate of the probability $P(\Theta \mid ch)$ of “new” classes given “change” (its computation is described at the end of this subsection). By normalizing with respect to the index $k$, we obtain estimates of the pmfs $P(\sigma_{1k} \mid \Omega)$ and $P(\sigma_{1k} \mid \Theta)$ of the cluster $\sigma_{1k}$ conditioned on the sets of “old” and “new” classes, respectively ($k = 1, 2, \ldots, K_1$), i.e.:

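$$P(\sigma_{1k} \mid \Theta) = \frac{P(\Theta, \sigma_{1k})}{\sum_{h=1}^{K_1} P(\Theta, \sigma_{1h})} = \frac{|S_{1k} \cap D_{ch}|}{|D_{ch}|}, \tag{5}$$

$$P(\sigma_{1k} \mid \Omega) = \frac{P(\Omega, \sigma_{1k})}{\sum_{h=1}^{K_1} P(\Omega, \sigma_{1h})} = \frac{|S_{1k} \cap D_{nc}| + (1 - \lambda)|S_{1k} \cap D_{ch}|}{|D_{nc}| + (1 - \lambda)|D_{ch}|}. \tag{6}$$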

In particular, the pmf estimate of each cluster $\sigma_{1k}$ given the “new” classes is the fraction of all “change” pixels that fall inside $\sigma_{1k}$ ($k = 1, 2, \ldots, K_1$), which is a reasonable result, since a cluster containing a large number of “change” pixels is likely to represent a “new” class. On the other hand, the pmf of $\sigma_{1k}$ conditioned on the “old” classes is a linear combination of two contributions, related to the fractions of “no-change” and “change” pixels inside the cluster, respectively. After routine algebraic manipulations, the ML rule, applied with the probability estimates (5) and (6), labels $\sigma_{1k}$ ($k = 1, 2, \ldots, K_1$) as “new” if and only if:

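$$\frac{|S_{1k} \cap D_{ch}|}{|D_{ch}|} > \frac{|S_{1k} \cap D_{nc}|}{|D_{nc}|}, \quad \text{i.e.,} \quad \frac{|S_{1k} \cap D_{ch}|}{|S_{1k}|} > \frac{|D_{ch}|}{|D_{ch}| + |D_{nc}|} = \frac{|D_{ch}|}{N}. \tag{7}$$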
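The “routine algebraic manipulations” behind Eq. (7) can be spelled out as follows. For brevity, write $c_k = |S_{1k} \cap D_{ch}|$, $u_k = |S_{1k} \cap D_{nc}|$, $n_{ch} = |D_{ch}|$, and $n_{nc} = |D_{nc}|$ (shorthand introduced only for this derivation), and assume $\lambda \le 1$ and $n_{ch}, n_{nc} > 0$, so that all denominators are positive and cross-multiplication preserves the inequality. Comparing Eqs. (5) and (6):

$$
\frac{c_k}{n_{ch}} > \frac{u_k + (1-\lambda)c_k}{n_{nc} + (1-\lambda)n_{ch}}
\iff c_k n_{nc} + (1-\lambda)c_k n_{ch} > u_k n_{ch} + (1-\lambda)c_k n_{ch}
\iff \frac{c_k}{n_{ch}} > \frac{u_k}{n_{nc}},
$$

and, since the maps are exhaustive ($|S_{1k}| = c_k + u_k$, $N = n_{ch} + n_{nc}$),

$$
c_k n_{nc} > u_k n_{ch} \iff c_k (n_{ch} + n_{nc}) > n_{ch}(c_k + u_k) \iff \frac{c_k}{|S_{1k}|} > \frac{n_{ch}}{N}.
$$

The $(1-\lambda)$ terms cancel on both sides of the first step, which is why the test does not depend on $\lambda$.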

The ratio $|S_{1k} \cap D_{ch}|/|S_{1k}|$ in Eq. (7) is the fraction of “change” pixels inside the cluster $\sigma_{1k}$ ($k = 1, 2, \ldots, K_1$): hence, a cluster is considered a new class when the percentage of changes inside its area is above the threshold $\eta = |D_{ch}|/N$. It is interesting to note that the ML test (7) does not require calculating the parameter $\lambda$, which is involved in the formulation of $P(\sigma_{1k} \mid \Omega)$ ($k = 1, 2, \ldots, K_1$). Conversely, the result of the test (7) can be used to compute $\lambda$: as $\lambda$ is an estimate of $P(\Theta \mid ch)$, it can be computed as the relative frequency $\lambda = |D_{new} \cap D_{ch}|/|D_{ch}|$, where $D_{new} = \bigcup_{k \in \mathcal{N}} S_{1k}$ is the set of pixels labelled as “new” and $\mathcal{N}$ is the index set of the clusters labelled as “new”.
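The test of Eq. (7) and the subsequent relative-frequency estimate of $\lambda$ can be sketched in a few lines. The sketch assumes that the clustering map $M_1$ and the change map $C$ are given as per-pixel lists (cluster index, and `True` for “change”); the function name and the toy data are hypothetical.

```python
from collections import Counter


def detect_new_classes(clusters, change):
    """ML test of Eq. (7) and the relative-frequency estimate of lambda.

    clusters[p]: cluster index of pixel p in M1
    change[p]:   True if pixel p is labelled "change" in C
    Returns (set of "new" cluster indices, threshold eta, lambda).
    """
    n_pix = len(clusters)                                      # N
    size = Counter(clusters)                                   # |S1k|
    ch_in = Counter(k for k, c in zip(clusters, change) if c)  # |S1k ∩ Dch|
    n_ch = sum(ch_in.values())                                 # |Dch|
    eta = n_ch / n_pix
    # Eq. (7): "new" iff the fraction of "change" pixels in the cluster exceeds eta
    new = {k for k in size if ch_in[k] / size[k] > eta}
    # lambda = |Dnew ∩ Dch| / |Dch|, an estimate of P(Theta | ch)
    lam = sum(ch_in[k] for k in new) / n_ch if n_ch else 0.0
    return new, eta, lam


clusters = [0] * 10 + [1] * 10 + [2] * 20
change = [True] + [False] * 9 + [True] * 8 + [False] * 2 + [True] * 3 + [False] * 17
new, eta, lam = detect_new_classes(clusters, change)
```

With 12 “change” pixels out of 40, $\eta = 0.3$; the change fractions of the three clusters are 0.1, 0.8, and 0.15, so only cluster 1 is flagged as “new”, and $\lambda = 8/12$.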

2.5. Partially supervised classification for the second date image

After identifying the possible presence of “new” classes, the clusters labelled as “old” are assigned a specific class label in $\Omega$. To this end, the ML rule is applied again to each cluster $\sigma_{1k}$ ($k \notin \mathcal{N}$). Specifically, the following estimate of the joint probability $P(\omega_i, \sigma_{1k})$ ($i = 1, 2, \ldots, M$; $k = 1, 2, \ldots, K_1$) is proved in the Appendix:

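$$P(\omega_i, \sigma_{1k}) = \frac{|D_{0i} \cap S_{1k} \cap D_{nc}|}{N} + \frac{1 - \lambda}{M - 1} \left( \frac{|S_{1k} \cap D_{ch}|}{N} - \frac{|D_{0i} \cap S_{1k} \cap D_{ch}|}{N} \right), \tag{8}$$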

and the following estimate is derived for $P(\sigma_{1k} \mid \omega_i)$:

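$$P(\sigma_{1k} \mid \omega_i) = \frac{P(\omega_i, \sigma_{1k})}{\sum_{h=1}^{K_1} P(\omega_i, \sigma_{1h})} = \frac{(M - 1)|D_{0i} \cap S_{1k} \cap D_{nc}| + (1 - \lambda)\left(|S_{1k} \cap D_{ch}| - |D_{0i} \cap S_{1k} \cap D_{ch}|\right)}{(M - 1)|D_{0i} \cap D_{nc}| + (1 - \lambda)\left(|D_{ch}| - |D_{0i} \cap D_{ch}|\right)}. \tag{9}$$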

In particular, the pmf of $\sigma_{1k}$ given $\omega_i$ turns out to be a linearly increasing function of the fraction of “no-change” pixels in $\sigma_{1k}$ that were assigned to $\omega_i$ in $H$, and a linearly decreasing function of the fraction of “change” pixels in $\sigma_{1k}$ that were assigned to $\omega_i$ in $H$. This is a desirable behavior, since a large number of “no-change” pixels assigned to $\omega_i$ at date $t_0$ suggests that $\sigma_{1k}$ can be a spectral sub-class of $\omega_i$ at time $t_1$, whereas a large number of “change” samples assigned to $\omega_i$ at $t_0$ indicates that $\sigma_{1k}$ should not be considered a sub-class of $\omega_i$. Plugging these estimates into the ML rule, $\sigma_{1k}$ ($k \notin \mathcal{N}$) is assigned to the class $\omega_i$ that yields the highest value of $P(\sigma_{1k} \mid \omega_i)$.

| Class | Apr94 training set | Apr94 test set | May94 test set |
|---|---|---|---|
| bare soil | 10437 | 6689 | 128 |
| wet soil | 7580 | 3891 | 9103 |
| wood | 15653 | 17132 | 17132 |
| cereals | 751 | 742 | 742 |
| corn | − | − | 2225 |

Table 1. Training and test samples for the dataset employed for the experiments.

With calculations similar to the ones made to define the supervised estimate $J_0$, a partially supervised estimate $J_1$ of the probability of correct classification at date $t_1$ is formulated as follows:

$$J_1 = \sum_{k \in \mathcal{N}} P(\Theta, \sigma_{1k}) + \sum_{i=1}^{M} \sum_{k \in A_{1i}} P(\omega_i, \sigma_{1k}), \tag{10}$$

where $A_{1i}$ is the index set of the second-date clusters assigned to $\omega_i$ ($i = 1, 2, \ldots, M$). This estimate is employed as an optimality criterion to select the number $K_1$ of clusters. Specifically, a sequence of clustering maps with increasing numbers of clusters is generated by “K-means”, and the functional $J_1$ is computed for each clustering map. Then, the clustering map (i.e., the number of clusters) that yields the maximum value of $J_1$ over the sequence is selected as optimal.
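The label propagation of Eq. (9) and the estimate $J_1$ of Eq. (10) can be sketched in the same counting style. The sketch assumes per-pixel lists for $M_1$, $C$, and the hybrid map $H$ (class indices $0, \ldots, M-1$, with no unlabelled pixels, as in the working hypothesis of Sec. 2.4), together with the set of “new” clusters and the value of $\lambda$ obtained from the test of Eq. (7). All names are hypothetical, $M \ge 2$ is required, and ties are broken in favor of the lowest class index; this is an illustration, not the authors' implementation.

```python
from collections import Counter


def propagate_labels(clusters, change, hybrid, new, lam, n_classes):
    """Label the "old" clusters of M1 with Eq. (9) and compute J1 with Eq. (10).

    clusters[p]: cluster index of pixel p in M1
    change[p]:   True if p is labelled "change" in C
    hybrid[p]:   class index (0..M-1) of p in the hybrid map H
    new:         indices of the clusters labelled "new" by Eq. (7)
    lam:         estimate of P(Theta | ch)
    """
    n_pix, m = len(clusters), n_classes
    cluster_ids = sorted(set(clusters))
    ch_in = Counter(k for k, c in zip(clusters, change) if c)                     # |S1k ∩ Dch|
    nc_i = Counter((k, i) for k, c, i in zip(clusters, change, hybrid) if not c)  # |D0i ∩ S1k ∩ Dnc|
    ch_i = Counter((k, i) for k, c, i in zip(clusters, change, hybrid) if c)      # |D0i ∩ S1k ∩ Dch|

    def joint(i, k):
        """Eq. (8): estimate of P(omega_i, sigma_1k)."""
        return (nc_i[(k, i)] + (1 - lam) / (m - 1) * (ch_in[k] - ch_i[(k, i)])) / n_pix

    # Eq. (9): P(sigma_1k | omega_i) = joint(i, k) / sum_h joint(i, h)
    norm = [sum(joint(i, h) for h in cluster_ids) for i in range(m)]

    def cond(i, k):
        return joint(i, k) / norm[i] if norm[i] > 0 else 0.0

    labels = {k: max(range(m), key=lambda i: cond(i, k))
              for k in cluster_ids if k not in new}
    # Eq. (10): "new" clusters add P(Theta, sigma_1k) = lam * |S1k ∩ Dch| / N (Eq. (4));
    # "old" clusters add P(omega_i, sigma_1k) for the class they were assigned to.
    j1 = sum(lam * ch_in[k] / n_pix for k in new)
    j1 += sum(joint(i, k) for k, i in labels.items())
    return labels, j1


clusters = [0] * 4 + [1] * 4 + [2] * 4
change = [False] * 4 + [False, False, True, True] + [True] * 4
hybrid = [0, 0, 0, 1] + [1, 1, 0, 0] + [0, 0, 1, 1]
labels, j1 = propagate_labels(clusters, change, hybrid, new={2}, lam=2 / 3, n_classes=2)
```

For this toy input, Eq. (7) gives $\eta = 6/12 = 0.5$, so only cluster 2 (all “change”) is “new” and $\lambda = 4/6$. Cluster 1 is assigned to class 1 even though its two “change” pixels were class 0 at $t_0$, because those pixels lower $P(\sigma_{11} \mid \omega_0)$ in Eq. (9); the resulting estimate is $J_1 = 25/36$. In a full run, `propagate_labels` would be evaluated for every candidate $K_1$ and the map with the largest `j1` retained.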


Figure 2. Apr94: TM band 3.

Figure 3. Apr94, TM band-3: synthetic versions of the image employed to test the dynamic-tuning and cloud-removal technique (from left to right: Apr94.1, ..., Apr94.4).

3. EXPERIMENTAL RESULTS

3.1. Data set for experiments

The proposed method was validated on a multitemporal and multisource data set acquired in April and May 1994 over the province of Alessandria, in the Piemonte region of north-west Italy, inside the basin of the Po River (Figs. 2 and 5). The data set consisted of six 484 × 872 Landsat-5 TM (Thematic Mapper) bands and an ERS-1 SAR image (C band, VV polarization), and included four information classes (namely, “bare soil”, “wet soil”, “wood”, and “cereals”) in the April 1994 image (hereafter denoted simply by “Apr94”) and a further “corn” class in the May 1994 image (hereafter denoted by “May94”). It is worth noting that very similar spectral responses are obtained for “bare soil” and “cereals” in Apr94 and for “bare soil”, “cereals”, and “corn” in May94. A training map was adopted for Apr94, whereas no training information is assumed to be available for May94. A test map is used at both dates to validate the accuracy of the classification results quantitatively. Table 1 gives the number of pre-classified samples available for each information class for training and testing purposes.

3.2. Dynamic tuning and cloud removal results

In order to test the effectiveness of the proposed cloud-removal and dynamic-tuning algorithm, preliminary experiments were first performed on a synthetic version of Apr94. Four synthetic images (named Apr94.1, ..., Apr94.4) were generated by linearly rescaling the grey levels in each TM band of Apr94 (mapping each grey level $z$ onto the grey level $\zeta = 0.25z + 20$) and adding synthetic rectangular “change” areas. Specifically, Apr94.1 is characterized by a large “change” area with a strong intensity difference with respect to Apr94, Apr94.2 by a small “change” area, still presenting a large intensity difference, Apr94.3 by a large “change” area with a low intensity difference with respect to Apr94, and Apr94.4 by a small “change” area with a low intensity difference (Fig. 3). The proposed method, applied to each pair of images (Apr94, Apr94.v) (v = 1, 2, 3, 4) with a 7 × 7 window, effectively detected the synthetic “change” area (with no missed alarms in all four experiments, no false alarms in the experiments with Apr94.1 and Apr94.2, and only 24 false-alarm pixels in the experiments with Apr94.3 and Apr94.4) and computed the correct grey-level linear mapping required to invert the synthetic rescaling (i.e., $\zeta \to 4\zeta - 80$). In particular, only one iteration of the method was required to reach convergence in the case of Apr94.1 and Apr94.2, thanks to the large intensity of the change and independently of the size of the “change” area. By contrast, two iterations were necessary in the case of Apr94.3 and Apr94.4. In fact, focusing, for instance, on Apr94.3, the rectangular “change” region in the modulus image (Fig. 4) includes two well-defined sub-regions: the first exhibits grey levels close to those of the “no-change” area, whereas the second contains pixel intensities that differ more from the “no-change” ones. Hence, a first iteration of the method is sufficient to identify the latter as “change”, whereas a second iteration is needed to detect the former as well (Fig. 4).

The proposed method was also tested directly on the pair of images (Apr94, May94). Different dynamics can be noted by visually comparing these two images (Figs. 2 and 5), mainly due to different atmospheric conditions at the two acquisition dates. Specifically, May94 exhibits several cloud areas that are not present in Apr94; this partial cloud coverage also causes a smaller dynamic range of the non-cloud areas in May94 as compared with the corresponding areas in Apr94. As shown in Fig. 5, the proposed method effectively detected the cloud-covered regions and reduced the difference between the dynamics of Apr94 and May94. In particular, the method needed three iterations to reach convergence and reduced the Root Mean Square Error (RMSE) between the pixel intensities in the “no-change” area of Apr94 and the corresponding intensities in May94 from its original value of 27.80 to 10.71 (i.e., to 38.53% of the original value; see Fig. 6).

Figure 4. Application of the dynamic-tuning and cloud-removal algorithm to (Apr94, Apr94.3): modulus of the difference image (left) and change map (right; color legend: white: no-change; grey: change detected at the first iteration; black: change detected at the second iteration).

Figure 5. Application of the dynamic-tuning and cloud-removal algorithm to (Apr94, May94): TM band 3 of the original May94 image (left) and of the rescaled May94 image with cloud areas highlighted in white (right).

3.3. Multitemporal classification results

The proposed partially supervised classification system was applied to the “Po-Alessandria” multitemporal data set by generating through “K-means” several clustering maps for each date, with numbers of clusters ranging from 10 to 40. This range was chosen for Apr94 by allowing for multimodal statistics in each thematic class, with about 2–10 modes per class; the same range was also adopted for May94. Fig. 7 shows the behavior of the proposed functional $J_0$ (computed from the training data available for Apr94) and of the overall accuracy (OA, i.e., the percentage of correctly classified test samples) as functions of the number $K_0$ of clusters. The two curves behave similarly, which suggests that the first-date functional can be an effective measure of the quality of each clustering solution from the viewpoint of hybrid classification. In particular, $J_0$ attains its global maximum for $K_0 = 15$, which also corresponds to the global maximum of OA (namely, OA = 92.03%). Therefore, for this data set, the maximization of $J_0$ selects, in the sequence of clustering solutions generated by “K-means”, the map yielding the most accurate hybrid classification result. The optimal hybrid map is selected automatically, without the need for user intervention. We note, in particular, that in this case the cluster-class assignment process generated no unlabelled clusters; hence, the working hypothesis introduced in Sec. 2.4 is satisfied.

However, it is worth noting that, as $K_0$ varies in [10, 40], both OA and $J_0$ exhibit several local maxima in addition to their global maximum. Hence, besides the selection of an accurate hybrid map through the global maximum, the functional $J_0$ can also be employed to identify, in the sequence of “K-means” maps, a subset of good clustering solutions, thus reducing the time required by a possible photo-interpretative analysis of the sequence of clustering maps. In particular, we note that, despite the high values of OA, several of the hybrid maps computed from the “K-means” clustering maps achieve accuracies below 70% for the classes “bare soil” and “cereals”, due to the large spectral overlap between them in the feature space. In such cases, “K-means” does not handle this overlap effectively and separates the spectral sub-classes of “bare soil” from those of “cereals” with poor accuracy.


For each K1 = 10, 11, ..., 40, a partially supervised classification map has been generated for May94 by applying the “new-classes” detection technique described in Sec. 2.4 and the label propagation algorithm. For each clustering map, the method labelled one or more clusters as “new”. These clusters turned out to be spectral sub-classes either of “corn” or of the “old” class “wet soil”. Indeed, a cluster is labelled as “new” if its “change” percentage is above the threshold η, and in the present data set the main land-cover changes between April 1994 and May 1994 are the appearance of “corn” and the transition from “bare soil” to “wet soil” (due to the flooding of rice fields). Consequently, the “new”-class condition is fulfilled not only by the clusters representing modes of “corn” (which is really a “new” class) but also by clusters containing almost exclusively pixels that are “wet soil” in May and “bare soil” in April. However, these two situations cannot be distinguished without prior knowledge about the land-cover classes in May94. In the present experiments, each cluster labelled as “new” has therefore been assigned to “corn” or to “wet soil” by photo-interpretation.
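The “new”-cluster rule described above reduces to a per-cluster ratio between “change” pixels and cluster size. The Python sketch below computes that ratio and flags the clusters above a threshold. It assumes pixelwise integer cluster labels and a boolean change map, expresses the threshold η as a fraction rather than a percentage, and uses a hypothetical function name. The actual value of η is not given in this section, so 0.6 in the test is purely illustrative. As the text points out, a flag alone cannot tell a genuinely new class such as “corn” from an old-class transition such as “bare soil” to “wet soil”.

```python
import numpy as np


def flag_new_clusters(cluster_map, change_mask, eta):
    """Flag second-date clusters whose fraction of 'change' pixels exceeds eta.

    cluster_map: int array with the second-date cluster index of each pixel.
    change_mask: bool (or 0/1) array, True where change detection said 'change'.
    eta: threshold on the per-cluster change fraction, in (0, 1).
    Returns (flags, fraction), one entry per cluster index.
    """
    labels = np.asarray(cluster_map).ravel()
    changed = np.asarray(change_mask, dtype=bool).ravel()
    n_clusters = labels.max() + 1
    size = np.bincount(labels, minlength=n_clusters)  # |S_1k|
    n_changed = np.bincount(labels, weights=changed.astype(float),
                            minlength=n_clusters)     # |S_1k ∩ D_ch|
    fraction = np.divide(n_changed, size, out=np.zeros(n_clusters),
                         where=size > 0)
    return fraction > eta, fraction
```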

Figure 6. Behavior of the RMSE between the intensities of the “no-change” pixels in Apr94 and the corresponding intensities in May94 as a function of the number of iterations of the proposed dynamic-tuning and cloud-removal algorithm. (Plot: RMSE versus iteration number, 0 to 3; data labels 27.80, 11.47, 10.71, 10.71.)

Fig. 8 shows the behavior of the overall accuracy of the resulting partially supervised maps and of the functional J1 as functions of the number K1 of clusters. Again, the two curves behave similarly. In particular, both OA and J1 attain their global maximum for K1 = 16, which corresponds to a partially supervised classification map with a good OA (namely, 91.11%), despite the lack of training data for May94. Hence, at the second date too, maximizing the proposed functional selects the clustering map with the highest classification accuracy in the sequence of maps computed by “K-means”. Turning to the corresponding class accuracies, good results are obtained for “wet soil” and “wood” (both classified with accuracies higher than 95%). Poorer accuracies are obtained for “bare soil,” “cereals,” and “corn,” due to their large spectral overlap. Choosing the optimum partially supervised map by maximizing J1 aims at optimizing the probability of correct classification, as further confirmed by the similar behaviors of J1 and OA (which is a supervised estimate of the probability of correct classification). However, this approach does not explicitly optimize the classification accuracy of the individual classes. As noted for the April image, both OA and J1 exhibit several local maxima. J1 can therefore be regarded not only as an indicator of a single optimum clustering map but also as a reference guide for the visual analysis of the clustering results, since it highlights the presence of several locally optimum maps. In particular, all the partially supervised maps corresponding to local maxima of J1 exhibit overall accuracies above 85%.
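A confusion matrix makes the distinction between optimizing OA and optimizing single-class accuracies concrete. The Python sketch below computes OA together with per-class, average, and minimum accuracies. The function name is hypothetical and the counts are illustrative, not taken from the experiments. It shows how a map can have a high OA while one class, as with “bare soil” or “cereals” here, remains poorly classified.

```python
import numpy as np


def accuracy_summary(confusion):
    """OA, per-class, average and minimum accuracy from a confusion matrix.

    confusion[i, j] = number of test samples of true class i assigned to class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    oa = np.trace(confusion) / confusion.sum()               # overall accuracy
    per_class = np.diag(confusion) / confusion.sum(axis=1)   # per true class
    return {"OA": oa, "per_class": per_class,
            "average": per_class.mean(), "minimum": per_class.min()}
```

For example, with the matrix [[95, 5, 0], [0, 45, 5], [0, 20, 30]] the OA is 0.85, but the third class reaches only 0.60. A functional tied to the average accuracy (about 0.82) or to the minimum accuracy would therefore rank such a map differently.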

4. CONCLUSIONS

In the present paper, the problem of partially supervised multitemporal image analysis has been addressed by adopting a multitemporal semi-supervised classification system previously proposed by the authors and by endowing it with further processing modules aimed at completely automating the classification process. Specifically, the method deals with a pair of multitemporal remotely sensed images, exploiting ground-truth information only at the first acquisition date. It integrates the generation of clustering maps through “K-means” for both images with the unsupervised change-detection technique based on Kittler and Illingworth's method. This combination generates a hybrid supervised/unsupervised classification map for the first date and a partially supervised map for the second date. In this paper, the generation of such maps has been completely automated by expressing both the first- and the second-date classification problems as cluster-level maximum-likelihood (ML) classification problems and by developing case-specific relative-frequency estimates of the conditional probabilities involved in the ML decision rule. The selection of the number of clusters at each date is also integrated with the ML classification and is expressed as the maximization of an estimate of the probability of correct classification. Specifically, a supervised estimate and a partially supervised estimate have been developed for the first- and the second-date image, respectively. The experiments on a real multitemporal and multisource data set have proved the effectiveness of these estimates: as functions of the number of clusters, they turned out to be closely correlated with the overall accuracy (OA) at both dates. In particular, maximizing these functionals made it possible to select automatically, in a given sequence of clustering solutions, the clustering map yielding the hybrid/partially supervised classification map with the maximum value of OA, which turned out to be above 90% at both dates. Moreover, the second-date ML-based approach proved able to detect the presence of “corn” (which was not included in the first-date image), despite the lack of prior knowledge about this land-cover type. The classification maps provided very good accuracies for the classes “wet soil” and “wood”, although low accuracies were obtained for “bare soil”, “cereals”, and “corn”. We interpret this as resulting both from the large spectral overlap among these classes in the feature space (which a simple clustering procedure like “K-means” does not handle effectively) and from selecting the optimal number of clusters according to a functional related to the overall accuracy rather than to the accuracy obtained for each class. As a future development of this research, the improvement of the classification accuracies for the most overlapping classes will be addressed in several ways. These include integrating textural features [16] to increase class separability, testing more sophisticated clustering algorithms (e.g., hierarchical [18] or contextual [19] clustering methodologies), and modifying the proposed functionals to relate them to the average or minimum accuracy and not only to the overall accuracy.

Figure 7. Plot of the functional J0 (up to a linear rescale) and of the overall accuracy OA for the April 1994 image, as functions of the number of clusters K0 (horizontal axis: K0; curves: OA and J0).

Figure 8. Plot of the functional J1 (up to a linear rescale) and of the overall accuracy OA for the May 1994 image, as functions of the number of clusters K1 (horizontal axis: K1; curves: OA and J1).

Furthermore, the problem of correctly matching the dynamics of the optical bands of the two images has been addressed, as a support to the change-detection procedures involved in the adopted multitemporal architecture. The multitemporal system under analysis has been endowed with a processing module that automatically detects the differences in cloud coverage at the two dates and corrects possible dynamics differences. Specifically, the method iteratively applies the above-mentioned unsupervised change-detection method, based on Kittler and Illingworth's procedure, in order to compute an optimal linear transformation matching the dynamics of the “no-cloud” areas. Experiments on a synthetic version of the adopted data set consisted of the first-date image and several modified versions of it, obtained by synthetically altering the dynamics and adding simulated changes. They showed that the method effectively detected the presence of clouds and correctly computed the linear mapping required to invert the synthetic dynamics modification. A further experimental analysis involving both acquired images confirmed these conclusions. The method correctly identified the “cloud” pixels present in the second-date image (and not in the first-date one) and, by automatically computing a suitable linear rescaling, reduced the root mean square error between the “no-cloud” areas to 38.53% of its initial value.
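The dynamic-tuning loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It handles a single band and obtains the “change” map by Kittler and Illingworth's minimum-error thresholding [14] of the absolute difference between the rescaled second-date band and the first-date band. At each pass, it re-estimates the linear mapping by ordinary least squares on the “no-change” pixels. The function names `kittler_illingworth_threshold` and `match_dynamics` and the default of 256 histogram bins are hypothetical. For reference, the data labels of Figure 6 drop from 27.80 to 10.71, and 10.71/27.80 ≈ 0.385, which is consistent with the 38.53% figure quoted above.

```python
import numpy as np


def kittler_illingworth_threshold(values, bins=256):
    """Minimum-error threshold (Kittler and Illingworth, 1986) on a 1-D sample."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    best_t, best_cost = edges[bins // 2], np.inf
    for t in range(1, bins):
        p1, p2 = p[:t].sum(), p[t:].sum()
        if p1 <= 0 or p2 <= 0:
            continue
        mu1 = (p[:t] * centers[:t]).sum() / p1
        mu2 = (p[t:] * centers[t:]).sum() / p2
        var1 = (p[:t] * (centers[:t] - mu1) ** 2).sum() / p1
        var2 = (p[t:] * (centers[t:] - mu2) ** 2).sum() / p2
        if var1 <= 1e-12 or var2 <= 1e-12:
            continue  # degenerate (single-bin) class: criterion undefined
        # J(T) = 1 + 2[P1 ln s1 + P2 ln s2] - 2[P1 ln P1 + P2 ln P2], s^2 = var
        cost = (1 + p1 * np.log(var1) + p2 * np.log(var2)
                - 2 * (p1 * np.log(p1) + p2 * np.log(p2)))
        if cost < best_cost:
            best_t, best_cost = edges[t], cost
    return best_t


def match_dynamics(x_ref, y, n_iter=3, bins=256):
    """Iteratively fit x_ref ~ a * y + b on pixels labelled 'no-change'.

    Returns (a, b, no_change_mask_of_last_pass, rmse_per_pass).
    """
    x_ref = np.asarray(x_ref, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = 1.0, 0.0  # start from the identity mapping
    rmse_history = []
    for _ in range(n_iter + 1):
        diff = np.abs(a * y + b - x_ref)  # difference image after rescaling
        no_change = diff < kittler_illingworth_threshold(diff, bins)
        rmse_history.append(float(np.sqrt(np.mean(diff[no_change] ** 2))))
        a, b = np.polyfit(y[no_change], x_ref[no_change], 1)  # LS refit
    return a, b, no_change, rmse_history
```

Consider a synthetic pair in which the second band is a linearly distorted copy of the first plus a block of bright “cloud” pixels. On such a pair, the recovered (a, b) should approximately invert the distortion, and the cloud pixels should fall outside the “no-change” mask.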

It is worth noting that the resulting multitemporal processing architecture is completely automatic: the user only needs to provide the input pair of multispectral and/or multisource images along with a training map for the first acquisition date. The method automatically adjusts the dynamics of the two images, removes the areas affected by clouds, and generates a sequence of “K-means” clustering maps for each date. Then, an optimal map is automatically selected at each date by employing the above-mentioned optimality functionals, and classification maps for both acquisition dates are generated. No parameter-setting procedure is required by the method. We note, however, that although the algorithm detects the possible presence of “new” classes at the second date (i.e., classes not present at the first date), it cannot label these classes as corresponding to a given land-cover typology. This limitation is implicit in the partially supervised context, since no prior information is available at the second date. As a consequence, a visual interpretation of the partially supervised classification map is required to assign a semantic land-cover label to each “new” class. In addition, on the employed data set, the functionals used to select the number of clusters presented several local maxima, typically corresponding to hybrid/partially supervised classification maps with good overall accuracies. This result can also support photo-interpretation, since it highlights the presence of good clustering solutions as alternatives to the automatically selected optimal one.

5. APPENDIX: RELATIVE-FREQUENCY PROBABILITY ESTIMATES FOR THE SECOND-DATE IMAGE

The proposed partially supervised multitemporal classification architecture exploits the spatial overlap among each second-date cluster, each first-date class decision region, and the “change” and “no-change” areas to compute estimates of $P(\omega_i, \sigma_{1k})$ and $P(\Theta, \sigma_{1k})$ ($k = 1, 2, \ldots, K_1$; $i = 1, 2, \ldots, M$). Adopting the notations introduced in Secs. 2.3-2.5 and denoting by “ch” and “nc” the “change” and “no-change” hypotheses, respectively, the desired estimates have to satisfy the following probabilistic constraints:

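$$
P(\omega_i, \sigma_{1k}) = P(\omega_i, \sigma_{1k}, nc) + P(\omega_i, \sigma_{1k}, ch), \qquad
P(\Theta, \sigma_{1k}) = P(\Theta, \sigma_{1k}, nc) + P(\Theta, \sigma_{1k}, ch) \tag{11}
$$

$$
\sum_{k=1}^{K_1}\Big[P(\Theta, \sigma_{1k}, nc) + \sum_{i=1}^{M} P(\omega_i, \sigma_{1k}, nc)\Big] = P(nc), \qquad
\sum_{k=1}^{K_1}\Big[P(\Theta, \sigma_{1k}, ch) + \sum_{i=1}^{M} P(\omega_i, \sigma_{1k}, ch)\Big] = P(ch) \tag{12}
$$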

Focusing first on the “no-change” components, we note that, under the “no-change” hypothesis, no “new” class is allowed, i.e., $P(\Theta, \sigma_{1k}, nc) = 0$. In addition, a relative-frequency estimate of $P(\omega_i, \sigma_{1k}, nc)$ is proportional to $|D_{0i} \cap S_{1k} \cap D_{nc}|$, i.e., $P(\omega_i, \sigma_{1k}, nc) = A\,|D_{0i} \cap S_{1k} \cap D_{nc}|$, where the proportionality constant $A$ is computed by using Eq. (12):

$$
A \sum_{k=1}^{K_1}\sum_{i=1}^{M} |D_{0i} \cap S_{1k} \cap D_{nc}| = P(nc) = \frac{|D_{nc}|}{N} \tag{13}
$$

where we have explicitly expressed $P(nc)$ as the relative frequency of “no-change” samples in the image. As suggested in Sec. 2.3, we assume the first-date hybrid map $H$ to include a negligible number of unlabelled pixels, i.e., we assume $D_{01} \cup D_{02} \cup \cdots \cup D_{0M} = X$. In addition, the cluster map $M_1$ is considered to be exhaustive, i.e., $S_{11} \cup S_{12} \cup \cdots \cup S_{1K_1} = X$. Since the regions $D_{0i}$ and the clusters $S_{1k}$ are also pairwise disjoint, the double sum in Eq. (13) equals $|D_{nc}|$. Hence, Eq. (13) yields $A = 1/N$, and consequently

$$
P(\omega_i, \sigma_{1k}, nc) = \frac{|D_{0i} \cap S_{1k} \cap D_{nc}|}{N} \tag{14}
$$

With regard to the “change” contributions, we note that $P(\omega_i, \sigma_{1k}, ch)$ should be proportional to the number of “change” pixels assigned to $\sigma_{1k}$ and not assigned to $\omega_i$ in $H$, i.e.,

$$
P(\omega_i, \sigma_{1k}, ch) = B\,\Big|\Big(\bigcup_{j \neq i} D_{0j}\Big) \cap S_{1k} \cap D_{ch}\Big| = B\big(|S_{1k} \cap D_{ch}| - |D_{0i} \cap S_{1k} \cap D_{ch}|\big) \tag{15}
$$

where $B$ is a normalization constant. Substituting this relation into the second constraint of Eq. (12) and setting $P(ch) = |D_{ch}|/N$, we obtain:

$$
\sum_{k=1}^{K_1}\Big[P(\Theta, \sigma_{1k}, ch) + B\sum_{i=1}^{M}\big(|S_{1k} \cap D_{ch}| - |D_{0i} \cap S_{1k} \cap D_{ch}|\big)\Big] = \frac{|D_{ch}|}{N} \tag{16}
$$

After routine algebra, this equation reduces to:

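$$
P(\Theta, ch) + B(M-1)|D_{ch}| = \frac{|D_{ch}|}{N}
\;\Longrightarrow\;
\lambda\frac{|D_{ch}|}{N} + B(M-1)|D_{ch}| = \frac{|D_{ch}|}{N}
\;\Longrightarrow\;
B = \frac{1-\lambda}{(M-1)N} \tag{17}
$$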


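where $\lambda = P(\Theta \mid ch)$, so that $P(\Theta, ch) = \lambda P(ch) = \lambda |D_{ch}|/N$. Therefore:

$$
P(\omega_i, \sigma_{1k}, ch) = \frac{1-\lambda}{(M-1)N}\big(|S_{1k} \cap D_{ch}| - |D_{0i} \cap S_{1k} \cap D_{ch}|\big) \tag{18}
$$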
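For readers checking the “routine algebra” that leads from Eq. (16) to Eq. (17), the step rests on two facts. The first is the marginalization of $P(\Theta, \sigma_{1k}, ch)$ over $k$. The second is a counting identity that uses the fact that both the first-date decision regions $D_{0i}$ and the second-date clusters $S_{1k}$ form partitions of $X$ (the exhaustiveness assumptions stated above, together with disjointness):

$$
\sum_{k=1}^{K_1} P(\Theta, \sigma_{1k}, ch) = P(\Theta, ch) = \lambda\,\frac{|D_{ch}|}{N},
$$

$$
\sum_{k=1}^{K_1}\sum_{i=1}^{M}\big(|S_{1k} \cap D_{ch}| - |D_{0i} \cap S_{1k} \cap D_{ch}|\big)
= \sum_{k=1}^{K_1}\big(M\,|S_{1k} \cap D_{ch}| - |S_{1k} \cap D_{ch}|\big)
= (M-1)\,|D_{ch}|.
$$

Substituting both expressions into Eq. (16) gives the first equation in Eq. (17).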

Substituting into Eq. (11) the expressions obtained for $P(\omega_i, \sigma_{1k}, nc)$ and $P(\omega_i, \sigma_{1k}, ch)$ yields expression (8) for $P(\omega_i, \sigma_{1k})$. In addition, summing over $i$, simple algebraic calculations give expression (4) for $P(\Omega, \sigma_{1k})$. $P(\Theta, \sigma_{1k})$ then follows from the normalization constraint $P(\Omega, \sigma_{1k}) + P(\Theta, \sigma_{1k}) = P(\sigma_{1k}) = |S_{1k}|/N$.
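The closed forms above map directly onto array code. The Python sketch below makes several assumptions: pixelwise integer label maps for the first-date hybrid map $H$ and the second-date cluster map $M_1$, a boolean change map, and $\lambda = P(\Theta \mid ch)$ supplied as an input, because its estimation is described in the main body. It builds the “no-change” term $|D_{0i} \cap S_{1k} \cap D_{nc}|/N$, the “change” term of Eq. (18), and their sum as in Eq. (11), then derives $P(\Theta, \sigma_{1k})$ from the normalization constraint. The function name is hypothetical. Working through the sums shows that, under these assumptions, the estimate reduces to $P(\Theta, \sigma_{1k}) = \lambda |S_{1k} \cap D_{ch}|/N$, which is a convenient consistency check.

```python
import numpy as np


def second_date_estimates(d0, s1, change, lam, n_classes, n_clusters):
    """Relative-frequency estimates of P(omega_i, sigma_1k) and P(Theta, sigma_1k).

    d0: first-date class index per pixel (hybrid map H, assumed exhaustive).
    s1: second-date cluster index per pixel (cluster map M1).
    change: bool per pixel, True for 'change'.
    lam: lambda = P(Theta | ch), treated here as a given input.
    Returns p_omega with shape (n_clusters, n_classes) and p_theta (n_clusters,).
    """
    d0, s1 = np.ravel(d0), np.ravel(s1)
    ch = np.ravel(change).astype(bool)
    n, m = d0.size, n_classes
    # joint[k, i, c] = |D_0i ∩ S_1k ∩ D_c| with c = 0 (no-change), 1 (change)
    joint = np.zeros((n_clusters, m, 2))
    np.add.at(joint, (s1, d0, ch.astype(int)), 1)
    p_nc = joint[:, :, 0] / n                                    # Eq. (14)
    s1_ch = joint[:, :, 1].sum(axis=1, keepdims=True)            # |S_1k ∩ D_ch|
    p_ch = (1 - lam) / ((m - 1) * n) * (s1_ch - joint[:, :, 1])  # Eq. (18)
    p_omega = p_nc + p_ch                                        # Eq. (11)
    p_sigma = np.bincount(s1, minlength=n_clusters) / n         # |S_1k| / N
    p_theta = p_sigma - p_omega.sum(axis=1)  # normalization constraint
    return p_omega, p_theta
```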

ACKNOWLEDGMENTS

This research was funded in part by the Italian Ministry of Education, University and Research (MIUR). The support is gratefully acknowledged.

REFERENCES

1. A. Singh, “Digital change detection techniques using remotely-sensed data,” International Journal of Remote Sensing 10, pp. 989–1003, 1989.

2. F. Melgani, G. Moser, and S. B. Serpico, “Unsupervised change detection methods for remote sensing images,” Optical Engineering 41(12), pp. 3288–3297, 2002.

3. G. Moser, F. Melgani, and S. B. Serpico, “Advances in unsupervised change detection,” in Frontiers of Remote Sensing Information Processing, C. H. Chen, ed., World Scientific Publishing, 2003.

4. L. Bruzzone and D. F. Prieto, “Automatic analysis of the difference image for unsupervised change detection,” IEEE Trans. Geosci. Remote Sensing 38, pp. 1171–1182, 2000.

5. P. H. Swain, “Bayesian classification in a time-varying environment,” IEEE Trans. Geosci. Remote Sensing 8, pp. 880–883, 1978.

6. B. Jeon and D. A. Landgrebe, “Classification with spatio-temporal interpixel class dependency contexts,” IEEE Trans. Geosci. Remote Sensing 30, pp. 663–672, 1991.

7. L. Bruzzone, D. F. Prieto, and S. B. Serpico, “A neural-statistical approach to multitemporal and multisource remote-sensing image classification,” IEEE Trans. Geosci. Remote Sensing 37, pp. 1350–1359, 1999.

8. G. Moser, F. Melgani, S. B. Serpico, and A. Caruso, “Partially supervised detection of changes from remote sensing images,” in Proc. of IEEE-IGARSS 2002, Toronto, 1, pp. 299–301, 2002.

9. D. F. Prieto and O. Arino, “A partially supervised change detection technique,” in Proc. of IEEE-IGARSS 2001, Sydney, 2, pp. 858–860, 2001.

10. L. Bruzzone and D. F. Prieto, “Unsupervised retraining of a maximum-likelihood classifier for the analysis of multitemporal remote-sensing images,” IEEE Trans. Geosci. Remote Sensing 39, pp. 456–460, 2001.

11. L. Bruzzone and R. Cossu, “A multiple-cascade-classifier system for a robust and partially unsupervised updating of land-cover maps,” IEEE Trans. Geosci. Remote Sensing 40(9), pp. 1984–1996, 2002.

12. M. D. Martino, G. Macchiavello, M. G., and S. B. Serpico, “Partially supervised contextual classification of multitemporal remotely sensed images,” in Proc. of IEEE-IGARSS 2003, Toulouse, 2, pp. 1377–1379, 2003.

13. J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles, Addison-Wesley Publ. Comp., Massachusetts, 1974.

14. J. Kittler and J. Illingworth, “Minimum error thresholding,” Pattern Recognition 19, pp. 41–47, 1986.

15. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, 1990.

16. J. Richards and X. Jia, Remote Sensing Digital Image Analysis, Springer-Verlag, Berlin, 1999.

17. S. B. Serpico, M. Datcu, G. Moser, S. Mansi, and P. Pecciarini, “Hybrid supervised/unsupervised multisensor fusion of remote sensing images based on hierarchical clustering,” in Proceedings of the 2003 Tyrrhenian International Workshop on Remote Sensing, Elba Island (Italy), 2003.

18. Y. F. Wong and E. C. Posner, “A new clustering algorithm applicable to multispectral and polarimetric SAR images,” IEEE Trans. Geosci. Remote Sensing 31(3), pp. 634–644, 1993.

19. Y. Delignon, A. Marzouki, and W. Pieczynski, “Estimation of generalized mixtures and its application to image segmentation,” IEEE Transactions on Image Processing 6(10), pp. 1364–1375, 2001.

