
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 5, NO. 5, OCTOBER 2012

Evaluation of SVM, RVM and SMLR for Accurate Image Classification With Limited Ground Data

    Mahesh Pal and Giles M. Foody, Senior Member, IEEE

Abstract—The accuracy of a conventional supervised classification is in part a function of the training set used, notably impacted by the quantity and quality of the training cases. Since it can be costly to acquire a large number of high-quality training cases, recent research has focused on methods that allow accurate classification from small training sets. Previous work has shown the potential of support vector machine (SVM) based classifiers. Here, the potential of the relevance vector machine (RVM) and sparse multinomial logistic regression (SMLR) approaches is evaluated relative to SVM classification. With both airborne and spaceborne multispectral data sets, the RVM and SMLR were able to derive classifications of similar accuracy to the SVM but required considerably fewer training cases. For example, from a training set comprising 600 cases, acquired with a conventional stratified random sampling design, from an airborne thematic mapper (ATM) data set, the RVM produced the most accurate classification, 93.75%, and needed only 7.33% of the available training cases. In comparison, the SVM yielded a classification that had an accuracy of 92.50% and needed 4.5 times more useful training cases. Similarly, with a Landsat ETM+ (Littleport, Cambridgeshire, UK) data set, the SVM required 4.0 times more useful training cases than the RVM. For each data set, however, the accuracies of the classifications derived by each classifier were of similar magnitude, differing by no more than 1.25%. Finally, for both the ATM and ETM+ (Littleport) data sets, the useful training cases for the SVM and RVM had distinct and potentially predictable characteristics. Support vectors were generally atypical but lay in the boundary region between classes in feature space, while the relevance vectors were atypical but anti-boundary in nature. The SMLR also tended to mostly, but not always, use extreme cases that lay away from the class boundary. The results, therefore, suggest a potential to design classifier-specific intelligent training data acquisition activities for accurate classification from small training sets, especially with the SVM and RVM.

Index Terms—Ground truth, relevance vector machines, sparse multinomial logistic regression, support vector machines, training data, typicality.

    I. INTRODUCTION

Land cover mapping is one of the most common applications of remote sensing. Land cover maps are produced to meet the needs of a diverse array of users and are typically derived via some form of image classification analysis.

Manuscript received September 30, 2011; revised February 12, 2012; accepted August 02, 2012. Date of publication October 16, 2012; date of current version November 14, 2012. This work was supported in part by the Association of Commonwealth Universities (ACU), London, through a fellowship to M. Pal.

M. Pal is with the Department of Civil Engineering, NIT Kurukshetra, Haryana, 136119 India (e-mail: [email protected]).

G. M. Foody is with the School of Geography, University of Nottingham, Nottingham, NG7 2RD, U.K.

Digital Object Identifier 10.1109/JSTARS.2012.2215310

Image classification is one of the main pattern recognition techniques applied in remote

    sensing. Although remote sensing offers the potential to acquire

    imagery of large areas inexpensively, there are still major costs

    to be incurred in a mapping programme. One major cost to be

met in a mapping application is associated with ground reference data [1]–[3].

    Ground data requirements may vary from study to study but

it is common to find that ground data are required to train a supervised classification analysis and to evaluate map accuracy.

The training data set should be classifier specific. A maximum likelihood classifier might need a large sample acquired with a random sampling design to provide accurate information about the mean and variance of the classes, while a SVM may need only a smaller training set of spectrally extreme cases that lie close to the decision boundaries [1]. Given that ground reference data are expensive and difficult to acquire, many have sought to reduce the ground reference data requirements. While it may

    sometimes be possible to reduce the ground data requirements in

the testing stage for accuracy assessment [4], most attention has focused on the training stage. For example, strategies adopted include the use of unlabeled cases in training [2], [5]–[9], adoption of pre-processing methods such as feature reduction to reduce training data set requirements [10], [11], the use of intelligent training selection strategies to focus on the acquisition of informative training samples [1], [12], [13] and strategies to reduce training set size when attention is focused on a specific class [12], [14], [15]. This article develops aspects of previous

    work and focuses on the potential for accurate classification

with small training sets through the use of contemporary machine learning classifiers that may theoretically require only a few training samples.

    The support vector machine (SVM) has been extensively

used as a state-of-the-art supervised classifier with remote sensing data [16]–[21]. A key reason behind its popularity is its ability to yield highly accurate classifications, often more accurate than those from other contemporary approaches such as neural networks and decision trees [20], [22]–[24]. Moreover, of particular

    concern to this article, research has shown that the SVM may

    be used to produce an accurate classification from a small

    number of useful training cases lying close to the decision

    boundary [1] and that the financial savings to a mapping project

derived from this feature can be large. For example, [13] show a reduction in the total cost of a mapping project by focussing attention on the most informative training samples for classification with a SVM.

    The SVM based approach to classification is, however, not

    problem-free. Concerns include the need to define a set of

    parameters [25], [26], an inability to form a full confusion



    matrix in some strategies to multi-class classification [27] and

a lack of information on per-case classification uncertainty [28].

    Other classifiers may sometimes be attractive alternatives to

    the SVM, especially with regard to the aforementioned con-

    cerns with SVM-based classification. Recently, for example,

[29]–[31] showed that a Bayesian extension of the SVM, called

    the relevance vector machine (RVM; [32]), can be used as an

    alternative to the SVM for image classification and has the

ability to provide per-case uncertainty data in the form of posterior probabilities of class membership. Moreover, comparative

    studies suggest that a RVM may require fewer training cases

    than a SVM in order to classify a data set [29]. It has been

suggested that the useful training cases for classification

    by a RVM are anti-boundary in nature while those for use in

    classification by a SVM tend to lie near the boundary between

    classes [32]. The potential to use a small training set and derive

per-case uncertainty information is also offered by the use of the

    sparse multinomial logistic regression (SMLR; [33]) for image

    classification.

The aim of this study was to evaluate the potential of the RVM and SMLR classifiers for accurate classification from

    small training sets relative to the SVM, which has been eval-

    uated previously [1]. The key focus in the evaluation was

    on the accuracy with which data sets may be classified and

    the number of training cases required. Many studies have

    shown that only a small proportion of a training set acquired

    by conventional sampling methods is actually required for

    accurate classification by classifiers such as the SVM, RVM

and SMLR [1], [24], [29], [33]. A key challenge is finding these

    useful training cases in a way that allows accurate and efficient

    classification. Sometimes researchers have acquired a large

training sample by conventional methods and then from this identified the useful training cases [20], [29], [31], [33]–[35].

    Such approaches can be inefficient, notably in relation to the

    effort required to collect redundant training samples. A popular

    alternative is to adopt approaches such as active learning in

    which useful training sites are identified in an iterative analysis

of the image [8], [9]. While attractive, such approaches have

    limitations [36]. One key concern is that this type of method

    can only be applied post-image acquisition and can be costly

    and inefficient in terms of ground data acquisition as sites for

    labeling are identified iteratively. The realization of the full

    potential of classification methods such as the SVM, RVM

    and SMLR requires an ability to identify the useful training

    cases and predict their location on the ground in advance of

the classification [1], [12], [13]. This would allow an intelligent training programme [1], [13] to be defined. For this, it

    is necessary for the characteristics of useful training sites to

    be predictable. Thus in addition to the accuracy with which

    data may be classified, a key focus of this article is the nature

    of the training set required for an accurate classification and

    especially the characterization of the useful training cases to act

    as a guide to their predictability. The remainder of this article

    is structured such that the three classification algorithms used

    are briefly outlined in Section II before presenting the data sets

    they are applied to in Section III. The results of the analyses

are presented in Section IV and key conclusions are drawn in Section V.

II. CLASSIFICATION ALGORITHMS

    Three classification algorithms were used: SVM, RVM and

    SMLR. All three use the training cases to define the location

    of classification decision boundaries to partition the data space

    such that cases of unknown class membership may be allocated

to a class. The way the training data are used and the nature of the classifiers differ, however, and so a brief summary of each algorithm is given below. In each discussion the focus is on

    the training of the classifier. Particular attention is paid to the

    training cases that are used to form the decision boundaries.

    A subset of the available cases for training is typically used

    in classification by each of the three selected algorithms. These

useful training cases are the support vectors, relevance vectors and retained kernel basis functions in classifications by the

    SVM, RVM and SMLR respectively. In the discussion below a

training set of $N$ cases, represented by $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i = (x_{i1}, \dots, x_{iD})$ is an input vector with $D$ input features (wavebands) and $y_i \in \{1, \dots, c\}$ is the class label drawn from $c$ classes, is available to the classifiers.

    A. SVM

    The SVM is based on statistical learning theory and has the

    aim of determining the location of decision boundaries that pro-

    duce the optimal separation of the classes [37]. In the case of

    a two-class pattern recognition problem in which the classes

    are linearly separable, the SVM selects from among the infinite

    number of linear decision boundaries the one that minimises the

    generalisation error. Thus, the selected decision boundary will

    be one that leaves the greatest margin between the two classes,

    where the margin is defined as the sum of the distances to the

hyperplane from the closest points of the two classes [37]. The problem of maximising the margin can be solved using standard

    quadratic programming optimisation techniques. The training

    cases that are closest to the hyperplane are used to measure the

    margin and these training cases are termed support vectors.

    Only the support vectors are needed to form the classification

    decision boundaries and these typically represent a very small

    proportion of the total training set. If regions likely to furnish

    support vectors can be predicted then only a small training set,

    comprising the support vectors, may be acquired for a classifi-

    cation [1], [13].

For a 2-class classification problem (i.e., $y_i \in \{-1, +1\}$), the training cases are linearly separable if there exists a weight vector $\mathbf{w}$ (determining the orientation of a discriminating plane) and a scalar $b$ (determining the offset of the discriminating plane from the origin) such that $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1$ for all $i$, and the hypothesis space can be defined by the set of functions given by

$$f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b). \qquad (1)$$

The SVM finds the separating hyperplane for which the distance between the classes, measured along a line perpendicular to the hyperplane, is maximised. This can be achieved by solving the following constrained optimization problem:

$$\min_{\mathbf{w},\,b}\;\frac{1}{2}\|\mathbf{w}\|^{2}\quad\text{subject to}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \geq 1,\; i = 1,\dots,N. \qquad (2)$$


    If the two classes are not linearly separable, the SVM tries

    to find the hyperplane that maximises the margin while, at the

    same time, minimising a quantity proportional to the number

of misclassification errors. The restriction that all training cases of a given class lie on the same side of the optimal hyperplane can be relaxed by the introduction of a slack variable $\xi_i \geq 0$, and the trade-off between margin and misclassification error is controlled by a positive user-defined constant $C > 0$ [38]. Thus, for non-separable data, (2) can be written as:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;\frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{N}\xi_i\quad\text{subject to}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \geq 1 - \xi_i,\;\xi_i \geq 0. \qquad (3)$$

    SVM can also be extended to handle non-linear decision sur-

    faces. [39] propose a method of projecting the input data onto

    a high-dimensional feature space through some nonlinear map-

    ping and formulating a linear classification problem in that fea-

    ture space. Kernel functions are used to reduce the computa-

    tional cost of dealing with high-dimensional feature space [37].

A kernel function is defined as $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)\cdot\varphi(\mathbf{x}_j)$ and, with the use of a kernel function, (1) becomes:

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right) \qquad (4)$$

where $\alpha_i$ is a Lagrange multiplier.

    Further and more detailed discussion on SVM can be found

    in [37], [40], [41].
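To make the sparseness property concrete, the following minimal Python sketch (using scikit-learn rather than the LIBSVM/BSVM packages employed in this study) fits an RBF-kernel SVM to two synthetic "spectral" classes, counts the support vectors, and checks that retraining on those cases alone yields essentially the same decision rule. The data and the C and gamma values are illustrative assumptions, not values from this study.

```python
# Minimal sketch: an RBF-kernel SVM typically needs only the support
# vectors, a small subset of the training set, to define its boundary.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two synthetic spectral classes in a 2-band feature space
X = np.vstack([rng.normal(0.3, 0.05, (100, 2)),
               rng.normal(0.5, 0.05, (100, 2))])
y = np.repeat([0, 1], 100)

svm = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)
sv_idx = svm.support_            # indices of the useful training cases
print(f"support vectors: {len(sv_idx)} of {len(X)} training cases")

# retraining on the support vectors alone gives a near-identical decision
# rule, which is the basis for 'intelligent' training data acquisition
svm_small = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X[sv_idx], y[sv_idx])
grid = rng.uniform(0.2, 0.6, (1000, 2))
agreement = (svm.predict(grid) == svm_small.predict(grid)).mean()
print(f"agreement with full-set SVM: {agreement:.3f}")
```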

    B. RVM

    The RVM is a recent development in kernel based machine

learning approaches and can be used as an alternative to the SVM for image classification. The RVM is a probabilistic counterpart

    to the SVM, based on a Bayesian formulation of a linear model

    with an appropriate prior that results in a sparser representation

    than that achieved by SVM. The RVM is based on a hierarchical

    prior, where an independent Gaussian prior is defined on the

    weight parameters in the first level, and an independent Gamma

    hyper prior is used for the variance parameters in the second

    level, which leads to model sparseness [32]. An algorithm pro-

    duces sparse results when among all the coefficients defining

the model only a few are non-zero. This property helps in fast

    model evaluation and provides a potential for accurate classi-

fication from small training sets. Key advantages of the RVM over the SVM include a reduced sensitivity to the hyperparameter settings, an ability to use non-Mercer kernels, the provision of a probabilistic output, no need to define the parameter $C$, and

    often a requirement for fewer relevance vectors than support

    vectors for a particular analysis [31], [32].

    In a two class classification by RVM, the aim is, essentially,

    to predict the posterior probability of membership for one of the

    classes for a given input. A case may then be allocated to the

    class with which it has the greatest likelihood of membership.

Using a Bernoulli distribution, the likelihood function for the analysis would be:

$$P(\mathbf{t} \mid \mathbf{w}) = \prod_{i=1}^{N} \sigma\{y(\mathbf{x}_i; \mathbf{w})\}^{t_i}\,\big[1 - \sigma\{y(\mathbf{x}_i; \mathbf{w})\}\big]^{1 - t_i} \qquad (5)$$

where $\mathbf{w}$ is a set of adjustable weights. For multiclass classification, (5) can be written as:

$$P(\mathbf{t} \mid \mathbf{w}) = \prod_{i=1}^{N} \prod_{k=1}^{c} \sigma\{y_k(\mathbf{x}_i; \mathbf{w}_k)\}^{t_{ik}} \qquad (6)$$

where $\sigma(\cdot)$ is the logistic sigmoid function:

$$\sigma(y) = \frac{1}{1 + e^{-y}} \qquad (7)$$

and an iterative method is used to obtain $\mathbf{w}$. Let $\alpha_{\mathrm{MP}}$ denote the maximum-a-posteriori estimate of the hyperparameter $\alpha$. The maximum-a-posteriori estimate of the weights ($\mathbf{w}_{\mathrm{MP}}$) can be obtained by maximizing the following objective function:

$$J(\mathbf{w}) = \sum_{i=1}^{N}\big[t_i \log y_i + (1 - t_i)\log(1 - y_i)\big] - \frac{1}{2}\,\mathbf{w}^{T} A\, \mathbf{w} \qquad (8)$$

where the first summation term corresponds to the likelihood of the class labels, with $y_i = \sigma\{y(\mathbf{x}_i; \mathbf{w})\}$, and the second term corresponds to the prior on the parameters, with $A = \operatorname{diag}(\alpha)$. In the resulting solution, the gradient of (8) with respect to $\mathbf{w}$ is calculated and only those training cases having non-zero coefficients $w_i$, which are called relevance vectors, will contribute to the generation of a decision function. The posterior is approximated around $\mathbf{w}_{\mathrm{MP}}$ by a Gaussian approximation with covariance

$$\Sigma = \left(\Phi^{T} B \Phi + A\right)^{-1}$$

where $\Phi^{T} B \Phi + A$ is the Hessian of (8), the matrix $\Phi$ has elements $\phi_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$ and $B$ is a diagonal matrix with elements defined by $\beta_i = \sigma\{y(\mathbf{x}_i)\}\left[1 - \sigma\{y(\mathbf{x}_i)\}\right]$.

An iterative analysis is followed to find the set of weights that maximizes the value of (8), in which the hyperparameters $\alpha_i$ associated with each weight are updated. During training, the hyperparameter for a large number of training cases will attain a very large value and the associated weights will be reduced to zero. Thus, the training process applied to a typical training set acquired following standard methods will make most of the training cases irrelevant and leave only the useful training cases. As a result, only a small number of training cases are required for the final classification. The assignment of an individual hyperparameter to each weight is the ultimate reason for the sparse property of the RVM. Further details on the RVM are given

    by [32].
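The training loop described above can be sketched in a few lines of Python. The code below is a simplified two-class illustration of Tipping's procedure: a Laplace (Gaussian) approximation around the MAP weights followed by the hyperparameter update $\alpha_i \leftarrow \gamma_i / w_i^2$ with $\gamma_i = 1 - \alpha_i \Sigma_{ii}$, pruning the cases whose $\alpha$ diverges. The pruning threshold, iteration counts and toy data are illustrative assumptions, not the multiclass implementation used in this study.

```python
# Minimal sketch of the two-class RVM training loop of Section II-B:
# MAP weights by Newton/IRLS for fixed alpha, then Tipping's update.
# Survivors of the pruning are the relevance vectors.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35, 35)))

def rvm_binary(X, t, gamma=1.0, outer=30, prune_at=1e6):
    Phi = rbf_kernel(X, X, gamma)       # one basis function per case
    N = len(X)
    alpha = np.ones(N)                  # one hyperparameter per weight
    w = np.zeros(N)
    active = np.arange(N)
    for _ in range(outer):
        P, a, wa = Phi[:, active], alpha[active], w[active]
        for _ in range(25):             # inner IRLS: MAP weights
            y = sigmoid(P @ wa)
            B = y * (1.0 - y)
            H = (P.T * B) @ P + np.diag(a)   # Hessian: Phi^T B Phi + A
            g = P.T @ (t - y) - a * wa       # gradient of log posterior
            wa = wa + np.linalg.solve(H, g)
        Sigma = np.linalg.inv(H)
        gam = 1.0 - a * np.diag(Sigma)
        alpha[active] = gam / (wa ** 2 + 1e-12)
        w[active] = wa
        active = active[alpha[active] < prune_at]  # prune irrelevant cases
    return active, w[active]

# toy run: relevance vectors are far fewer than the 200 training cases
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.3, 0.05, (100, 2)),
               rng.normal(0.5, 0.05, (100, 2))])
t = np.repeat([0.0, 1.0], 100)
rv, w_rv = rvm_binary(X, t)
print(f"relevance vectors: {len(rv)} of {len(X)}")
```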

    C. SMLR

    The Sparse Multinomial Logistic Regression algorithm [33]

    utilises a Laplacian prior on the weights of the linear combi-

    nation of functions to enforce sparseness. This prior favours a

    few large weights with many of the others set to exactly zero.

    The SMLR algorithm learns a multi-class classifier based on the

multinomial logistic regression. This method simultaneously performs feature selection, to identify a small subset of the most relevant features, and learns the classification decision rules.


TABLE I: THE MEAN AND STANDARD DEVIATION VALUES OF THE SYNTHETIC DATA

If $\mathbf{w}_k$ is the weight vector associated with class $k$, then the probability that a given training case $\mathbf{x}_i$ belongs to class $k$ is given by

$$P(y_i = k \mid \mathbf{x}_i, \mathbf{w}) = \frac{\exp(\mathbf{w}_k^{T}\mathbf{x}_i)}{\sum_{j=1}^{c} \exp(\mathbf{w}_j^{T}\mathbf{x}_i)}. \qquad (9)$$

Usually a maximum likelihood estimation procedure is used to obtain the components of $\mathbf{w}$ from the training data by maximizing the log-likelihood function [42]:

$$l(\mathbf{w}) = \sum_{i=1}^{N}\left[\sum_{k=1}^{c} t_{ik}\,\mathbf{w}_k^{T}\mathbf{x}_i - \log \sum_{k=1}^{c} \exp(\mathbf{w}_k^{T}\mathbf{x}_i)\right]. \qquad (10)$$

In order to achieve the sparsity, a Laplacian prior $p(\mathbf{w})$ is incorporated while estimating $\mathbf{w}$. [33] propose to use a maximum a posteriori (MAP) criterion for multinomial logistic regression. The estimate of $\mathbf{w}$ is then given by:

$$\hat{\mathbf{w}} = \arg\max_{\mathbf{w}}\left[l(\mathbf{w}) + \log p(\mathbf{w})\right] \qquad (11)$$

in which $p(\mathbf{w})$ is the Laplacian prior on $\mathbf{w}$, which means that $p(\mathbf{w}) \propto \exp(-\lambda\|\mathbf{w}\|_{1})$, where $\lambda$ is a user-defined parameter that affects the level of sparsity with SMLR. Thus, similar to the SVM and RVM, the SMLR uses a small number of training

    cases, called retained kernel basis functions, in model creation.

    Further details about SMLR and modified SMLR may be found

in [10], [33], [43]–[45].
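The effect of the Laplacian prior can be illustrated with an L1-penalized multinomial logistic regression, which is the same MAP problem as (11) with scikit-learn's C playing the role of $1/\lambda$. The sketch below applies it to RBF kernel columns so that each retained (non-zero) column corresponds to a retained training case; it illustrates the principle and is not the SMLR package used in this study, and all data and parameter values are assumptions.

```python
# Minimal sketch of the SMLR idea of Section II-C: an L1 (Laplacian
# prior) penalty on multinomial logistic regression drives most weights
# to exactly zero.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
means = [(0.2, 0.3), (0.4, 0.5), (0.6, 0.3)]   # three hypothetical classes
X = np.vstack([rng.normal(m, 0.05, (100, 2)) for m in means])
y = np.repeat([0, 1, 2], 100)

# kernel basis: one RBF column per training case
gamma = 1.0
Phi = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

smlr = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
smlr.fit(Phi, y)

# a case is 'retained' if any class weight on its kernel column is non-zero
retained = np.unique(np.nonzero(smlr.coef_)[1])
print(f"retained kernel basis functions: {len(retained)} of {len(X)}")
```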

III. DATA SETS AND METHODS

    The three classifiers were used to undertake a series of clas-

    sifications to highlight the potential for accurate classification

    from small training sets. The support vectors, relevance vectors

    and retained kernel basis functions that are central to the classi-

    fication by SVM, RVM and SMLR algorithms respectively will

    be referred to as useful training cases in all classifications.

    Four data sets were used. First, a simple simulated data set

    was used to aid understanding and interpretation of the useful

    training cases. This data set comprised three classes generated

    randomly from Gaussian normal distributions in two wavebands

    (Table I). Here, a training sample of 100 cases of each class was

    randomly generated and made available to each of the classifiers

    and the analyses undertaken using ten-fold cross validation.
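As the values of Table I are not reproduced here, the following sketch only mirrors the design of this simulated experiment, with placeholder class means and standard deviations: three Gaussian classes in two wavebands, 100 cases per class, assessed by ten-fold cross validation.

```python
# Placeholder reconstruction of the simulated-data design; the class
# means and standard deviations are illustrative, not Table I's values.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
params = [((0.2, 0.3), 0.05), ((0.4, 0.5), 0.05), ((0.6, 0.3), 0.05)]
X = np.vstack([rng.normal(m, s, (100, 2)) for m, s in params])
y = np.repeat([0, 1, 2], 100)

acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)  # ten-fold CV
print(f"mean ten-fold accuracy: {acc.mean():.3f}")
```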

Second, a data set acquired in bands 1 and 5 from the ETM+ for a test site near Boston in Lincolnshire, UK, was used. Atten-

    tion focused on three classes that were abundant at the test site:

    wheat, sugar beet and oilseed rape. One hundred cases of each

    class selected at random were used for training and testing all

three classifiers. These data were used only to extend the evaluation of the characterisation of the useful training cases with the

    simulated data to a real data set. As with the simulated data set,

ten-fold cross validation was used with the ETM+ (Boston) data set.

    More extensive analyses were undertaken with the remaining

    two data sets with the accuracy of the resulting classifications

    evaluated against ground data.

    The third data set was obtained by Daedalus 1268 airborne

    thematic mapper (ATM) for an agricultural test site near

    Feltwell, UK. The ATM data were acquired in 3 spectral wave-

    bands, with a spatial resolution of 5 m [46]. The ATM data

    were used to classify six different crop types: sugar beet, wheat,

    barley, carrot, potato and grass. A map depicting the crop type

    planted in each field produced near the time of the ATM data

    acquisition was used as ground data to inform the training and

testing of the classifications. The training sets comprised

    100 randomly selected pixels of each class for the analyses of

    the ATM data set. The testing set comprised 320 pixels drawn

    at random from the test site.

    The fourth and final data set used was acquired by the

    Landsat ETM+ for an agricultural area near Littleport in

Cambridgeshire, UK. The data in the six non-thermal spectral wavebands with a 30 m spatial resolution were used to classify

    seven agriculture land cover types: wheat, sugar beet, potato,

onion, peas, lettuce and beans [47]. A map depicting the crop

    type planted in each field produced near the time of the ETM+

    (Littleport) data acquisitions was used as ground data. For

    each class, 100 randomly selected pixels were used to train the

    classifiers. The accuracy of the classifications was evaluated

    using an independent testing set that comprised 1,400 randomly

    selected pixels.

    For each classification undertaken with the ATM and ETM+

    (Littleport) data sets, accuracy was assessed with the aid of a

confusion matrix and expressed as the percentage of the testing cases correctly allocated. As the potential for accurate classifi-

    cation by the SVM from small training sets has been demon-

    strated, a desire was to determine if the RVM and SMLR ap-

    proaches were at least as accurate as the SVM classification,

    which may be assessed by a test of non-inferiority. For both the

    RVM and SMLR methods, this was evaluated by using the con-

    fidence interval of the difference in accuracy obtained from that

    observed with the SVM in a test of non-inferiority, which fo-

    cuses on the lower limit of the defined confidence interval [48],

    [49]. In this evaluation it was assumed that the zone of indiffer-

    ence was 2.00%; this value was selected arbitrarily but ensures

that small differences in accuracy are treated as inconsequential. For all

    experiments, a personal computer with a Pentium IV processor

    and 3 GB of RAM was used.
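The non-inferiority check described above can be sketched as follows: form the 95% confidence interval on the difference in accuracy and compare its lower limit with the -2.00% margin. The sketch uses the normal approximation for a difference between two independent proportions, a simplification of the paired comparison made in the paper, and the counts are hypothetical rather than taken from the paper's tables.

```python
# Minimal sketch of a non-inferiority test on classification accuracy,
# assuming two independent test-set proportions (a simplification).
import math

def noninferior(k_new, k_svm, n, margin=0.02, z=1.96):
    p_new, p_svm = k_new / n, k_svm / n
    diff = p_new - p_svm
    # standard error of a difference between two independent proportions
    se = math.sqrt(p_new * (1 - p_new) / n + p_svm * (1 - p_svm) / n)
    lower, upper = diff - z * se, diff + z * se
    return diff, (lower, upper), lower > -margin

# e.g. 1,313 vs 1,299 correct out of 1,400 testing cases (hypothetical)
diff, ci, ok = noninferior(1313, 1299, 1400)
print(f"difference {diff:+.4f}, 95% CI ({ci[0]:+.4f}, {ci[1]:+.4f}), "
      f"non-inferior: {ok}")
```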

    SVM were initially designed for binary classification prob-

    lems. A range of methods have been suggested for multi-class

classification [20], [50], [51]. Here, the one-against-rest approach was used with the ATM data set [17], [24] and the one-against-one approach with the simulated and ETM+ data sets [51]. Throughout, a radial basis function kernel with a kernel-specific parameter ($\gamma$) was used with the SVM, RVM and SMLR algorithms. The software packages LIBSVM and BSVM [50], [52] were used to implement the SVM, whereas the SMLR software [33] was used to implement the sparse multinomial logistic regression classifier. A multiclass implementation of the original RVM code [32], [53] was used to implement the RVM classifier. Similar to the parameter required


TABLE II: USER-DEFINED PARAMETERS WITH ALL FOUR DATA SETS USED IN THIS STUDY

TABLE III: MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES FOR A CLASS BASED ON ANALYSES OF THE SIMULATED DATA

in the design of the SVM classifier, the values of the user-defined parameters of the RVM and SMLR algorithms influence the accuracy of the classifications they produce. In order to find a suitable value for each of the user-defined parameters with the different classification algorithms, cross validation and trial-and-error methods were used. Specifically, five-fold cross validation was used with the SVM, while trial-and-error procedures were used with the RVM and SMLR to find suitable values for the user-defined parameters for the classifications of both the simulated and real remote sensing data sets. For classification by the RVM, the trials involved varying one parameter from 0.1 to 2.0 with a step size of 0.1 and the other parameter over a separate range of values. For classification by the SMLR, the two parameter values were varied from 0.1 to 15.0 and from 0.1 to 2.5, each with a step size of 0.1. For the analyses of all four data sets, the optimal values of the user-defined parameters are provided in Table II.
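A five-fold cross-validated parameter search of the kind used for the SVM can be expressed compactly with scikit-learn. The grid below (gamma from 0.1 to 2.0 in steps of 0.1, plus a handful of C values) and the toy data are illustrative assumptions; the RVM and SMLR parameters were tuned by analogous trial-and-error sweeps rather than this routine.

```python
# Minimal sketch of a five-fold cross-validated grid search over the
# RBF-SVM parameters; grid values and data are illustrative.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = np.vstack([rng.normal((0.2, 0.3), 0.05, (100, 2)),
               rng.normal((0.4, 0.5), 0.05, (100, 2))])
y = np.repeat([0, 1], 100)

grid = {"gamma": np.arange(0.1, 2.01, 0.1),   # 0.1 to 2.0, step 0.1
        "C": [1, 10, 100, 1000]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
print("selected parameters:", search.best_params_)
```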

    The position of the useful training cases in feature space was

    evaluated visually and quantitatively characterised with mea-

    sures based on their Mahalanobis distance to the centroid of

each class. The Mahalanobis distance between a case and a class

    centroid is inversely related to the typicality of the case to the

    class [54], [55]. Thus, a low distance indicates that the case lies

    close to the class centroid and so is typical of the class while

    a large distance indicates that the case is atypical of the class.

    As well as providing a simple guide to the typicality of a case

    to a class, the set of Mahalanobis distances computed over all

    classes for a case may be used to provide a simple descriptorof the location of the case relative to the class centroids and,

    more critically, the decision boundaries. For example, a decision

    boundary may be expected to lie between two class centroids

    and so at a similar Mahalanobis distance from each centroid.

    Thus, if the difference between the two smallest Mahalanobis

    distances computed for a case was small this would indicate

    that the case lies close to the border region between two classes

    and near the location of a decision boundary [56]. Conversely, if

    the difference between the two smallest Mahalanobis distances

    was large the case lies away from the border region between

    two classes and the decision boundary that separates them [56].

Here, the Mahalanobis distances and the difference between the two smallest Mahalanobis distances for each case were com-

    puted to indicate the typicality of each case to a class and its

    position relative to inter-class transition regions respectively.
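The two measures just described are straightforward to compute. The sketch below returns, for a case, its Mahalanobis distance to the nearest class centroid (inversely related to typicality) and the difference between its two smallest distances (small near a decision boundary, large away from it); the data are illustrative.

```python
# Minimal sketch of the typicality and border-proximity measures.
import numpy as np

def mahalanobis_profile(x, centroids, covariances):
    d = []
    for mu, cov in zip(centroids, covariances):
        diff = x - mu
        d.append(float(np.sqrt(diff @ np.linalg.inv(cov) @ diff)))
    d = np.sort(np.array(d))
    return d[0], d[1] - d[0]   # typicality distance, border-proximity gap

rng = np.random.default_rng(5)
classes = [rng.normal(m, 0.05, (100, 2)) for m in [(0.2, 0.3), (0.4, 0.5)]]
mus = [c.mean(axis=0) for c in classes]
covs = [np.cov(c.T) for c in classes]

for x in (np.array([0.3, 0.4]),    # roughly between the two centroids
          np.array([0.1, 0.2])):   # extreme, away from the border
    nearest, gap = mahalanobis_profile(x, mus, covs)
    print(f"case {x}: nearest-class distance {nearest:.2f}, gap {gap:.2f}")
```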

    IV. RESULTS

    The classifications of the simulated data set allowed the gen-

    eral characteristics of the useful training cases for each classi-

    fier to be determined. Two major attributes of the useful training

    cases were apparent. First, all three classifiers used only a small

    proportion of the available training data set in classifying the

    data (Table III). The total number of useful training cases ranged

    from 6 for the RVM to 76 for the SMLR, representing 2.00%

    and 25.00% of the total sample size respectively. Second, the


useful training cases were distributed in feature space in a relatively systematic fashion (Fig. 1). The location of the useful training cases, however, varied between the three classifiers.

Fig. 1. Location of the useful training cases for classifications of the simulated data by (a) SVM, (b) RVM and (c) SMLR.

    The trends were visually most apparent for class 2. For this

    class, the support vectors were a set of extreme cases that lay at

    the edge of the class distribution and between the distributions

    of the other classes (Fig. 1(a)). As expected, the support vectors,

therefore, lay in a region close to where a classification decision boundary would be fitted. With the RVM, the relevance vectors were also extreme cases but located away from the boundary region (Fig. 1(b)). Note that for all three classes the support vec-

    tors have a relatively large Mahalanobis distance to the actual


TABLE IV: MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES FOR A CLASS BASED ON ANALYSES OF THE ETM+ (BOSTON) DATA SET

Fig. 2. Location of the useful training cases for classifications of the ETM+ (Boston) data by (a) SVM, (b) RVM and (c) SMLR.

    class of membership and a small difference between the two

    smallest Mahalanobis distances, which indicate that they are

    extreme cases located in the region of a classification decision

    boundary (Table III). The atypical nature of the support vectors

    is perhaps most apparent if the Mahalanobis distance to the ac-

    tual class of membership is expressed as a typicality probability

    [54], with the mean typicality of the support vectors for each

    class being 0.05. The relevance vectors also show a relatively

large Mahalanobis distance to the actual class of membership but a large difference between the two smallest Mahalanobis

    distances, indicating that they are extreme but anti-boundary

    in nature (Table III). Again, for each class, the mean Maha-

    lanobis distance to the actual class for the selected relevance

    vectors equated to typicality probabilities of 0.05 or less. With

    the SMLR, the location of the retained kernel basis functions

    varied from relatively typical (class 2) to atypical and near/anti

decision boundary (classes 1 and 3) (Fig. 1(c)). While the Maha-

lanobis distances to the actual class were generally smaller than

    for the SVM and RVM, the useful training cases for classes 1

and 3 were still highly atypical, with small mean typicality probabilities. These trends are also apparent in the Mahalanobis distance based metrics that characterise the location of

    the useful training cases (Table III).

Given the uncertainty in the location of the useful training cases provided by the SMLR classifier with the simulated data set, further analyses with the ETM+ (Boston) data set were undertaken. The results summarised in Table IV and Fig. 2 suggest similar trends to those observed with the simulated

data set with regard to the location of support vectors and

    relevance vectors. The total number of useful training cases

    with this data set varied from 7 for the RVM to 19 for the

    SVM representing about 2.00% and 6.00% of the total training

sample respectively. The results indicate that the useful training cases with SMLR are located away from the class boundary

    for all three classes (Fig. 2). A comparison of Mahalanobis

distances and the difference between the two smallest Mahalanobis distances (Table IV) suggests that they are extreme cases lying away from the class boundary.

    Similar trends were observed with the classifications of the

    ATM data set. Of the 600 training cases available, the SVM,

    RVM and SMLR used only 202, 44 and 101 respectively, repre-

senting between 7.33% and 33.66% of the total set. In the case of the analyses of the ETM+ (Littleport) data set, the SVM, RVM and SMLR used 314, 79 and 172 training cases, representing between 11.29% and 44.90% of the total set of 700 training cases.

    The difference between the two smallest Mahalanobis distances

    was generally small for the support vectors but generally large

    for the relevance vectors and the retained kernel basis functions

    for both the ATM (Table V) and ETM+ (Table VI) data sets. The

    results again suggest that useful training cases for the SVM are

    atypical and lie in the border region between classes while for

    the RVM and SMLR the useful training cases are atypical but

    located away from the border region.

    Together, the two attributes of the useful training cases, their

    small number and systematic location in feature space, indicate

a potential to use small training sets for classification by each of the three classifiers. Critically, the systematic nature of their

    location in feature space suggests a potential to predict their lo-

    cation on the ground in advance. That is, the systematic location

    of the useful training cases in feature space can be re-projected

    into geographical space to allow intelligent training [1], [13].

    This has been achieved with SVM, for example by deliberately

    focusing training data collection activities on extreme cases that

    are expected to have most spectral similarity to other classes

    [1], [13]. It should also be possible, however, to design intel-

    ligent training data acquisition programmes in ways to focus

    on potentially useful training cases for the RVM and SMLR.

For example, like the SVM, attention might focus on spectrally extreme cases, but not those in the border region, when using

    the RVM and SMLR classifiers. The operation of an intelli-


TABLE V: MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES FOR A CLASS BASED ON ANALYSES OF THE ATM DATA

    gent training scheme requires moving between feature and ge-

    ographical space. For example, the approach used in [13] was

    based on using fundamental knowledge of the variables that in-

    fluence the spectral response to aid the selection of training sites

    on the ground that would be expected to lie at extreme posi-

    tions in feature space. For example, with a crop, extreme cases

    might be expected to occur in regions of differing growth stage

    and cover as well as with differing soil backgrounds. Moreover,

    different extremities can be defined. For example, sites of ex-

tremely high and low plant cover would be expected to lie in dif-

    ferent locations in feature space. Similarly, crops grown on dif-

ferent soil types or perhaps growing on wet and dry soils would be expected to lie in different, potentially predictable, locations

    of feature space [13], [57]. The precise approach will depend

    on the specific data sets used but provided the useful training

    cases have a potentially predictable nature an intelligent training

    scheme should be feasible. Finally, it is apparent that the results

    also highlight that training data collection programmes should

    be designed in a classifier-specific manner. Note, for example

    with both the ATM and ETM+ (Littleport) data sets, that few of

    the training cases selected as useful by one classifier were also

    selected as useful by another classifier (Table VII).

    The results above indicate that all three classifiers use mostly

    different training cases and so point to a desire for classifier-spe-

    cific training data acquisition programmes. The importance of

    this can be seen in the results of classifications of the ATM data

    derived using a classifier trained upon data useful for another

    classifier. For example, the useful training cases for classifica-

    tion by a SVM (support vectors) were used to train the RVM and

    SMLR classifiers. The resulting classifications had an accuracy

    of 91.00% and 85.00% for the RVM and SMLR respectively;

    both less than the accuracy of 92.50% derived when the sup-

    port vectors identified from the entire training set were used.

    Similarly, the SMLR and SVM yielded classifications with an

    accuracy of 42.50% and 70.31% when trained with the useful

    training cases defined for the RVM; both substantially less than

    the 93.75% obtained when relevance vectors identified from the

entire training set were used. Lastly, when trained with the useful training cases for a SMLR classification, the accuracy of the

    SVM and RVM classifications were 87.18% and 78.00% re-

    spectively; again both substantially less than the 92.81% ob-

    tained when the retained kernel basis functions identified from

    the entire training set were used. These results indicate a decline

    in classification accuracy, by all three classification algorithms,

    when trained with useful training cases defined for another clas-

    sifier. Thus, a training set defined for one classifier and able to

    yield an accurate classification may yield a low accuracy if used

    with a different classifier. Taken together, these results highlight

    the impact of the training data on the accuracy of a classification

    and the desire for classifier specific training data acquisition.

    The potential to characterise useful training sites and so to

    design an intelligent training data collection programme offers


TABLE VI: MEAN MAHALANOBIS DISTANCE MEASURES COMPUTED OVER ALL USEFUL TRAINING CASES FOR A CLASS BASED ON ANALYSES OF THE ETM+ (LITTLEPORT) DATA

TABLE VII: NUMBER OF COMMON USEFUL TRAINING CASES

    attractive benefits relative to alternative approaches for efficient

    training such as active learning. The ideal training set is ac-

    quired at or close to the time of image acquisition. Intelligent

    training allows the location of potentially useful training sites to

    be predicted in advance of an analysis, as in [13], and so allow

    close temporal coincidence of ground and image data acquisi-

    tions. As noted above, methods such as active learning can only

    be applied post-image acquisition. In some situations the time

gap between the acquisition of ground and image data may be a source of error and uncertainty (e.g., the image may have been acquired just before crop harvesting and so show the presence of a mature crop, but post-image acquisition ground data surveys might find bare fields, etc.). Additionally, as the active learning

methods highlight useful training sites iteratively, their use could require a series of ground data collection programmes. Such a situation does not allow efficient design of ground data collection, with multiple, possibly overlapping, field programmes required. The magnitude of these concerns will vary as a function of

    factors such as class temporal variability and the source of the

    ground data. It should also be noted that the various approaches

    to efficient training may be complementary and so could per-

    haps be usefully combined. For example, intelligently selected

training sites could perhaps act as seeds or starting points for the

    selection of other potentially useful but unlabeled pixels.

    A key result is that smaller training sets than required for

    the SVM may be used by the RVM and SMLR with ATM and


TABLE VIII: CONFUSION MATRICES FOR THE CLASSIFICATIONS OF THE ATM DATA BY (A) SVM, (B) RVM AND (C) SMLR. THE OVERALL ACCURACY OF THE CLASSIFICATIONS WAS 93.75% FOR RVM, 92.50% FOR SVM AND 92.81% FOR SMLR. PER-CLASS ACCURACY (%) SHOWN FROM USER'S AND PRODUCER'S PERSPECTIVES

TABLE IX: NON-INFERIORITY TEST RESULTS RELATIVE TO SVM BASED ON THE 95% CONFIDENCE INTERVAL ON THE ESTIMATED DIFFERENCE IN ACCURACY. NOTE THAT THE DIFFERENCES IN ACCURACY WERE ALL VERY SMALL AND INSIDE THE DEFINED ZONE OF INDIFFERENCE

    ETM+ (Littleport) data sets, making them attractive alternatives

to the established SVM for image classification. The value of this

    attribute, however, is a function of the accuracy and compu-

    tational cost of the classifications. In terms of classification

    accuracy, all three classifiers produced highly accurate classi-

    fications of the ATM data set (Table VIII). Critically, the lower

    limit of the derived 95% confidence interval for the difference in

    accuracy from the SVM classification was above 0 for both the

RVM and SMLR classifications and the entire interval lay within the zone of indifference, indicating that the RVM and SMLR classifications were statistically non-inferior to that from the SVM at the 97.5% level of confidence with the ATM data set. Indeed, their

    estimated accuracies of 93.75% and 92.81% respectively were

    marginally higher than that from the SVM (92.50%; Table IX).

TABLE X: VARIATION OF CLASSIFICATION ACCURACY AND NUMBER OF RELEVANCE VECTORS WITH THE KERNEL PARAMETER, USING THE ATM DATA SET

    It is evident that the RVM produced the highest accuracy yet

    required the smallest training set, although it should be noted

    that the results of the trial analyses highlighted variation in the

    number of relevance vectors needed and classification accuracy

with the value of the kernel parameter; at large values the entire set of available training cases was required, highlighting the importance of careful parameter value selection (Table X). In

    the case of analyses with the ETM+ (Littleport) data set, the

    classifications were of similar accuracy with the classification

    obtained by RVM (80.21%) slightly lower than that from

the SVM (81.36%) and SMLR (81.71%), though critically

    the RVM and SMLR classifications were not inferior to the


TABLE XI: COMPUTATIONAL COST AND THE NUMBER OF USEFUL TRAINING CASES USED BY THE CLASSIFIERS

    SVM with the confidence intervals lying within the zone of

    indifference. Additionally, the SVM required approximately

    4.0 and 1.8 times the useful training cases used by RVM and

SMLR respectively. In terms of the training and testing time used by the SVM, RVM and SMLR, precise values for the computational cost cannot be compared exactly because all three algorithms

    were implemented using different programming languages.

    Nevertheless, a comparison of computational cost suggests that

    the RVM and, to a lesser degree, the SMLR were computation-

    ally more demanding than the SVM (Table XI), which may be

    a concern for analyses of large data sets.

    V. CONCLUSIONS

    The potential of SVM for accurate classification from small

    training sets has been established in previous research. Other

classifiers such as the RVM and SMLR, however, offer additional features, such as information on per-case classification uncer-

    tainty that may sometimes be useful. Here, it has been shown

    that the RVM and SMLR are able to classify data to similar

    accuracies to the SVM. Moreover, both RVM and SMLR re-

    quire fewer training cases than a SVM when used with remotely

    sensed data. Additionally, the useful training cases for SVM and

    RVM classifiers have different but well-defined characteristics

    which may make them easily predictable. The training cases

for the SMLR were also mostly well characterised, being of an

    extreme nature and lying away from class boundaries. Conse-

    quently, it may be possible to predict potentially useful training

    sites, especially for the SVM and RVM.

    ACKNOWLEDGMENT

    Dr. Pal wishes to thank the Association of Commonwealth

    Universities for this fellowship. The authors thank the School

    of Geography, University of Nottingham, for use of computing

facilities. The ATM data were acquired as part of the European AgriSAR campaign. For the SVM, the LIBSVM and BSVM packages

    were made available by C.-J. Lin of National Taiwan Univer-

sity, the SMLR package was provided by A. Hartemink, Duke University, and the multiclass RVM code was provided by Y.-F. Mao, Electronics and Information Department, SCUT, Guangzhou, China.

    The authors are also grateful to the editors and the referees for

    their helpful comments on the original manuscript.

    REFERENCES

[1] G. M. Foody and A. Mathur, "Toward intelligent training of supervised image classifications: Directing training data acquisition for SVM classification," Remote Sens. Environ., vol. 93, no. 1–2, pp. 107–117, Oct. 2004.

[2] M. Chi and L. Bruzzone, "A semilabeled-sample-driven bagging technique for ill-posed classification problems," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 1, pp. 69–73, Jan. 2005.

[3] P. Mantero, G. Moser, and S. B. Serpico, "Partially supervised classification of remote sensing images through SVM-based probability density estimation," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 559–570, Mar. 2005.

[4] G. M. Foody, "Assessing the accuracy of land cover change with imperfect ground reference data," Remote Sens. Environ., vol. 114, no. 10, pp. 2271–2285, Oct. 2010.

[5] L. Bruzzone, M. Chi, and M. Marconcini, "A novel transductive SVM for the semisupervised classification of remote-sensing images," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3363–3373, Nov. 2006.

[6] M. Marconcini, G. Camps-Valls, and L. Bruzzone, "A composite semisupervised SVM for classification of hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 234–238, Apr. 2009.

[7] L. Bruzzone and C. Persello, "A novel context-sensitive semisupervised SVM classifier robust to mislabeled training samples," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2142–2154, Jul. 2009.

[8] S. Rajan, J. Ghosh, and M. M. Crawford, "An active learning approach to hyperspectral data classification," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 4, pp. 1231–1242, Apr. 2008.

[9] D. Tuia, F. Ratle, F. Pacifici, M. F. Kanevski, and W. J. Emery, "Active learning methods for remote sensing image classification," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2218–2232, Jul. 2009.

[10] P. Zhong, P. Zhang, and R. Wang, "Dynamic learning of SMLR for feature selection and classification of hyperspectral data," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 2, pp. 280–284, Apr. 2008.

[11] M. Pal and G. M. Foody, "Feature selection for classification of hyperspectral data by SVM," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2297–2307, May 2010.

[12] G. M. Foody and A. Mathur, "The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM," Remote Sens. Environ., vol. 103, no. 2, pp. 179–189, Jul. 2006.

[13] A. Mathur and G. M. Foody, "Crop classification by support vector machine with intelligently selected training data for an operational application," Int. J. Remote Sens., vol. 29, no. 8, pp. 2227–2240, Apr. 2008.

[14] C. Sanchez-Hernandez, D. S. Boyd, and G. M. Foody, "One-class classification for mapping a specific land cover class: SVDD classification of fenland," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 1061–1073, Apr. 2007.

[15] W. Li, Q. Guo, and C. Elkan, "A positive and unlabeled learning algorithm for one-class classification of remote-sensing data," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 2, pp. 717–725, Feb. 2011.

[16] J. A. Gualtieri and R. F. Cromp, "Support vector machines for hyperspectral remote sensing classification," in Proc. 27th AIPR Workshop: Advances in Computer Assisted Recognition, Washington, DC, Oct. 27, 1998, pp. 221–232.

[17] C. Huang, L. S. Davis, and J. R. G. Townshend, "An assessment of support vector machines for land cover classification," Int. J. Remote Sens., vol. 23, no. 4, pp. 725–749, Feb. 2002.

[18] G. Zhu and D. G. Blumberg, "Classification using ASTER data and SVM algorithms; The case study of Beer Sheva, Israel," Remote Sens. Environ., vol. 80, no. 5, pp. 233–240, May 2002.

[19] M. Pal and P. M. Mather, "Assessment of the effectiveness of support vector machines for hyperspectral data," Future Gen. Comput. Syst., vol. 20, no. 7, pp. 1215–1225, Oct. 2004.

[20] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.

[21] D. Lu and Q. Weng, "A survey of image classification methods and techniques for improving classification performance," Int. J. Remote Sens., vol. 28, no. 5, pp. 823–870, Mar. 2007.


[22] B. Waske and J. A. Benediktsson, "Fusion of support vector machines for classification of multisensor data," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, pp. 3858–3866, Dec. 2007.

[23] M. Pal and P. M. Mather, "Some issues in the classification of DAIS hyperspectral data," Int. J. Remote Sens., vol. 27, no. 14, pp. 2895–2916, Jul. 2006.

[24] G. M. Foody and A. Mathur, "A relative evaluation of multiclass image classification by support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1335–1343, Jun. 2004.

[25] M. Pal, "Kernel methods in remote sensing: A review," ISH J. Hydraul. Eng. (Special Issue), vol. 15, no. 1, pp. 194–215, May 2009.

[26] G. Mountrakis, J. Im, and C. Ogole, "Support vector machines in remote sensing: A review," ISPRS J. Photogramm. Remote Sens., vol. 66, no. 3, pp. 247–259, May 2011.

[27] A. Mathur and G. M. Foody, "Multiclass and binary SVM classification: Implications for training and classification users," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 2, pp. 241–245, Feb. 2008.

[28] J. Platt, "Probabilistic outputs for support vector machines and comparison to regularized likelihood methods," in Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, Eds. Cambridge, MA: MIT Press, 2000, pp. 61–74.

[29] B. Demir and S. Ertürk, "Hyperspectral image classification using relevance vector machines," IEEE Geosci. Remote Sens. Lett., vol. 4, no. 4, pp. 586–590, Apr. 2007.

[30] G. M. Foody, "RVM-based multi-class classification of remotely sensed data," Int. J. Remote Sens., vol. 29, no. 6, pp. 1817–1823, Mar. 2008.

[31] F. A. Mianji and Y. Zhang, "Robust hyperspectral classification using relevance vector machine," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2100–2112, Jun. 2011.

[32] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," J. Mach. Learn. Res., vol. 1, pp. 211–244, Jun. 2001.

[33] B. Krishnapuram, L. Carin, M. A. T. Figueiredo, and A. J. Hartemink, "Sparse multinomial logistic regression: Fast algorithms and generalization bounds," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 957–968, Jun. 2005.

[34] G. Camps-Valls, L. Gómez-Chova, J. Calpe-Maravilla, J. D. Martín-Guerrero, E. Soria-Olivas, L. Alonso-Chordá, and J. Moreno, "Robust support vector method for hyperspectral data classification and knowledge discovery," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 7, pp. 1530–1542, Jul. 2004.

[35] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.

[36] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Muñoz-Marí, "A survey of active learning algorithms for supervised remote sensing image classification," IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 606–617, Jun. 2011.

[37] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

[38] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, Mar. 1995.

[39] B. E. Boser, I. M. Guyon, and V. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. 5th Annual Workshop on Computational Learning Theory (COLT '92), New York, Jul. 27–29, 1992, pp. 144–152.

[40] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press, 2000.

[41] G. Camps-Valls and L. Bruzzone, Kernel Methods for Remote Sensing Data Analysis. Chichester, UK: Wiley, 2009.

[42] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer-Verlag, 2001.

[43] J. Borges, J. Bioucas-Dias, and A. Marçal, "Fast sparse multinomial regression applied to hyperspectral data," presented at the Int. Conf. Image Analysis and Recognition (ICIAR 2006), Póvoa de Varzim, Portugal, 2006.

[44] J. Borges, J. Bioucas-Dias, and A. Marçal, "Bayesian hyperspectral image segmentation with discriminative class learning," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2151–2164, Jun. 2011.

[45] J. Li, J. M. Bioucas-Dias, and A. Plaza, "Hyperspectral image segmentation using a new Bayesian approach with active learning," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3947–3960, Oct. 2011.

[46] G. M. Foody and M. K. Arora, "An evaluation of some factors affecting the accuracy of classification by an artificial neural network," Int. J. Remote Sens., vol. 18, no. 4, pp. 799–810, Mar. 1997.

[47] M. Pal and P. M. Mather, "An assessment of the effectiveness of decision tree methods for land cover classification," Remote Sens. Environ., vol. 86, no. 4, pp. 554–565, Oct. 2003.

[48] J. L. Fleiss, B. Levin, and M. C. Paik, Statistical Methods for Rates and Proportions, 3rd ed. New York: Wiley-Interscience, 2003.

[49] G. M. Foody, "Classification accuracy comparison: Hypothesis tests and the use of confidence intervals in evaluations of difference, equivalence and non-inferiority," Remote Sens. Environ., vol. 113, no. 8, pp. 1658–1663, Aug. 2009.

[50] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multi-class support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Feb. 2002.

[51] M. Pal, "Multiclass approaches for support vector machine based land cover classification," in Proc. 8th Annual Int. Conf. Map India, 2005.

[52] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1–27, Apr. 2011.

[53] X.-M. Xu, Y.-F. Mao, J.-N. Xiong, and F.-L. Zhou, "Classification performance comparison between RVM and SVM," in Proc. IEEE Int. Workshop on Anti-Counterfeiting, Security, Identification, Xiamen, China, Apr. 16–18, 2007, pp. 208–211.

[54] G. M. Foody, N. A. Campbell, N. M. Trodd, and T. F. Wood, "Derivation and applications of probabilistic measures of class membership from the maximum-likelihood classification," Photogramm. Eng. Remote Sens., vol. 58, no. 9, pp. 1335–1341, Sep. 1992.

[55] N. A. Campbell, "Some aspects of allocation and discrimination," in Multivariate Statistical Methods in Physical Anthropology, G. N. Van Vark and W. W. Howells, Eds. Dordrecht, The Netherlands: Reidel, 1984, pp. 177–192.

[56] G. M. Foody, "The significance of border training patterns in classification by a feed-forward neural network using back propagation learning," Int. J. Remote Sens., vol. 20, no. 18, pp. 3549–3562, Dec. 1999.

[57] G. M. Foody, "On training and evaluation of SVM for remote sensing applications," in Kernel Methods for Remote Sensing Data Analysis, G. Camps-Valls and L. Bruzzone, Eds. Chichester, UK: Wiley, 2009, pp. 85–109.

Mahesh Pal received the Ph.D. degree from the University of Nottingham, U.K., in 2002.

He is presently an Associate Professor in the Department of Civil Engineering, NIT Kurukshetra, Haryana, India. His major research areas are land cover classification, feature selection and the application of artificial intelligence techniques in various civil engineering applications.

Dr. Pal is on the editorial board of Remote Sensing Letters. Part of the research work reported in this paper was carried out when Dr. Pal was on a Commonwealth fellowship at the University of Nottingham during the period October 2008–March 2009.

Giles M. Foody (M'01) received the B.Sc. and Ph.D. degrees in geography from the University of Sheffield, Sheffield, U.K., in 1983 and 1986, respectively.

He is currently Professor of Geographical Information Science at the University of Nottingham, U.K. His main research interests focus on the interface between remote sensing, ecology and informatics.

Dr. Foody is currently Editor-in-Chief of the International Journal of Remote Sensing and of the recently launched journal Remote Sensing Letters, holds editorial roles with Landscape Ecology and Ecological Informatics, and serves on the editorial board of several other journals. He was awarded the Remote Sensing and Photogrammetry Society's Award, its highest award, for services to remote sensing in 2009.