INTEGRATION OF SUPPORT VECTOR MACHINES AND MARKOV RANDOM FIELDS FOR REMOTE SENSING IMAGE CLASSIFICATION
Paolo Irrera, Gabriele Moser, Sebastiano B. Serpico
University of Genoa, Dept. of Biophysical and Electronic Eng. (DIBE), Via Opera Pia 11a, I-16145 Genoa, Italy
OUTLINE
• Introduction
– Remote sensing image classification
– Objective of the paper
– Support vector machines
– Markov random fields
• Methodology
– Proposed Markovian method
– Architecture of the method
– Parameter optimization
• Experimental results
– Confusion matrices
– Classification maps
• Conclusions
REMOTE SENSING IMAGE CLASSIFICATION
• Techniques that aim at labeling each image pixel as belonging to a thematic class.
• Examples of applications:
– land-use or land-cover mapping;
– urban-area mapping;
– forest inventory;
– snow-cover mapping.
• Many approaches have been proposed for supervised classification:
– parametric and nonparametric Bayesian;
– neural;
– fuzzy;
– support vector machines (SVMs);
– …
OBJECTIVE OF THE PAPER
• Key idea of SVMs:
– identifying an optimal linear discriminant hypersurface in a suitable nonlinearly transformed feature space.
• Good analytical properties (generalization capability) and excellent performance in many applications (e.g., object recognition, hyperspectral image classification).
• Limitation:
– SVMs are formulated for i.i.d. (independent and identically distributed) samples;
– in image classification, this implies an intrinsically noncontextual approach.
• Objective of the paper:
– integration of the SVM and Markov random field (MRF) approaches to classification, aiming at a rigorous contextual generalization of SVMs.
SVM CLASSIFIER
• It exploits the information associated with the samples located at the interface between distinct classes (support vectors).
• Training is expressed as a quadratic programming problem.
• The nonlinear transformation of the feature space is implicitly defined by a kernel function K(x,y), which allows a nonlinear problem to be formalized as a linear problem without a significant increase in computational complexity.
• Here, we use a Gaussian kernel.
[Equation: quadratic programming problem]
[Equation: discriminant function, nonlinear case, two classes]
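The equations on this slide did not survive extraction. For orientation, the textbook soft-margin SVM formulation is given below; it is assumed, not confirmed, to match what the slide showed. The symbols α_i (Lagrange multipliers), y_i ∈ {−1, +1} (training labels), b (bias), S (set of support vectors) and σ (kernel width) are notation introduced here, while C and K(x,y) are the quantities named in the text.

```latex
% Training: dual quadratic programming problem (textbook form)
\max_{\alpha}\; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0

% Discriminant function, nonlinear case, two classes
f(x) = \sum_{i \in S} \alpha_i \, y_i \, K(x_i, x) + b,
\qquad \hat{y}(x) = \operatorname{sign} f(x)

% Gaussian kernel
K(x, y) = \exp\!\left( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \right)
```

Only the samples with α_i > 0 (the support vectors) contribute to f(x), which is why the classifier depends on the samples at the interface between classes.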
MARKOV RANDOM FIELDS
• MRFs constitute a general family of stochastic models for the contextual information associated with an image in Bayesian image-analysis problems.
• They allow global stochastic models to be formalized according to the local statistical relationships among neighboring pixels (Hammersley-Clifford theorem).
• When modeling the random field of the thematic class labels as an MRF, the “maximum a posteriori” criterion can be formalized as the minimization of a suitable energy function.
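The energy function itself is not recoverable from the extracted slide. A common choice in the MRF classification literature, shown here only as an illustration and not necessarily the form used by the authors, combines a pixelwise data term with a Potts-type spatial term. The symbols x_i (feature vector of pixel i), y_i (its class label), j ∼ i (j is a neighbor of i), δ (Kronecker delta) and β (spatial weight) are assumptions of this sketch.

```latex
% Global posterior energy: MAP labeling = minimum-energy labeling
U(Y \mid X) = \sum_{i} \Big[ -\ln p(x_i \mid y_i)
  \;-\; \beta \sum_{j \sim i} \delta(y_i, y_j) \Big]

% Local energy used when updating a single pixel with its neighbors fixed
U_i(y_i) = -\ln p(x_i \mid y_i) \;-\; \beta \sum_{j \sim i} \delta(y_i, y_j)
```

The Hammersley-Clifford theorem is what justifies writing the global model as a sum of such local terms.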
INTEGRATING MRF AND SVM
• Here, we prove that, under proper assumptions, the Markovian minimum-energy decision rule can be reformulated as the application of an SVM discriminant function in a transformed feature space associated with a suitable “contextual kernel”.
• Contextual information is formalized through an additional feature (“stacked vector”).
• A modified kernel function fuses contextual and noncontextual information (as a linear combination of two related contributions).
• In this framework, a novel classifier is introduced by using the “iterated conditional modes” (ICM) approach.
[Equations: discriminant function; contextual kernel; kernel-based expression of the discriminant function]
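The kernel equations are missing from the extracted slide. The sketch below illustrates one way a "contextual kernel" of the kind described above could be built: a Gaussian kernel on the pixel features plus λ times a linear kernel on a scalar contextual feature. The exact form, the 4-connected neighborhood, the ±1 label coding and all function names are assumptions made for illustration, not the authors' definitions.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) on feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def contextual_feature(labels, row, col):
    """Sum of the +1/-1 labels of the 4-connected neighbors of (row, col).

    Assumed definition of the extra feature that is stacked onto the
    pixel's feature vector; neighbors outside the image are skipped.
    """
    n_rows, n_cols = len(labels), len(labels[0])
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < n_rows and 0 <= c < n_cols:
            total += labels[r][c]
    return total


def contextual_kernel(x_i, eps_i, x_j, eps_j, sigma, lam):
    """Assumed contextual kernel: noncontextual Gaussian term plus
    lam times a linear kernel on the contextual features eps_i, eps_j."""
    return gaussian_kernel(x_i, x_j, sigma) + lam * eps_i * eps_j
```

Because a sum of valid kernels with a nonnegative weight is itself a valid kernel, a combination of this kind can be plugged into a standard SVM solver; with λ = 0 it reduces to the noncontextual Gaussian kernel.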
PROPOSED CLASSIFIER
I = n-channel image to be classified.
T = training map.
m = classification map, updated at each iteration.
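The architecture diagram is not recoverable, but the iterative update of the map m can be illustrated with classical iterated conditional modes. The sketch below is a generic ICM loop, not the authors' classifier: a per-pixel cost table `unary` stands in for the SVM discriminant, and `beta`, the 4-connected neighborhood and the raster-scan update order are assumptions.

```python
def icm(unary, beta, max_iters=10):
    """Classical ICM on a 2-D grid.

    unary[r][c][k] is the (hypothetical) noncontextual cost of assigning
    label k to pixel (r, c); beta weights agreement with the 4 neighbors.
    Returns the label map after convergence or max_iters sweeps.
    """
    n_rows, n_cols = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    # Initial map: noncontextual decision (minimum unary cost per pixel).
    labels = [[min(range(n_labels), key=lambda k: unary[r][c][k])
               for c in range(n_cols)] for r in range(n_rows)]
    for _ in range(max_iters):
        changed = False
        for r in range(n_rows):
            for c in range(n_cols):
                best_label, best_energy = labels[r][c], None
                for k in range(n_labels):
                    # Count neighbors currently carrying label k.
                    agree = 0
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n_rows and 0 <= cc < n_cols
                                and labels[rr][cc] == k):
                            agree += 1
                    energy = unary[r][c][k] - beta * agree
                    if best_energy is None or energy < best_energy:
                        best_label, best_energy = k, energy
                if best_label != labels[r][c]:
                    labels[r][c] = best_label
                    changed = True
        if not changed:
            break  # no pixel changed label: local minimum reached
    return labels
```

With `beta = 0` the loop returns the noncontextual map unchanged; increasing `beta` lets isolated labels be overruled by their neighborhood, which is the smoothing effect contextual classification aims for.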
PARAMETER OPTIMIZATION
• The method has the following parameters:
– SVM regularization parameter C;
– variance of the Gaussian kernel;
– weight parameter λ of the spatial kernel contribution.
• Algorithms used for parameter estimation: Powell, Ho-Kashyap.
• Powell’s algorithm is a local unconstrained minimization method for multidimensional spaces. It does not involve derivatives and is applied here to the cross-validation error (a nondifferentiable function) to optimize C and the variance of the Gaussian kernel.
• For the estimation of λ, a recently proposed approach based on the Ho-Kashyap algorithm for the optimization of weight parameters in MRF models has been used.
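To make the idea of derivative-free minimization concrete, the sketch below runs golden-section line searches along each coordinate axis in turn. This is a deliberately simplified stand-in: Powell's method also updates its set of search directions, which is omitted here. The objective is any callable, for example a cross-validation error evaluated at (log10 C, log10 σ); that parameterization, the bounds and all names are assumptions of this example.

```python
import math


def golden_section(f, lo, hi, tol=1e-4):
    """Derivative-free 1-D minimization of f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def coordinate_search(objective, start, bounds, n_sweeps=20, tol=1e-6):
    """Minimize objective(point) by repeated line searches along the axes.

    Simplified stand-in for Powell's method: search directions stay
    fixed to the coordinate axes instead of being updated.
    """
    point = list(start)
    best = objective(point)
    for _ in range(n_sweeps):
        previous = best
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(t, i=i):
                trial = list(point)
                trial[i] = t
                return objective(trial)
            t_new = golden_section(along_axis, lo, hi)
            value = along_axis(t_new)
            if value < best:  # accept only improving moves
                point[i], best = t_new, value
        if previous - best < tol:
            break
    return point, best
```

In practice `objective` would train the SVM and return the cross-validation error for the given parameters; since that function is piecewise constant, only function values and no gradients can be used, which is the reason a method of this family is chosen.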
DATA SETS FOR EXPERIMENTS
• Data set “Pavia”
– SIR-C/XSAR
– Rural area (near Pavia)
– 700 x 280 pixels
– 4 channels (XSAR channel is shown in the figure)
– Medium resolution (25 m)
– Main classes: “dry soil” and “wet soil”.
• Data set “Tanaro”
– COSMO/SkyMed
– Flood of the Tanaro River near Alessandria
– 3155 x 1695 pixels
– Single channel
– Very high resolution (1 m)
– Main classes: “dry soil” and “water or flooded soil”.
• Spatially disjoint training and test fields are available for both data sets.
EXPERIMENTAL RESULTS: CONFUSION MATRICES AND ACCURACIES
[Table: Pavia, confusion matrix, noncontextual SVM]
[Table: Pavia, confusion matrix, proposed method]
[Table: Tanaro, confusion matrix, noncontextual SVM]
[Table: Tanaro, confusion matrix, proposed method]
EXPERIMENTAL RESULTS: CLASSIFICATION MAPS
[Figure: Pavia, map generated by a noncontextual SVM]
[Figure: Pavia, map generated by the proposed method]
[Figure: Tanaro, map generated by a noncontextual SVM]
[Figure: Tanaro, map generated by the proposed method]
EXPERIMENTAL RESULTS: CONVERGENCE OF THE METHOD
[Figure: Tanaro, behavior of the accuracy (overall accuracy, OA; average accuracy, AA; cross-validation accuracy, XVAL) as a function of the number of iterations of the proposed method]
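For reference, OA and AA can be computed from a confusion matrix as sketched below. The convention that rows are true classes and columns are predicted classes is an assumption, and the matrix in the usage note is made-up illustrative data, not a result from the experiments.

```python
def overall_accuracy(cm):
    """Fraction of all test samples that were classified correctly.

    cm[i][j] = number of samples of true class i assigned to class j
    (row/column convention assumed for this example).
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def average_accuracy(cm):
    """Mean of the per-class accuracies (each class weighted equally)."""
    per_class = [cm[k][k] / sum(cm[k]) for k in range(len(cm))]
    return sum(per_class) / len(per_class)


# Hypothetical two-class confusion matrix, for illustration only.
example_cm = [[90, 10],
              [20, 30]]
```

On `example_cm`, OA is 120/150 while AA averages the per-class rates 90/100 and 30/50, so the two measures differ whenever the classes have unequal test-set sizes.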
CONCLUSIONS
• A feasible Markovian extension of SVMs to contextual classification has been introduced.
• Experiments with real data suggest that the proposed method yields a significant accuracy increase compared to a standard (noncontextual) SVM.
• Very accurate results were obtained on different types of remote-sensing data, including very high resolution COSMO/SkyMed SAR data.
• Possible future extensions:
– theoretical analysis of convergence properties (even though no experimental evidence of critical convergence issues was found);
– testing the method with other types of remote-sensing data (in particular, optical and hyperspectral images) and with more sophisticated MRF models.