
Computer Vision and Image Understanding 117 (2013) 1748–1754


Multi-spectral dataset and its application in saliency detection

Qi Wang, Guokang Zhu, Yuan Yuan *

Center for OPTical IMagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, PR China

* Corresponding author. E-mail addresses: [email protected] (Q. Wang), [email protected] (G. Zhu), [email protected] (Y. Yuan).

1077-3142/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.cviu.2013.07.002

Article history: Received 14 July 2012; Accepted 3 July 2013; Available online 26 July 2013.

Keywords: Multi-spectral; Near-infrared; Saliency; Regression model.

Abstract

Saliency detection has been researched a lot in recent years. Traditional methods are mostly conducted and evaluated on conventional RGB images; little work has considered the incorporation of multi-spectral clues. Considering the success of including the near-infrared spectrum in applications such as face recognition and scene categorization, this paper presents a multi-spectral dataset and applies it to saliency detection. Experiments demonstrate that incorporating the near-infrared band is effective in the saliency detection procedure. We also test combinational models for integrating the visible and near-infrared bands. The results show that no single model is effective for every saliency detection method; models should be selected according to the specific method employed.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Saliency detection has been a promising topic recently [1–4]. The goal of saliency detection is to extract salient areas from an input image and present the result as a gray-scale map: the brighter a pixel is, the more likely it is to be salient. Since the detected saliency map can be utilized in various applications, such as recognition [5], segmentation [6], and tracking [7], research on this subject has attracted much attention [8–10].

Generally, methods for saliency detection can be categorized into local-based and global-based schemes [11]. Local-based methods calculate a region's saliency according to its contrast with a small neighborhood [12–14]. Global-based methods evaluate saliency with respect to the whole image's statistical characteristics [15,16]. In either case, saliency detection is mostly conducted on natural images taken by ordinary cameras. These cameras respond to wavelengths from about 390 to 700 nm, the visible spectrum [17], and the obtained images are regular RGB images. The information in electromagnetic bands beyond this scope is lost during the imaging process. However, the lost bands might also be valuable for vision tasks, because the more supporting information we have, the more rational the decisions we can make. This judgment is not only common sense for humans but is also borne out by other applications in the computer vision field. For example, after the proposition of the SIFT descriptor [18] on gray-scale images, CSIFT [19,20] was developed to incorporate the color bands into the descriptor. Not long ago, MSIFT [21] was presented to include the near-infrared band for a richer descriptor. In face recognition research, early work primarily focused on gray or RGB images; later, light bands besides the visible spectrum [22] were involved to eliminate the lighting problem. The same is true for boundary detection [23] and tracking [24]: incorporating more clues improves performance. In remote sensing, the spectrum is not limited to one or several bands, but reaches a level of tens or hundreds [25–27].

Considering the success of including light bands besides the visible light in many applications, we construct a multi-spectral dataset containing both near-infrared (NIR) and regular RGB images in this work. Several datasets containing NIR images have been presented before, for example, the PolyU-NIRFD dataset [22] for face recognition and the NIR–RGB dataset [21] for scene categorization. But these datasets are designed for specific purposes and cannot be readily utilized for saliency detection. To this end, the presented dataset is constructed in the hope of providing a new platform for saliency research.

The rest of this paper is organized as follows. Section 2 presents the proposed multi-spectral dataset. Section 3 introduces the distinguishing properties of the near-infrared band. Section 4 applies the presented dataset to saliency detection. Finally, conclusions are drawn in Section 5.

2. Multi-spectral dataset

Since more clues tend to provide richer information, we hope that a camera can capture the NIR and RGB spectrums simultaneously. However, most existing datasets contain images captured from only the RGB bands; we cannot get the information of the four bands at the same time. Though the NIR–RGB dataset [21] has images of both bands, each pair of them was taken consecutively with two cameras, so the contents of the image pairs are not the same. When these images are employed, they have to be accurately registered, but the obtained results are still not satisfying because some objects exist in one image but not in the other. Considering this problem, we employ a multi-spectral camera to simultaneously capture the images of the four bands.

The camera we employed is a prism-based 2-CCD progressive area scan camera, the configuration of which is shown in Fig. 1(a). The prisms in the camera split the input light into two channels: one is the 400–700 nm visible spectrum of red, green and blue, and the other is the 700–1000 nm NIR spectrum. This separation is accurately ensured by the dichroic coatings of the prisms. The split spectra then fall on two distinct CCDs, each of which is sensitive to one range of wavelengths; their response curves are shown in Fig. 1(b) and (c). The advantage of this camera is that it captures two images at the same time, and the obtained image pair has the same scope and content.

Fig. 1. (a) The mechanism of the camera. (b) The CCD response curve to the NIR spectrum. (c) The CCD response curve to the visible light spectrum. These two figures are cited from [28].

With this camera, we took 40 pairs of 512 × 384 images of indoor and outdoor scenes, each containing one or several salient objects. The salient objects in each pair were then labeled by 5 graduate students majoring in computer vision. In this procedure, few instructions were given to the participants beyond segmenting the objects they considered salient, which minimizes the influence of unnecessary instructions on the participants' labelings. Since every individual's perception is different, their labelings differ from each other. To get an unbiased ground truth, we select the common areas of all participants' labelings as the final result. Typical examples are shown in Fig. 2.
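As an illustration of this consensus step, the following sketch (our own, not from the paper) intersects the participants' masks, assuming each labeling is a boolean numpy array:

```python
import numpy as np

def consensus_ground_truth(labelings):
    """Keep only the areas that all participants marked salient.

    labelings: list of HxW boolean numpy arrays, one per participant.
    Returns the intersected mask used as the unbiased ground truth.
    """
    ground_truth = labelings[0].copy()
    for mask in labelings[1:]:
        ground_truth &= mask  # intersection: the common labeled area
    return ground_truth
```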

3. NIR spectrum

The NIR spectrum lies between the visible light band and the thermal infrared band. It has properties of both visible light and thermal infrared light, but differs from each of them. Firstly, unlike thermal infrared, NIR light is reflected by objects in the same way as visible light. Secondly, like thermal infrared, it is invisible to human eyes, and thus captures an "unseen" characteristic different from visible light.

To understand the relationship and difference between the RGB and NIR spectrums, we plot their pairwise cooccurrence distributions over the 40 image pairs in the 2D plane. All the RGB and NIR values are normalized to [0, 1], and different occurrence frequencies are denoted by different colors. From Fig. 3, it is obvious that the distributions of RG, RB and GB differ from those of RN, GN and BN: the latter spread more widely in the 2D plane. This implies that the visible bands of red, green and blue are much more highly correlated in a pairwise manner than the NIR band is with any of them. The NIR spectrum can therefore provide information different from the visible spectrum.
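A minimal sketch of how such a cooccurrence distribution can be computed (our illustration; the bin count of 256 is an assumption, since the paper does not state one):

```python
import numpy as np

def cooccurrence(band_x, band_y, bins=256):
    """Pairwise cooccurrence distribution of two bands (cf. Fig. 3).

    band_x, band_y: arrays of band values normalized to [0, 1].
    Returns a bins x bins array of occurrence frequencies.
    """
    hist, _, _ = np.histogram2d(band_x.ravel(), band_y.ravel(),
                                bins=bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()  # frequencies shown as colors in Fig. 3
```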

To justify this point, we calculate the joint entropy [21] of each pair of bands as


Fig. 2. Example images in the presented dataset. First row: RGB images; Second row: NIR images; Third row: ground truth labelings of salient objects.

Fig. 3. The cooccurrence distributions of the pairwise bands RG, RB, GB, RN, GN and BN (panels (a)–(f)); axes are band values normalized to [0, 1]. This statistic is obtained from the 40 image pairs in the presented dataset.


$H(X, Y) = -\sum_{i,j} p(x_i, y_j) \log_2 p(x_i, y_j),$   (1)

where $X$ and $Y$ are the examined spectra, $x_i$ and $y_j$ are the pixel values of the corresponding spectrum images, and $p(x_i, y_j)$ is the probability density. According to information theory, entropy is a measure of unpredictability and reflects the information content: the higher $H(X, Y)$ is, the more information a message contains. The calculated joint entropies are shown in Table 1. From the table, we can see that the joint entropies involving the NIR band are generally higher than those without it. This result demonstrates that the NIR band can provide abundant additional information, so trying to utilize it in applications is reasonable.

Table 1
Joint entropy of pairwise bands.

Pairwise bands    RG        RB        GB        RN        GN        BN
Joint entropy     13.488    13.967    13.688    14.361    14.435    14.456
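Eq. (1) can be evaluated directly from a cooccurrence histogram like the one sketched above; a minimal sketch under the same assumed bin count (256 bins, consistent with the table values staying below the 16-bit maximum):

```python
import numpy as np

def joint_entropy(band_x, band_y, bins=256):
    """Joint entropy H(X, Y) of two bands per Eq. (1)."""
    hist, _, _ = np.histogram2d(band_x.ravel(), band_y.ravel(),
                                bins=bins, range=[[0, 1], [0, 1]])
    p = hist / hist.sum()        # estimate of p(x_i, y_j)
    p = p[p > 0]                 # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())
```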

4. Saliency detection

To demonstrate the effectiveness of the presented dataset, we conduct experiments on the application of saliency detection. Saliency maps are first extracted from the RGB images and the NIR images; the obtained maps are then combined to get the final results. The purpose of these experiments is to answer the following two questions: (1) whether the incorporation of the NIR band can improve saliency detection performance; (2) which kind of model is best for combining the saliency maps from the two channels.

To answer the first question, we compare the results generated with only the RGB band against the results with both RGB and NIR bands. The algorithms employed in this process are all canonical ones in the saliency detection field: AC [29], CA [13], FT [16], HC [11], IT [12], LC [30], MSS [31], RC [11], SR [32] and SUN [33]. The second question is more difficult to answer because an exhaustive test of candidate models is impossible. Considering the initial success of [34], we concentrate on regression models in this work.

4.1. Evaluation measure

To compare experimental results, an evaluation measure must first be specified. In the saliency detection field, three metrics are usually employed: precision, recall, and F-measure. They are defined as follows:

$\text{precision} = \dfrac{TP}{TP + FP}, \qquad \text{recall} = \dfrac{TP}{TP + FN},$

$F\text{-measure} = \dfrac{\text{precision} \times \text{recall}}{(1 - \alpha) \times \text{precision} + \alpha \times \text{recall}},$   (2)

where TP is true positive, FP is false positive, and FN is false negative. These three metrics are commonly used in the information retrieval community, and each reflects a different aspect: precision represents accuracy, recall represents detectability, and F-measure is a balance between them. When precision and recall contradict each other, F-measure is usually employed as a compromise measurement [35].
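A sketch of these measures for one binarized saliency map (our illustration; the binarization threshold and α = 0.5 are assumptions, as the paper does not specify them):

```python
import numpy as np

def evaluate(saliency, ground_truth, threshold=0.5, alpha=0.5):
    """Precision, recall and F-measure per Eq. (2).

    saliency: HxW floats in [0, 1]; ground_truth: HxW booleans.
    """
    detected = saliency >= threshold
    tp = np.sum(detected & ground_truth)
    fp = np.sum(detected & ~ground_truth)
    fn = np.sum(~detected & ground_truth)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = (1 - alpha) * precision + alpha * recall
    f_measure = precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_measure
```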

4.2. Regression models

In our processing, the aim is to infer each pixel's saliency value from the saliency maps obtained on the RGB and NIR bands. Suppose the RGB and NIR saliency values are denoted as $X_{rgb}$ and $X_{nir}$, respectively. The question is how to determine the mapping function $f: (X_{rgb}, X_{nir}) \to Y$, where $Y$ is the desired saliency value. Three commonly used regression models are employed here: linear regression, polynomial regression and logistic regression.

4.2.1. Linear regression
In linear regression [36], the output variable is a linear combination of the input variables. To be specific, the model can be expressed as

$Y = a_0 + a_1 X_{rgb} + a_2 X_{nir}.$   (3)

The task is to estimate $\{a_i\}_{i=0,1,2}$ from the $N$ observed training points $\{X^n_{rgb}, X^n_{nir}, Y^n\}_{n=1,\dots,N}$. A special case of linear regression is that the constant coefficient $a_0$ equals 0; in this case, the output is a purely proportional combination of $X_{rgb}$ and $X_{nir}$, with no translation. The two models are abbreviated as LinearR-I and LinearR-II in the later discussion.

4.2.2. Polynomial regression
In polynomial regression, the independent variables include not only linear terms, but also quadratic and interactive terms. The model is expressed as

$Y = a_0 + a_1 X_{rgb} + a_2 X_{nir} + a_3 X_{rgb} X_{nir} + a_4 X^2_{rgb} + a_5 X^2_{nir}.$   (4)

The processing is the same as for linear regression: according to the known input and output pairs, we estimate the $\{a_i\}_{i=0,\dots,5}$ that best fit the training data. This model is denoted as PolyR for ease of reference.
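The same least-squares machinery applies once the design matrix is extended with the interactive and quadratic terms of Eq. (4); a sketch under the same assumptions as the linear case:

```python
import numpy as np

def fit_poly(x_rgb, x_nir, y):
    """Least-squares fit of the PolyR model in Eq. (4)."""
    design = np.stack([np.ones_like(x_rgb),       # a0
                       x_rgb, x_nir,              # linear terms
                       x_rgb * x_nir,             # interactive term
                       x_rgb ** 2, x_nir ** 2],   # quadratic terms
                      axis=1)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs
```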

4.2.3. Logistic regression
For logistic regression [37], the model is defined as

$f(X) = \dfrac{e^X}{e^X + 1} = \dfrac{1}{1 + e^{-X}},$   (5)

where $X$ represents some set of independent variables, which in this work is defined as

$X = a_0 + a_1 X_{rgb} + a_2 X_{nir}.$   (6)

This model captures a nonlinear relationship between the input and output variables: the mapping is approximately linear in the mid-range of the input and stretches out exponentially at the extremes. It is denoted as LogisticR.
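The paper does not state how the LogisticR coefficients are estimated; one plausible sketch (our own assumption) fits Eqs. (5)–(6) by gradient descent on the squared error between the sigmoid output and the target saliency value, with an arbitrary learning rate and iteration count:

```python
import numpy as np

def fit_logistic(x_rgb, x_nir, y, lr=0.5, iters=5000):
    """Fit f(a0 + a1*x_rgb + a2*x_nir) of Eqs. (5)-(6) to y."""
    design = np.stack([np.ones_like(x_rgb), x_rgb, x_nir], axis=1)
    a = np.zeros(3)
    for _ in range(iters):
        f = 1.0 / (1.0 + np.exp(-(design @ a)))  # Eq. (5)
        # gradient of sum((f - y)^2), using f' = f * (1 - f)
        grad = design.T @ (2.0 * (f - y) * f * (1.0 - f)) / len(y)
        a -= lr * grad
    return a
```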

4.3. Experiments

In this section, intensive comparative experiments are conducted to answer the two questions posed previously. 10 images are selected for training the parameters of the regression models and the other 30 for testing. Since each image contains 512 × 318 pixels, involving all of them from the 10 training images would lead to a very large number of training samples. Besides, many repeated triplets $\{X^n_{rgb}, X^n_{nir}, Y^n\}$ with the same values would be included if all the pixels were employed, which would lead to a biased model. Therefore, we first resize the training images to 1/4 of the original size; then all the pixels (128 × 80 × 10 = 102,400) are utilized as training samples. After the parameters are learned in the training stage, the remaining 30 images are processed by the acquired models. Typical examples of the regression results are shown in Fig. 4. To quantitatively evaluate the performance of each regression model, we plot the averaged precision, recall and F-measure bars of the original methods and the improvements (in percentage) of the regression models in Fig. 5. A detailed analysis is presented below.

4.3.1. Can the incorporation of the NIR band improve saliency detection performance?
From Fig. 5, we can see that incorporating the NIR band changes the three metrics considerably. Sometimes they change in the desired direction; other times, the inclusion of the NIR band makes the indexes drop undesirably. There are also cases where precision and recall change in opposite directions or to different degrees, which makes the regression models difficult to compare. According to [35], F-measure should be referred to in this case because it is a compromise of precision and recall. From Fig. 5 it is evident that for each saliency detection method, there is at least one regression model that can improve the F-measure of the detected results. This means the added NIR band is effective when an appropriate model is employed. The maximum improvement is approximately 40% and the minimum is about 5% in our experiments.

4.3.2. Which kind of model is best for combining the saliency maps from the two channels?
For each saliency detection method, the corresponding regression models perform differently: some improve the results, while others may make them worse. But every method has at least one regression model that improves on the original results. The best regression model for each method is marked by a purple rectangle. We can see clearly that not every method shares the same best regression model. For CA, FT, HC, LC, RC and SUN, the best performance is achieved with the LinearR-II model; for AC and SR, LinearR-I fits more properly; and for IT and MSS, PolyR is the most appropriate.

Since each saliency detection method generates a different result, representing a distinguishable data distribution and paradigm, the best combinational manner may differ from method to method. Six methods achieve their best results with the LinearR-II model, which suggests that LinearR-II is the more generalized and suitable model for saliency detection. No method performs best with the LogisticR model, which indicates that this model is not fit for these saliency detection techniques (e.g., several results are all black in the last column of Fig. 4).

Fig. 4. Typical examples of experimental results. The first to seventh columns respectively show the RGB image, the NIR image, the results using only the original RGB band, and the results of combining the RGB and NIR bands with the LinearR-I, LinearR-II, PolyR and LogisticR models.

4.4. Discussion

In the section above, experimental results were presented and analyzed. Several issues remain to be discussed here.

The first issue is model selection. In this work, we focus only on conventional regression models to describe the combinational relationship of the RGB and NIR bands. However, since there are other models for combining the two bands and we have not tested them all, we do not claim that regression models are the best possible ones. On the contrary, we think models should be selected according to the specific saliency detection method. This is reasonable because models are built on the actual data: different methods generate dissimilar saliency maps, which represent distinct data paradigms. Consequently, there is no reason to assume a universal model for all methods.

The second issue is the performance of the employed saliency detection methods. From Figs. 4 and 5, we can see that the employed canonical methods do not perform as well as reported in [11]. This is because the presented dataset is somewhat more complex than the dataset utilized in [11]: the backgrounds usually contain several distracting objects of different colors, and the salient objects do not have distinctive vivid colors either. This phenomenon also reveals that the current state-of-the-art methods rely heavily on color contrast, and their robustness deserves improvement.

The third issue is the size of the presented dataset. 40 images is not a large number. This is because the platform for capturing images is not convenient to move around, so we only set up environments within and not far from the laboratory. We are now working on transplanting the desktop capturing system to a laptop and taking more images to form a larger dataset. On the other hand, the average F-measure can be improved by up to 40%, which is not a small improvement; we believe this is not a coincidence and represents a promising result.

Fig. 5. Quantitative evaluation of the different regression models by precision, recall and F-measure, represented as percentage improvements over the original methods using only the RGB band. The first row illustrates the original saliency detection results using common RGB images; the second to fifth rows are the results of combining the RGB and NIR bands using the LinearR-I, LinearR-II, PolyR and LogisticR regression models. The results show that, for each saliency detection method, (1) there is at least one regression model that can improve the F-measure of the detected results; and (2) its best combinational manner of the RGB and NIR bands may differ from other methods.

5. Conclusion

In this work, a multi-spectral dataset is presented to serve as a new platform for saliency research. Different from existing ones, our dataset contains pairs of RGB and NIR images, which can provide more valuable information for detecting the salient areas in an image. Experiments demonstrate the effectiveness of incorporating the NIR band in saliency detection. We also test several regression models for combining the RGB and NIR bands. The results show that it is not appropriate to employ one single model as the prototype; the best model should be selected according to the specific method. Future work will transplant the image capturing system to a laptop and add more images to the dataset.

Acknowledgment

This work is supported by the State Key Program of National Natural Science of China (Grant No. 61232010), the National Natural Science Foundation of China (Grant Nos. 61172143 and 61105012), and the Natural Science Foundation Research Project of Shaanxi Province (Grant No. 2012JM8024).

References

[1] T. Jost, N. Ouerhani, R. von Wartburg, R. Müri, H. Hügli, Assessing the contribution of color in visual attention, Comput. Vis. Image Understand. 100 (2005) 107–123.
[2] D. Walther, U. Rutishauser, C. Koch, P. Perona, Selective visual attention enables learning and recognition of multiple objects in cluttered scenes, Comput. Vis. Image Understand. 100 (2005) 41–63.
[3] I. Bogdanova, A. Bur, H. Hügli, P.-A. Farine, Dynamic visual attention on the sphere, Comput. Vis. Image Understand. 114 (2010) 100–110.
[4] Q. Wang, Y. Yuan, P. Yan, X. Li, Saliency detection by multiple-instance learning, IEEE Trans. Cybernetics 43 (2013) 660–672.
[5] D. Walther, L. Itti, M. Riesenhuber, T. Poggio, C. Koch, Attentional selection for object recognition – a gentle way, Biol. Motivated Comput. Vis. 2525 (2002) 251–267.
[6] J. Han, K. Ngan, M. Li, H. Zhang, Unsupervised extraction of visual attention objects in color images, IEEE Trans. Circ. Syst. Video Technol. 16 (2006) 141–145.
[7] V. Mahadevan, N. Vasconcelos, Saliency-based discriminant tracking, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009, pp. 1007–1013.
[8] R. Perko, A. Leonardis, A framework for visual-context-aware object detection in still images, Comput. Vis. Image Understand. 114 (2010) 700–711.
[9] Y. Sun, R.B. Fisher, F. Wang, H.M. Gomes, A computer vision model for visual-object-based attention and eye movements, Comput. Vis. Image Understand. 112 (2008) 126–142.
[10] T.E. de Campos, G. Csurka, F. Perronnin, Images as sets of locally weighted features, Comput. Vis. Image Understand. 116 (2012) 68–85.
[11] M.M. Cheng, G.X. Zhang, N.J. Mitra, X. Huang, S.M. Hu, Global contrast based salient region detection, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 409–416.
[12] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 1254–1259.
[13] S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 2376–2383.
[14] T. Liu, J. Sun, N.N. Zheng, X. Tang, H.Y. Shum, Learning to detect a salient object, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[15] Y. Zhai, M. Shah, Visual attention detection in video sequences using spatiotemporal cues, in: ACM Multimedia, 2006, pp. 815–824.
[16] R. Achanta, S. Hemami, F. Estrada, S. Susstrunk, Frequency-tuned salient region detection, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009, pp. 1597–1604.
[17] H.W. Siesler, Y. Ozaki, S. Kawata, H.M. Heise, Near-Infrared Spectroscopy: Principles, Instruments, Applications, John Wiley & Sons, 2001.
[18] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004) 91–110.
[19] A.E. Abdel-Hakim, A.A. Farag, CSIFT: a SIFT descriptor with color invariant characteristics, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 1978–1983.
[20] G.J. Burghouts, J.-M. Geusebroek, Performance evaluation of local colour invariants, Comput. Vis. Image Understand. 113 (2009) 48–62.
[21] M. Brown, S. Süsstrunk, Multi-spectral SIFT for scene category recognition, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 177–184.
[22] B. Zhang, L. Zhang, D. Zhang, L. Shen, Directional binary code with application to PolyU near-infrared face database, Pattern Recogn. Lett. 31 (2010) 2337–2344.
[23] Q. Wang, S. Li, Database of human segmented images and its application in boundary detection, IET Image Process. 6 (2012) 222–229.
[24] A. Leykin, Y. Ran, R.I. Hammoud, Thermal-visible video fusion for moving target tracking and pedestrian classification, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007.
[25] L. Zhang, L. Zhang, D. Tao, X. Huang, Tensor discriminative locality alignment for hyperspectral image spectral–spatial feature extraction, IEEE Trans. Geosci. Remote Sensing (2012) 1–15.
[26] Y. Wen, Y. Gao, S. Liu, Q. Cheng, R. Ji, Hyperspectral image classification with hypergraph modelling, in: ICIMCS, 2012, pp. 34–37.
[27] Y. Gu, C. Wang, D. You, Y. Zhang, S. Wang, Y. Zhang, Representative multiple kernel learning for classification in hyperspectral imagery, IEEE Trans. Geosci. Remote Sensing 50 (2012) 2852–2865.
[28] Multi-spectral camera, User's Manual, Ver. 1.0, JAI Ltd. (October 2009).
[29] R. Achanta, F. Estrada, P. Wils, S. Süsstrunk, Salient region detection and segmentation, Comput. Vis. Syst. 5008 (2008) 66–75.
[30] Y. Zhai, M. Shah, Visual attention detection in video sequences using spatiotemporal cues, in: ACM Multimedia, 2006, pp. 815–824.
[31] R. Achanta, S. Süsstrunk, Saliency detection using maximum symmetric surround, in: International Conference on Image Processing, 2010, pp. 2653–2656.
[32] X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[33] L. Zhang, M.H. Tong, T.K. Marks, H. Shan, G.W. Cottrell, SUN: a Bayesian framework for saliency using natural statistics, J. Vision 8 (2008) 1–20.
[34] Q. Wang, P. Yan, Y. Yuan, X. Li, Multi-spectral saliency detection, Pattern Recogn. Lett. 34 (2012) 34–41.
[35] D.R. Martin, C. Fowlkes, J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 530–549.
[36] S. Chatterjee, A.S. Hadi, Influential observations, high leverage points, and outliers in linear regression, Stat. Sci. 1 (1986) 379–393.
[37] G.A.F. Seber, C.J. Wild, Nonlinear Regression, Wiley-Interscience, Hoboken, NJ, 2003.

