
Pattern Recognition Letters 36 (2014) 254–260


Fused intra-bimodal face verification approach based on Scale-Invariant Feature Transform and a vocabulary tree

0167-8655/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.patrec.2013.08.016

Carlos M. Travieso *, Marcos del Pozo-Baños, Jesús B. Alonso
Signals and Communications Department, Institute for Technological Development and Innovation in Communications (IDeTIC), University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain

* Corresponding author. E-mail addresses: [email protected] (C.M. Travieso), [email protected] (M. del Pozo-Baños), [email protected] (J.B. Alonso).

Article info

Article history: Available online 30 August 2013

Communicated by Luis Gomez Deniz

Keywords: Bimodal interaction; Information fusion; Visible and thermal face verification; Face detection; SIFT parameters; Vocabulary tree

Abstract

This work studies an intra-bimodal face-based biometric fusion approach composed of the thermal and spatial domains. The distinctive feature of this work is the use of a single camera with two sensors which returns a unique image containing thermal and visual images at a time, as opposed to the state of the art, for example multibiometric modalities and hyperspectral images. The proposed system represents a practical bimodal approach for real applications. It is composed of a verification architecture based on the Scale-Invariant Feature Transform (SIFT) algorithm with a vocabulary tree, providing a scheme that scales efficiently to a large number of features. The image database consists of front-view thermal and visible images captured as a single image, containing facial temperature distributions of 41 different individuals in 2-dimensional format, with 18 images per subject acquired in three sessions on different days. Results showed that visible images give better accuracy than thermal information and that, independently of the range, head images give the most discriminative information. Moreover, the fusion approaches reached higher accuracy, up to 99.45% for score fusion and 100% for decision fusion. This shows the independence of the information between visual and thermal images and the robustness of bimodal interaction.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

The usage of different biometric systems in security applications has become more and more common nowadays. The reason is a series of advantages over other methods such as carrying magnetic cards or remembering passports or PIN numbers, which can be forgotten or used by non-authorized persons. Identification systems based on human body measures are well accepted and perceived naturally by both men and women. Therefore, biometric identification methods are achieving outstanding results and trustworthiness in the security market.

Human recognition through distinctive facial features supported by an image database is still being studied. Note that this problem presents various difficulties. What will occur if the individual's haircut is changed? Is make-up a determining factor in the process of verification? Would it distort the facial features significantly?

The usage of thermal cameras, originally conceived for military purposes, has expanded to other fields of application such as process control in production lines, detection/monitoring of fire and even security and anti-terrorism applications. Therefore, their use in human identification tasks, in scenarios where the lack of light restricts the operation of conventional cameras, can also be considered. Thermal cameras can also be a great tool against look variations, which in some cases could be quite extreme. Different looks of the main character of the film The Saint are shown in Fig. 1. Val Kilmer modifies his look in this film spectacularly in order not to be recognized by the enemy.

A correct matching between the test face and that stored in the image database is expected, and this is a hard task to solve even if natural distortion effects such as illumination changes or interference are not considered. The recognition problem should be split into three stages, that is, acquisition of facial images for testing, feature extraction from specific facial regions and, finally, verification of the individual's identity (Soon-Won et al., 2007).

Currently, computational face analysis is a very lively research field, in which new interesting possibilities are being studied. For example, there are approaches aiming to improve a system's performance when working with low-resolution (LR) images and decreasing computational load.

In Huang and He (2011), a facial recognition system was presented which works with LR images, using nonlinear mappings to infer coherent features that favor higher accuracy of the nearest neighbor (NN) classifiers for the recognition of a single LR face image. It is also interesting to cite the approach of Imtiaz and Fattah (2011), in which a multi-resolution feature extraction algorithm for face recognition, based on the two-dimensional discrete wavelet transform (2D-DWT), was proposed. Such a method exploits local spatial variations in a face image effectively, obtaining outstanding results with 2 different databases.

Fig. 1. Facial changes of the character played by Val Kilmer in the film The Saint.

Images from subjects are often taken in different poses or with different modalities, such as thermographic images, presenting different stages of difficulty in their identification.

In Socolinsky and Selinger (2004), results on the use of thermal infrared and visible imagery for face recognition in operational scenarios were presented. These results showed that thermal face recognition performance is stable over multiple sessions in outdoor scenarios, and that fusion of modalities increases performance.

In the same year, Jiang et al. (2004) proposed an automated thermal imaging system that is able to discriminate frontal from non-frontal face views, with the assumption that at any time there is only one person in the field of view of the camera and no other heat-emitting objects are present. In this approach, the distance from centroid (DFC) shows its suitability for comparing the degree of symmetry of the lower face outline.

The use of correlation filters in Heo et al. (2005) showed its adequacy for face recognition tasks using thermal infrared (IR) face images due to the invariance of this type of images to visible illumination variations. The results with Minimum Average Correlation Energy (MACE) filters and Optimum Trade-off Synthetic Discriminant Function (OTSDF) in LR images (20 × 20 pixels) prove their efficiency in Human Identification at a Distance (HID).

The Scale-Invariant Feature Transform (SIFT) algorithm (Lowe, 1999) is widely used in object recognition. In Soyel and Demirel (2011), SIFT appeared as a suitable method to enhance the recognition of facial expressions under varying poses over 2D images. The usage of affine-transformation consistency between two faces to discard SIFT mismatches has been demonstrated.

Gender recognition is another lively research field working with the SIFT algorithm. In Jian-Gang et al. (2010), faces were represented in terms of dense Scale-Invariant Feature Transform (d-SIFT) and shape. Instead of extracting descriptors around interest points only, local feature descriptors were extracted at regular image grid points, allowing dense descriptions of face images.

However, SIFT usually generates a large number of features from an image. The huge computational effort associated with feature matching limits its application to face recognition. To overcome this problem, Majumdar and Ward (2009) proposed the usage of a discriminating method. Computational complexity was reduced more than 4 times and accuracy increased by 1.00% on average by discarding irrelevant features.

Another interesting idea is the tree-building method, which scales well with the size of a database and allows finding one element among a large number of objects in acceptable time. This work is inspired by Nister and Stewenius (2006), where object recognition by a k-means vocabulary tree was presented. Efficiency was proved by a live demonstration that recognized CD covers from a database of 40,000 images. The vocabulary tree showed good results when a large number of distinctive descriptors form a large vocabulary. Many different approaches to this solution have been developed in the last few years (Ober et al., 2007; Slobodan, 2008), showing its competency in organizing several objects. Based on these good results, this solution will be tested in this paper with SIFT descriptors in a vocabulary tree.

In addition, references using two different images in different ranges, visible and infrared-thermal, can be found in the state of the art. In Buyssens and Revenu (2010), the authors used PCA and sparse analysis before applying a fusion module. This reaches mean recognition rates between 95% and 99% after the fusion process for 63 users. In Bhowmik et al. (2012), the system fuses the thermal and visible images, captured by two individual sensors, into a unique image. Using 70% of the visual image and 30% of the thermal image and classifying with an SVM, the system reaches up to 97.28% on an identification approach.

Other lines of research have focused on multimodal biometric systems. For example, Almohammad et al. (2012) was based on the fusion of face and gait biometrics, Tong et al. (2010) on face and fingerprint biometric fusion, Javadtalab et al. (2011) on the fusion of face and ear recognition, and Raghavendra (2012) on the feature-level fusion of face and palmprint biometrics. These kinds of multimodal approaches usually need longer times or uncomfortable devices from the user and application points of view.

Finally, different references based on multispectral face recognition and multimodal score fusion can also be found. Zheng and Elmaghraby (2011) and Bourlai et al. (2012) are two examples, where authors used different sensors and cameras, or one camera for a certain range by bandpass filters or for broadband.

In this context, the aim of the present work is to propose, innovate and evaluate in the field of bimodal face biometrics, for the visible and thermal ranges. The proposed method could be used in a real application with the convenience of a unique device for fast tracking and the advantages of the fusion of bimodal information. All this gives it an added value versus the state of the art. In addition, a study to search for the main source of information is also included here. In particular, the system applies the SIFT algorithm and obtains local distinctive descriptors from each image based on Crespo et al. (2012). The construction of the vocabulary tree enables these descriptors to be hierarchically organized and ready for carrying out a search to find a specific object.

For each test image, only its new descriptors are calculated and used to search through the hierarchical tree in order to build a vote matrix, in which the most similar image of the database can be easily identified. This approach mixes the singularity of the SIFT descriptors, to perform reliable matching between different views of a visual and thermal face, and the efficiency of the vocabulary tree for building a highly discriminative vocabulary. A more detailed description of the system is provided in the next subsections.
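As an illustration of this voting scheme, the following Python sketch uses a flat k-means quantizer in place of the full hierarchical tree (the tree only accelerates the descriptor-to-word assignment, not the voting logic); all names and the vocabulary size are illustrative assumptions, not the authors' code.

```python
# Conceptual sketch of the vote-matrix search described above. A flat
# k-means vocabulary stands in for the hierarchical tree.
import numpy as np
from sklearn.cluster import KMeans

def most_similar_image(train_descriptors, train_image_ids,
                       test_descriptors, vocabulary_size=1000):
    """train_image_ids[i] is the database image that owns descriptor i."""
    vocab = KMeans(n_clusters=vocabulary_size, n_init=3).fit(train_descriptors)
    train_words = vocab.labels_
    votes = np.zeros(train_image_ids.max() + 1)  # vote matrix: one bin per image
    for word in vocab.predict(test_descriptors):
        # Every training descriptor quantized to the same word votes for its image.
        np.add.at(votes, train_image_ids[train_words == word], 1)
    return int(np.argmax(votes))                 # index of the best-matching image
```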

This paper is organized as follows. The proposed approach is presented in Section 2. In Section 3, the experimental settings, results and discussions are shown. Finally, conclusions are given in Section 4.

2. Approach proposed

The innovation of this work lies in the implementation of a fused bimodal face verification approach using a unique device, which gives one image from two sensors for the visible and thermal ranges respectively. A bimodal verification approach is implemented using SIFT descriptors as feature extraction and a vocabulary tree, built through the k-means function, as the classification system. Score and decision fusions of both ranges have been applied. Besides, localized discriminative information has been sought between ranges and their fusions, and between regions of interest (head versus face). This work is a novel study and opens a door for an application under real conditions.

In this section, the whole approach and each of its parts will be explained.

Page 3: Fused intra-bimodal face verification approach based on Scale-Invariant Feature Transform and a vocabulary tree

256 C.M. Travieso et al. / Pattern Recognition Letters 36 (2014) 254–260

2.1. General description

The proposed approach is composed of five stages: a preprocessing module, a SIFT descriptors calculator, vocabulary tree construction, a matching module and a fusion module.

While face segmentation is executed manually, the matching module searches in the vocabulary tree for the best correspondence between the test descriptors and those of the database. Therefore, the forthcoming explanation first focuses on the SIFT parameters and the tree classification, and a brief description of the matching module is given afterwards.

A block diagram of the system is shown in Fig. 2.

2.2. Pre-processing

In this step, natural (visible range) and thermal images are extracted and isolated from the unique image supplied by the device used. Then, the system detects human faces in the natural image. Though many face detectors are available in the literature, a frontal face detector similar to the Viola-Jones cascade detector (Viola and Jones, 2004) has been used here due to its simplicity, speed and effectiveness.
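As a minimal sketch of this step, the Python code below runs OpenCV's stock Haar-cascade implementation of a Viola-Jones-style detector on the visible sub-image; the paper only states that a detector "similar to" Viola-Jones was used, so the cascade file and parameter values here are assumptions.

```python
# Hedged sketch: frontal face detection with OpenCV's Haar cascade,
# a stock Viola-Jones-style detector. Parameters are illustrative.
import cv2

def detect_frontal_face(visible_bgr):
    """Return (x, y, w, h) of the largest detected frontal face, or None."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(visible_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda box: box[2] * box[3])  # largest box wins
```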

2.3. Feature extraction: scale-invariant feature transform

The use of SIFT descriptors follows the majority of the results achieved by D. Lowe (Lowe, 2004) as a guideline, and only determinant parameters are modified in order to adapt the algorithm to the system. Keypoints are detected using cascade filtering, searching for stable features across all possible scales. The scale space of an image, L(x, y, σ), is produced from the convolution of a variable-scale Gaussian, G(x, y, σ), with an input image, I(x, y):

L(x, y, σ) = G(x, y, σ) ∗ I(x, y)    (1)

where ∗ is the convolution operation in x and y, and

G(x, y, σ) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))    (2)

Following Lowe (2004), the scale space in the Difference-of-Gaussian (DoG) function convolved with the image, D(x, y, σ), can be computed as the difference of two nearby scales separated by a constant factor k:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ)    (3)

From Mikolajczyk (2002), it is stated that the maxima and minima of the scale-normalized Laplacian of Gaussian (LoG), σ²∇²G, produce the most stable image features in comparison with other functions, such as the gradient or Hessian. The relationship between D and σ²∇²G is:

G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G    (4)
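A small numerical sketch of Eqs. (1)-(3), assuming scipy is available: it builds one octave of the Gaussian scale space and its DoG stack, and omits the octave handling, keypoint localization and edge-response removal of the full algorithm.

```python
# Minimal sketch of Eqs. (1)-(3): Gaussian scale space L and its
# Difference-of-Gaussian stack D for a single octave.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=2 ** 0.5, num_scales=5):
    """Return the list D_i(x, y) = L(x, y, k * s_i) - L(x, y, s_i)."""
    image = image.astype(np.float64)
    sigmas = [sigma0 * k ** i for i in range(num_scales + 1)]
    L = [gaussian_filter(image, s) for s in sigmas]      # Eq. (1): L = G * I
    return [L[i + 1] - L[i] for i in range(num_scales)]  # Eq. (3): D = L(ks) - L(s)
```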

Fig. 2. Diagram of the proposed thermal face recognition system: the visible and thermal image database (head and face) and the test image pass through image segmentation and the SIFT descriptors calculator; the database descriptors feed the vocabulary tree construction, and the matching and fusion modules produce the final decision.

The factor (k − 1) is a constant over all scales and does not influence extrema location. A significant difference in scales has been chosen, k = 2^(1/2), which has almost no impact on the stability, and the initial value of σ = 1.6 provides close-to-optimal repeatability according to Lowe (2004).

After having located accurate keypoints and removed strong edge responses of the DoG function, orientation is assigned. There are two important parameters for varying the complexity of the descriptor: the number of orientations and the size of the array of orientation histograms. Throughout the present work, a 4 × 4 array of histograms with 8 orientations is used, resulting in characteristic vectors with 128 dimensions. The results in Lowe (2004) support the use of these parameters for object recognition purposes, since larger descriptors have been found to be more sensitive to distortion.
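For reference, OpenCV's built-in SIFT already uses this 4 × 4 × 8 descriptor layout, so a hedged extraction sketch is short; the sigma value mirrors the initial scale quoted above, and the call names are OpenCV's, not the authors'.

```python
# Hedged sketch: extracting the 128-dimensional SIFT descriptors
# (4 x 4 histogram array, 8 orientations) with OpenCV defaults.
import cv2

def sift_descriptors(gray_image):
    sift = cv2.SIFT_create(sigma=1.6)  # initial scale, as quoted in the text
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    return keypoints, descriptors      # descriptors: (N, 128) float32 array
```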

2.4. Classification system: vocabulary tree

The verification scheme used in this paper is based on Nister and Stewenius (2006). Once the SIFT descriptors are extracted from the image database, it is time to organize them in a vocabulary tree. A hierarchical verification scheme allows searching selectively for a specific node in the vocabulary tree, decreasing search time and computational effort.

The k-means algorithm is used on the initial point cloud of descriptors to find centroids through minimum-distance estimation, so that a centroid represents a cluster of points. The k-means algorithm is applied iteratively, since the calculation of the centroid locations can vary the associated points. The algorithm converges when the centroid locations do not vary. Each tree level represents a node division of the immediately superior stage.

After some experimentation, the initial number of clusters was defined as 10, with 5 tree levels. These values have shown good results when working with the actual database.
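A sketch of the construction under these settings (branch factor 10, 5 levels) might look as follows; the recursion and stopping rule are assumptions in the spirit of Nister and Stewenius (2006), not the authors' implementation.

```python
# Illustrative hierarchical k-means vocabulary tree: each node splits its
# descriptors into `branch` clusters, recursively, down to `levels` levels.
import numpy as np
from sklearn.cluster import KMeans

class Node:
    def __init__(self):
        self.centroids = None  # (branch, 128) cluster centers; None at a leaf
        self.children = []

def build_tree(descriptors, branch=10, levels=5):
    """descriptors: (N, 128) array of SIFT descriptors at this node."""
    node = Node()
    if levels == 0 or len(descriptors) < branch:
        return node            # leaf: maximum depth or too few points
    km = KMeans(n_clusters=branch, n_init=3).fit(descriptors)
    node.centroids = km.cluster_centers_
    for c in range(branch):
        node.children.append(
            build_tree(descriptors[km.labels_ == c], branch, levels - 1))
    return node
```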

A model of a vocabulary tree with 2 levels and 3 initial clustersis shown in Fig. 3.

2.5. Fusion module

This block uses the correlation between errors of the different approaches (face and head information, and visible and thermal ranges) in order to improve the global accuracy. The system applies two different strategies for fusion, based on score and decision (Yang et al., 2003). For score-based fusion, this work has implemented the sum and product rules on the thermal and visual normalized scores, as in Fuertes et al. (2010). For decision-based fusion, the OR rule (Fuertes et al., 2010) and the weights rule (Yang et al., 2011) were implemented. The OR rule applies a logical OR function to the decision of each classifier; the result of applying this function corresponds to the final decision. If the method is based on weights, the criterion makes use of a priori information: the efficiency of each classifier. A more accurate and secure system is obtained when these fusion methods are implemented (Fuertes et al., 2010).

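The four rules can be summarized in a few lines; the sketch below assumes each modality outputs a normalized score in [0, 1] and a binary decision, and the weight values shown are placeholders rather than the trained classifier efficiencies used in the paper.

```python
# Hedged sketch of the fusion rules named above; weights are placeholders.
def score_fusion_sum(s_visible, s_thermal):
    return (s_visible + s_thermal) / 2.0  # sum rule on normalized scores

def score_fusion_product(s_visible, s_thermal):
    return s_visible * s_thermal          # product rule

def decision_fusion_or(d_visible, d_thermal):
    return d_visible or d_thermal         # logical OR of binary decisions

def decision_fusion_weights(s_visible, s_thermal,
                            w_visible=0.6, w_thermal=0.4, threshold=0.5):
    # Weights encode the a priori efficiency of each classifier.
    return w_visible * s_visible + w_thermal * s_thermal >= threshold
```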


Fig. 3. Two levels of a vocabulary tree with branch factor 3.



3. Experimental settings

3.1. Databases used

The authors have built an image database in order to develop this work. This database contains 738 images of 704 × 756 pixels and 24 bits per pixel. Images were taken using a SAT-S280 SATIR camera which contains two sensors: one of them is a thermal sensor and the other is a visible camera. An example of such an image is shown in Fig. 4.

Such a database is composed of 41 subjects, with 18 images per subject. Images were acquired in 3 different sessions extended along 6 months, with 6 images recorded per session. The captured images were divided into two parts (visible and thermal). Thus, the final database contains a total of 1476 images: 738 visible face images and 738 thermal facial images. Note that false thermal color is given by the sensor according to the characteristics of each person. All images have been stored in PNG format. A further segmentation process is applied as a result of the selection of regions of interest. In particular, heads and faces are the groups created.

Fig. 4. Example of a unique image with visible and thermal ranges.

Summing up, images were divided into categories depending on the type of information they provide, resulting in a total of 2952 images:

- Heads: thermal images of full heads of subjects (738 images).
- Heads: visible images of full heads of subjects (738 images).
- Faces: thermal images of facial details (738 images).
- Faces: visible images of facial details (738 images).

The following figures present some examples of thermal and visible images of heads (Fig. 5) and faces (Fig. 6) in the specified format.

Images were taken indoors with different facial expressions such as happiness, sadness or anger, various facial orientations and distinctive changes in the haircut or facial hair.

The set of head images collects interesting details for recognition tasks, such as ear shape, haircut and chin. On the other hand, the set of facial images provides the minimum information, i.e. the nose, mouth and eye areas.

3.2. Experimental methodology

The aim of the experiments was to find how important the extra information provided by head shape is for human verification versus face information, for the thermal and visible ranges, in addition to the effects of fusion. The proposed methodology compares the performance of thermal and/or visible ranges using faces and heads.

Therefore, four experiments were done, one for each isolated modality, varying the range (visible and thermal) and the type of information (head and face). Besides, eight more experiments were performed, varying the type of fusion (sum rule for score fusion, product rule for score fusion, OR function for decision fusion, and weights for decision fusion). Summing up, the results of these twelve experiments were used to draw the conclusions of this work.

In order to assure the independence of results, both sets of images were equally divided into 2 subsets, test and training, under a 50% hold-out cross-validation methodology (Arlot, 2010). For each modality, 369 test images and 369 training images were available for the experiments.

For each subject, an equally random division of the image database is made so that 9 images per individual are used for testing and the remaining 9 for training purposes (50% hold-out validation method). As previously commented, 369 test images and 369 training images randomly chosen are available for the experiments in each modality. This division is carried out 41 times, i.e. subject by subject in 41 iterations.
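A hedged sketch of this per-subject 50% hold-out split, assuming images are indexed consecutively per subject; the array layout is an illustrative assumption.

```python
# Per-subject 50% hold-out split: 9 of 18 images to training, 9 to testing,
# for 41 subjects, giving 369 + 369 image indices.
import numpy as np

def holdout_split(images_per_subject=18, num_subjects=41, seed=0):
    rng = np.random.default_rng(seed)
    half = images_per_subject // 2
    train_idx, test_idx = [], []
    for s in range(num_subjects):
        perm = rng.permutation(images_per_subject) + s * images_per_subject
        train_idx.extend(perm[:half])  # training half for this subject
        test_idx.extend(perm[half:])   # test half for this subject
    return np.array(train_idx), np.array(test_idx)
```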

The process of face/head verification for a subject was the following. Firstly, the previously stated division of the database was made. Secondly, each of the 9 images of the test subject was compared with the 369 training images, obtaining the corresponding results. Once these 9 images were processed, the database was joined together again and the process restarted with the next subject until the 41 subjects of the database were processed.

The parameters that took part in the experiments were the False Rejection Rate (FRR), False Acceptance Rate (FAR) and Equal Error Rate (EER), commonly used in biometric studies. Mean processing times were also recorded. Such parameters were collected in the form of vectors depending on a variable, the histogram threshold.

Once the verification process finished, a histogram with the contributions of each image from the database was obtained. The image that best fits the test image shows the biggest value in the histogram. In a second stage, histogram values were normalized with regard to the biggest value, from 1 to −1.


Table 1
All accuracies from the experiments applied for the thermal and visual ranges and their fusions.

Type of experiment               Head (%)   Face (%)
Without fusion: visual range     99.05      97.65
Without fusion: thermal range    97.60      88.20
Score fusion: sum                99.15      98.15
Score fusion: product            99.45      97.65
Decision fusion: OR              100        100
Decision fusion: weights         100        100

Table 2
Average computational times of head and face image verification during the experiments for the thermal and visible ranges, in seconds.

Type of experiment                         Model building (s)   Test verification (s)
Visible head                               283.47               0.49
Visible face                               135.08               0.28
Thermal head                               121.56               0.26
Thermal face                               102.55               0.26
Score fusion for head (sum rule)           408.17               0.36
Score fusion for head (product rule)       419.47               0.37
Score fusion for face (sum rule)           244.68               0.26
Score fusion for face (product rule)       231.99               0.25
Decision fusion for head (OR rule)         401.01               0.37
Decision fusion for head (weights rule)    414.04               0.37
Decision fusion for face (OR rule)         235.25               0.26
Decision fusion for face (weights rule)    237.06               0.26

Fig. 5. Six thermal and visible head images from the database. The examples show additional facial features such as head shape, hair and chin.

Fig. 6. Six thermal and visible face images from the database, from the same subjects as in Fig. 5. The examples only show basic facial features such as eyes, lips and nose, representing the minimum information needed to verify a subject in the system.


A threshold was then applied in order to consider only the contributions of those images that are above it; i.e. those below were discarded at that moment. The histogram threshold descends from 1 to −1 in order to consider different samples each time.
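This descending-threshold sweep maps directly to FAR/FRR curves and an EER estimate; the sketch below assumes arrays of normalized match scores in [−1, 1] and genuine/impostor labels, which is a simplification of the histogram-based procedure described above.

```python
# Illustrative FAR/FRR sweep over the descending threshold, with EER estimate.
import numpy as np

def far_frr_eer(scores, is_genuine):
    """scores: normalized match scores in [-1, 1]; is_genuine: boolean labels."""
    thresholds = np.linspace(1.0, -1.0, 201)      # threshold descends from 1 to -1
    genuine, impostor = scores[is_genuine], scores[~is_genuine]
    frr = np.array([(genuine < t).mean() for t in thresholds])    # genuines rejected
    far = np.array([(impostor >= t).mean() for t in thresholds])  # impostors accepted
    i = np.argmin(np.abs(far - frr))              # closest crossing point
    return far, frr, thresholds[i], (far[i] + frr[i]) / 2.0       # curves + EER
```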

3.3. Results

According to the experimental methodology, twelve experiments were done. Table 1 shows the accuracies calculated under hold-out validation for a verification approach.

Table 2 shows the average computational times. Although the verification time remains the same, the database updating time (model building time) with head images is substantially higher, as these images possess more information than facial images and therefore require a greater computational effort. All experiments were done using MATLAB on a computer with a 2.66-GHz CPU and 2 GB RAM.

In Fig. 7, FRR and FAR are shown in relation to the histogram threshold. The X-axis represents the threshold variation and the Y-axis shows the FRR and FAR values for the best approaches. These are ROC curves for visual head images versus score and decision fusions, respectively. In Fig. 7 we can see that the response of the FRR curve is typical and it obtains its answer with thresholds between 0 and −1. But the response of the FAR curve is flat, as it needs very high values to reach its typical shape. Of course, this is a good characteristic of the proposal, as it allows finding a better EER point.

In practical terms, the threshold fall represents how the system becomes less demanding, taking more samples into account and increasing the FRR and FAR, since the additional samples do not belong to the test subject.

For the isolated modalities, the best result obtained in experiments with thermal head images is 97.60%, in relation to 88.20% in thermal face verification. The best result for visual head images is 99.05%, in relation to 97.65% in visible face verification. Therefore, the accuracy rate with head images is higher in comparison with facial images in both cases. Besides, it can be concluded that the visible range has more discriminative information than the thermal range.

Regarding the fusion experiments, head images again provided better results than face images. For both types of fusion, the accuracy is improved, reaching up to 100% for decision fusion. For score-based fusion, 99.45% was reached with the product rule.


Fig. 7. FRR (blue and green lines) and FAR (red and purple lines) in terms of the histogram threshold in (a) visible head verification and (b) thermal head verification. Panel (a): FRR & FAR, visual head image vs. score fusion for head with product rule. Panel (b): FRR & FAR, visual head image vs. decision fusion for head. (X-axis: verification threshold from 1.00 down to −0.95; Y-axis: error rate from 0.0 to 1.0.) (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


For the isolated approaches, the EER presents a larger value compared to the fused systems. With fusion, the FRR is flat for negative thresholds and the EER is consequently lower, so the previously stated accuracy can be reached.

3.4. Discussions

Four isolated modalities (thermal and visible ranges of face and head images) have been compared in this work under a verification problem, with the aim of studying the amount of information provided by each format. The results showed that, when compared to face images, head images preserve important discriminative characteristics in both visible and thermal images to identify a subject. On the other hand, it becomes clear that in the case of head images, more SIFT descriptors are produced, and therefore more essential data for the verification process is extracted. Additionally, faces of different subjects often have common features that provide no discriminative information. This effect became particularly noticeable for thermal images. On the visible range, a similar accuracy was found, with only a 2% decay between head and facial images.

In addition, visible images reached better accuracies, with a difference of 2% for head images and 9% for facial images. Therefore, for the camera used and relying on SIFT keypoints, the visible range may be a better approach. The SIFT parameterization is better adapted to the visual range than to the thermal range, because the visual range presents clearer details.

The verification quality was evaluated through a series of independent experiments with various results, showing the power of the system, which satisfactorily verified the identity of the database subjects, overcoming limitations such as dependency on illumination conditions and facial expressions. A comparison between head and face verification was made for both ranges. Such an approach reached accuracy rates of 97.60% with thermal head images and 88.20% in thermal face verification. On the visible range, 99.05% was achieved with head images and 97.65% in face verification. Therefore, the visible range gave better accuracies than the thermal range and, independently of the range, head images provided the most discriminative information.

Regarding the fusion approach, the potential of both ranges has been integrated, increasing the system's performance in all cases when compared with the isolated biometrics. After the experiments, it can be observed that the fusion methods applied improve the accuracy rate. This means that the sources of information (head, face and ranges) are not correlated from the error point of view, and those errors can be corrected by using score or decision information. Besides, the parameterization method was the same for all these modalities, which reinforces the independence of the information. The usage of fusion methods builds a whole approach, which increases the accuracy rate, showing its robustness. Rates between 99.45% and 100% were reached for score and decision fusions, respectively. The product rule reported the best success for score-based fusion, while both rules reached 100% accuracy for decision-based fusion. Thus, the use of bimodal information seems to be the best option under the conditions studied in this work.

4. Conclusions

The main contribution of this work resides in the usage of a unique device, providing a single image with both visual and thermal information. A full study of each of these modalities has been provided. In addition, a comparison between head and facial verification systems was also included. All systems were based on SIFT descriptors with a vocabulary tree and a fusion module applied on the thermal and visible ranges.

The following conclusions have been found. First, head images for the thermal and visible ranges are more discriminative than facial images alone. In addition, the visible range is more discriminative than the thermal range. Finally, the fusion of both ranges improves on the isolated biometrics.

As future work, it is desired to considerably increase the size of the database, including outdoor images, so that the proposed approach can be validated on such an extended database.

Acknowledgments

This work was supported by research project TEC2012-38630-C04-02 from the Spanish Government.

Special thanks to Jaime Roberto Ticay-Rivas for his valuable help during the building of this database.

References

Almohammad, M.S., Salama, G.I., Mahmoud, T.A., 2012. Human identification system based on feature level fusion using face and gait biometrics. In: 2012 International Conference on Engineering and Technology (ICET), pp. 1–5.

Arlot, S., 2010. A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79.

Bhowmik, M.K., De, B.K., Bhattacharjee, D., Basu, D.K., Nasipuri, M., 2012. Multisensor fusion of visual and thermal images for human face identification using different SVM kernels. In: 2012 IEEE Long Island Systems, Applications and Technology Conference (LISAT), pp. 1–7.

Bourlai, T., Cukic, B., 2012. Multi-spectral face recognition: identification of people in difficult environments. In: 2012 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 196–201.

Buyssens, P., Revenu, M., 2010. Fusion levels of visible and infrared modalities for face recognition. In: 2010 Fourth IEEE International Conference on Biometrics: Theory Applications and Systems (BTAS), pp. 1–6.

Crespo, D., Travieso, C.M., Alonso, J.B., 2012. Thermal face verification based on scale-invariant feature transform and vocabulary tree – application to biometric verification systems. Int. Conf. Bio-inspired Syst. Signal Process. 2012, 475–481.

Fuertes, J.J., Travieso, C.M., Alonso, J.B., Ferrer, M.A., 2010. Intra-modal biometric system using hand-geometry and palmprint texture. In: 44th IEEE International Carnahan Conference on Security Technology, pp. 318–322.

Heo, J., Savvides, M., Vijayakumar, B.V.K., 2005. Performance evaluation of face recognition using visual and thermal imagery with advanced correlation filters. In: CVPR'05, Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 9–15.

Huang, H., He, H., 2011. Super-resolution method for face recognition using nonlinear mappings on coherent features. IEEE Trans. Neural Networks 22 (1), 121–130.

Imtiaz, H., Fattah, S.A., 2011. A wavelet-domain local feature selection scheme for face recognition. In: ICCSP'11, 2011 International Conference on Communications and Signal Processing, Kerala, India, p. 448.

Javadtalab, A., Abbadi, L., Omidyeganeh, M., Shirmohammadi, S., Adams, C.M., El Saddik, A., 2011. Transparent non-intrusive multimodal biometric system for video conference using the fusion of face and ear recognition. In: 2011 Ninth Annual International Conference on Privacy, Security and Trust, pp. 87–92.

Jiang, L., Yeo, A., Nursalim, J., Wu, S., Jiang, X., Lu, Z., 2004. Frontal infrared human face detection by distance from centroid method. In: Proceedings of the 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, pp. 41–44.

Jian-Gang, W., Jun, L., Wei-Yun, Y., Sung, E., 2010. Boosting dense SIFT descriptors and shape contexts of face images for gender recognition. In: CVPRW'10, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 96–102.

Lowe, D.G., 1999. Object recognition from local scale-invariant features. In: ICCV'99, Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157.

Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60 (2), 91–110.

Majumdar, A., Ward, R.K., 2009. Discriminative SIFT features for face recognition. In: CCECE'09, 2009 Canadian Conference on Electrical and Computer Engineering, pp. 27–30.

Mikolajczyk, K., 2002. Detection of local features invariant to affine transformations. Ph.D. thesis. Institut National Polytechnique de Grenoble, France.

Nister, D., Stewenius, H., 2006. Scalable recognition with a vocabulary tree. In: CVPR'06, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2161–2168.

Ober, S., Winter, M., Arth, C., Bischof, H., 2007. Dual-layer visual vocabulary tree hypotheses for object recognition. In: ICIP'07, 2007 IEEE International Conference on Image Processing, vol. 6, pp. VI-345–VI-348.

Raghavendra, R., 2012. PSO based framework for weighted feature level fusion of face and palmprint. In: 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 506–509.

Slobodan, I., 2008. Object labeling for recognition using vocabulary trees. In: ICPR'08, 19th International Conference on Pattern Recognition, pp. 1–4.

Socolinsky, D.A., Selinger, A., 2004. Thermal face recognition in an operational scenario. In: CVPR'04, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1012–1019.

Soon-Won, J., Youngsung, K., Teoh, A.B.J., Kar-Ann, T., 2007. Robust identity verification based on infrared face images. In: ICCIT'07, 2007 International Conference on Convergence Information Technology, pp. 2066–2071.

Soyel, H., Demirel, H., 2011. Improved SIFT matching for pose robust facial expression recognition. In: FG'11, 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops, pp. 585–590.

Tong, Y., Wheeler, F.W., Liu, X., 2010. Improving biometric identification through quality-based face and fingerprint biometric fusion. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 53–60.

Viola, P., Jones, M., 2004. Robust real-time face detection. Int. J. Comput. Vision 57 (2), 137–154.

Yang, J., Yang, J., Zhang, D., Lu, J., 2003. Feature fusion: parallel strategy vs. serial strategy. Pattern Recognit. 36 (6), 1369–1381.

Yang, S., Zuo, W., Liu, L., Li, Y., Zhang, D., 2011. Adaptive weighted fusion of local kernel classifiers for effective pattern classification. In: ICIC'11, Proceedings of the 7th International Conference on Advanced Intelligent Computing, pp. 63–70.

Zheng, Y., Elmaghraby, A., 2011. A brief survey on multispectral face recognition and multimodal score fusion. In: 2011 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 543–550.

