
Pattern Recognition Letters 33 (2012) 1924–1931


A rotation and scale invariant technique for ear detection in 3D

Surya Prakash ⇑, Phalguni Gupta
Department of Computer Science & Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India


Article history: Available online 19 March 2012

Keywords: Biometrics; Connected graph components; Ear detection; Ear localization; Ear recognition

0167-8655/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.patrec.2012.02.021

⇑ Corresponding author. Tel.: +91 512 2597579; fax: +91 512 2597647.
E-mail addresses: [email protected] (S. Prakash), [email protected] (P. Gupta).

This paper presents a technique for automatic ear detection from 3D profile face range images. The proposed technique localizes the ear by using inherent structural details of the ear in 3D range data and is invariant to rotation and scale. It makes use of connected components of a graph constructed using the edges of the depth map image of the range data. The main advantages of the proposed technique over other existing techniques are twofold. First, the proposed technique does not require any registered 2D image for the detection of the ear in 3D. Second, it is inherently rotation and scale invariant and can detect left and right ears simultaneously without imposing any additional computational cost. To demonstrate the effectiveness of the technique, experiments are conducted on the University of Notre Dame public database, Collection J2 (UND-J2), which consists of 3D profile face range images with scale and pose variations. Experimental results are found to be encouraging and reveal the effectiveness of the proposed technique. Results are also compared with existing 3D ear detection techniques.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Ear based human recognition has received much attention in recent years because the nature of the ear is stable and it provides a reliable means for human recognition (Jain et al., 2007). It has been found that the ear is invariant to different facial expressions and is unaffected by aging. It is also unaffected by cosmetics and eyeglasses, which affect the face and iris respectively. The ear background is also predictable, as it always remains fixed at the middle of the profile face. Moreover, the ear can be captured easily without the cooperation of the subject. It can be employed in a stand-alone human recognition system or can be integrated with the face for enhanced recognition. In spite of the ear having so many rich features compared to other biometrics, the reported accuracy of 2D ear recognition has firmly kept it away from being widely used. However, the use of 3D ear data has helped in improving the accuracy of ear recognition, and recently many systems have been proposed with good recognition accuracy (Chen and Bhanu, 2007; Yan and Bowyer, 2007; Passalis et al., 2007; Islam et al., 2008b,c, 2011).

Ear recognition in 3D consists of two major phases, namely (i) ear detection from the profile face and (ii) recognition. Most of the well known ear recognition techniques proposed in the literature (viz. (Bhanu and Chen, 2003; Yan and Bowyer, 2005b; Chen and Bhanu, 2005b; Passalis et al., 2007)) have focused on the recognition phase and have used manually cropped ears for recognition. There exist a few techniques to detect and crop the ear automatically in 3D profile face range images. However, most of these techniques need a registered 2D ear image for the detection of the ear (Chen and Bhanu, 2007; Yan and Bowyer, 2007; Islam et al., 2008b,c, 2011). Also, these techniques are not very efficient, particularly when profile face images are affected by scaling and rotation (pose variations). Moreover, they are not fully automatic and hence cannot be deployed in real time applications. For an efficient ear recognition system, especially in non-intrusive applications, it is required to automatically detect and crop the ear from a whole profile face image which may be affected by scale and pose variations.

Detection of ears from an arbitrary 3D profile face range image is a challenging problem because ear images can vary in scale and pose (due to in-plane and out-of-plane rotations) under various viewing conditions. This paper proposes a technique which attempts to overcome these issues and provides an efficient scale and rotation invariant solution for automatic ear detection in 3D profile face range images. The proposed technique is based on graph connected components which are constructed using the edge map of the profile face range image. The main contributions of this paper are as follows. First, it proposes an effective ear detection technique which does not require any registered 2D image for ear detection in 3D. Second, the proposed technique is inherently invariant to rotation and scale and can detect left and right ears simultaneously without imposing any additional computational cost.

The rest of the paper is organized as follows. Section 2 discusses some well known techniques for automatic ear detection. Section 3 presents the proposed technique. Scale and rotation invariance of the proposed technique is discussed in Section 4. Experimental results are analyzed in Section 5. Conclusions are given in the last section.


Fig. 1. Flow diagram of the proposed technique.


2. Literature review

In recent years, researchers have focused on human recognition using ear shape, taking advantage of 2D and/or 3D data of the ear. Ear recognition in 2D uses gray level (Burge and Burger, 2000; Chang et al., 2003; Hurley et al., 2005; Nanni and Lumini, 2007; Yuan and Mu, 2007; Bustard and Nixon, 2008; Cummings et al., 2010) or color (Nanni and Lumini, 2009) information of the ear, whereas 3D ear recognition exploits the three dimensional details of the ear (Chen and Bhanu, 2004, 2005a, 2007; Yan and Bowyer, 2007; Islam et al., 2008b,c, 2011). Most of the techniques proposed for ear recognition in 2D and 3D focus directly on the recognition part; however, a few techniques propose ear detection as the first step of ear recognition. Techniques such as (Bhanu and Chen, 2003; Chen and Bhanu, 2005b; Yan and Bowyer, 2005b; Passalis et al., 2007) proposed for 3D ear recognition do not employ an automatic ear extraction process. A few techniques have been proposed in (Chen and Bhanu, 2004, 2005a, 2007; Yan and Bowyer, 2007; Islam et al., 2008b,c, 2011) for automatic ear detection from 3D range data. There also exist a few recognition techniques which combine the 2D ear with the 3D ear (Islam et al., 2008c), the 3D ear with the 3D face (Bustard and Nixon, 2010), etc., to get improved performance. Since this paper aims to develop an efficient technique for ear detection in 3D, our main focus in this section is to review the techniques related to 3D ear detection.

One of the initial works reported in the area of 3D ear detection is by Chen and Bhanu (2004), using an ear template. The model template used in this technique is represented by an averaged histogram of the shape index of the ear. The ear is detected by performing template matching at potential areas of the profile face image computed using edge information. Another template based technique by Chen and Bhanu has been proposed in (Chen and Bhanu, 2005a), where the ear is represented by a set of discrete 3D vertices corresponding to ear helix and antihelix edges. Step edges are extracted from the range image and clustered to detect the ear. Ear detection is performed by registering each edge cluster with the ear shape model, and the cluster with the minimum mean registration error is declared as the ear. To improve ear detection performance, Chen and Bhanu have proposed another technique in (Chen and Bhanu, 2007), where they use a single reference 3D ear shape model to locate the ear helix and antihelix parts. However, in this technique a registered 2D color image is considered along with the 3D range image for ear detection. All the techniques in (Chen and Bhanu, 2004, 2005a, 2007) are template based and hence are not efficient in handling scale and pose variations in the range data.

Yan and Bowyer (2005a) have proposed a technique for cropping the ears using landmark points such as the Triangular Fossa and the Incisure Intertragica on the original 3D profile range image. A line is drawn to connect these landmark points and is used to find out the orientation and scaling of the ear. They rotate and scale a mask accordingly and apply it on the original image to crop the 3D ear data. In (Yan and Bowyer, 2007), Yan and Bowyer have proposed another technique for 3D ear detection in range images which uses a 2D registered image of the profile face along with the 3D data. The technique first detects the nose tip to compute a probable ear region. It then finds the ear pit by using skin detection and curvature estimation in this region. Further, an active contour is initialized using the boundary of the pit, and both color and depth information are used to make the contour converge to the ear boundary. The performance of this technique relies on the correct detection of the nose tip and is bound to deteriorate when range images are affected by pose variations.

Islam et al. in (Islam et al., 2008b,c, 2011) have proposed a 3D ear recognition technique where ears from a 3D range profile image are extracted with the help of a registered 2D profile face image. First, the ear region is detected in the 2D profile face image using the AdaBoost based ear detector proposed in (Islam et al., 2008a). Afterwards, the corresponding 3D data is extracted from the co-registered 3D profile image. The techniques proposed in (Islam et al., 2008b,c) require a huge amount of time (a few weeks) for training the classifiers. An attempt has been made in (Islam et al., 2011) to reduce the training time to a few days by employing a computer cluster; however, this is also not a practical solution with respect to time and money.

It should be noted that except for the techniques proposed by Chen and Bhanu (2004, 2005a), all other techniques require a registered 2D profile face image for ear detection in a 3D range image. Moreover, these techniques do not offer any viable mechanism to perform ear detection in the presence of scale and pose (rotation) changes. Also, they are not able to detect left and right ears simultaneously and require a priori information or specific training for that.

Our previous research on ear biometrics presented in (Prakash et al., 2009) has dealt with ear detection in 2D color images, whereas the work presented in (Prakash and Gupta, 2011) proposes an illumination and pose invariant technique for 2D ear recognition using local features. Both of these techniques work on 2D data. In this paper, we make use of the depth map image of the range data to design an ear detection technique for 3D range data.

3. Proposed technique

The technique is based on the fact that in a 3D profile face range image, the ear is the only region which contains maximum depth discontinuities; as a result, this region is rich in edges. It also relies on the fact that edges belonging to an ear are curved in nature. The technique consists of two main steps: preprocessing and ear detection. Fig. 1 shows the flow diagram of the proposed technique.

3.1. Preprocessing

Preprocessing consists of four major steps. First, the 3D profile range image is converted to a depth map image. Next, edge computation is carried out on the depth map image. The detected edges are then approximated using line segments. The final step of the preprocessing prunes out all irrelevant edges.



3.1.1. Computation of depth map image
The 3D data of the profile face is collected by a non-contact 3D digitizer, Minolta Vivid 910, which produces 3D scanned data in the form of a point cloud grid of size m × n, with each point carrying 3D depth information. The digitizer produces a large depth value (Inf) if it fails to compute the depth information for a point. Let the depth information for a point be denoted by z; so for a point (i, j), z(i, j) contains a real finite value if depth can be computed and Inf otherwise. In the proposed technique, a 3D range image of a profile face is first converted to a depth map image, which is further used for edge detection. The depth map image is obtained from a range image by treating the depth value of a 3D point as its pixel intensity. Formally, the conversion of a 3D range image to a depth map image $I_{2D} \in \mathbb{R}^{m \times n}$ is given by

$$I_{2D}(i, j) = \begin{cases} z(i, j), & \text{if } z(i, j) \text{ is finite} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

Pixel intensities in the depth map image are normalized to the range 0 to 1 using min–max normalization as defined below:

$$I_{2D}(i, j) = \frac{I_{2D}(i, j) - \min(I_{2D})}{\max(I_{2D}) - \min(I_{2D})} \qquad (2)$$

The obtained depth map image is filtered using a median filter to remove noise. Fig. 2(a) shows a few examples of 3D range images collected using the Minolta Vivid 910 scanner; their corresponding depth map images are shown in Fig. 2(b). It should be noted that the conversion described above works well for any general pose and rotation of a 3D profile face. The bottom row of Fig. 2 shows an example of successful conversion of 3D range images to depth images for various views of a subject.
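The conversion and normalization steps above can be sketched in pure Python (a minimal illustration under Eqs. (1) and (2), not the authors' code; function names are our own):

```python
import math
import statistics

def range_to_depth_map(z):
    """Eq. (1): keep finite depth readings as pixel intensities,
    set failed (infinite) readings to 0."""
    return [[v if math.isfinite(v) else 0.0 for v in row] for row in z]

def min_max_normalize(img):
    """Eq. (2): min-max normalize pixel intensities to [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0          # guard against a constant image
    return [[(v - lo) / span for v in row] for row in img]

def median_filter_3x3(img):
    """Suppress noise with a 3x3 median filter (borders replicated)."""
    m, n = len(img), len(img[0])
    clamp = lambda k, hi: min(max(k, 0), hi)
    out = []
    for i in range(m):
        out.append([statistics.median(
            img[clamp(i + di, m - 1)][clamp(j + dj, n - 1)]
            for di in (-1, 0, 1) for dj in (-1, 0, 1))
            for j in range(n)])
    return out
```

A real scan would be an m × n grid rather than a 2 × 2 toy, but the per-pixel logic is identical.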

3.1.2. Edge computation
Computation of edges in an intensity image is a challenging task. However, in a depth map image of a profile face it is relatively easy due to the presence of strong depth discontinuities in the ear region, particularly around the helix of the ear. In addition, if one observes a depth map image closely, it can be seen that the ear is the region that contains the maximum variation in depth, making this region rich in edges. In the proposed technique, the depth map image of a profile face undergoes an edge computation step where edges are detected using the Canny edge operator (Canny, 1986). Subsequently, a list of all the detected edges is obtained by connecting the edge pixels together into lists of coordinate pairs. Wherever an edge junction is found, the list is concluded and a separate list is created for each of the new branches. Edges originating from noise are eliminated using an edge-length based criterion, where the length of an edge is defined as the number of pixels participating in it.

Fig. 2. (a) Original 3D range images and (b) corresponding depth map images.

The length threshold for edge elimination, which is chosen automatically, is proportional to the width of the profile face depth map image. Formally, for an image of width n, the threshold $\tau$ can be defined as

$$\tau = \kappa n \qquad (3)$$

where $\kappa$ is a constant. In our experiments, it is set to 0.03.
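The length-based pruning of Eq. (3) amounts to a one-line filter. A sketch, assuming edges are stored as lists of (row, col) pixel coordinates (the function name is our own):

```python
def prune_short_edges(edges, image_width, kappa=0.03):
    """Discard edges shorter than tau = kappa * n pixels (Eq. 3),
    where n is the image width and an edge's length is its pixel count."""
    tau = kappa * image_width
    return [edge for edge in edges if len(edge) >= tau]
```

With the paper's value κ = 0.03, an edge in a 100-pixel-wide image must span at least 3 pixels to survive.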

3.1.3. Edge approximation
All pixels of an edge obtained in the previous step may not be equally important and may not be necessary to represent the edge. To speed up the processing, redundant pixels can be removed from further processing. Removal of the redundant pixels is carried out by fitting line segments to the edges. Line fitting proceeds as follows. In each array of edge points, the value and position of the maximum deviation from the line that joins the endpoints is computed. If the maximum deviation at a point is found to be more than the allowable tolerance, the edge is shortened to that point and the procedure is repeated to fit a line to the remaining points of the edge. This breaks each edge into line segments, each of which represents the original edge within the specified tolerance.
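The recursive split described above is essentially Ramer–Douglas–Peucker polyline simplification. A sketch (our own implementation, with a hypothetical `is_curved` helper anticipating the pruning step of Section 3.1.4):

```python
def fit_line_segments(points, tol):
    """Split an edge at its point of maximum deviation from the chord
    joining the endpoints, recursing until every deviation <= tol."""
    if len(points) <= 2:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0   # guard closed edges
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm for x, y in points]
    k = max(range(len(points)), key=dists.__getitem__)
    if dists[k] <= tol:
        return [points[0], points[-1]]          # one segment suffices
    left = fit_line_segments(points[:k + 1], tol)
    right = fit_line_segments(points[k:], tol)
    return left[:-1] + right                    # drop duplicated split point

def is_curved(points, tol):
    """An edge is kept only if it needs three or more points
    after line-segment fitting (see Section 3.1.4)."""
    return len(fit_line_segments(points, tol)) >= 3
```

A collinear run collapses to its two endpoints (and is later pruned as linear), while a corner keeps the corner point.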

3.1.4. Pruning of irrelevant edges
All linear edges (i.e. those which need only two points for their representation after line segment fitting) obtained in the previous step can be removed, as human ear edges contain some curvature and need three or more points for their representation. Let the set obtained after removing the irrelevant edges be S, where S contains only the edges which may belong to the ear.

3.2. Ear detection

The process of ear detection consists of two major steps. It first builds an edge connectivity graph using the edge map obtained from the depth map image of the profile face. This graph is afterwards used to find the connected components for ear detection.

3.2.1. Construction of edge connectivity graph
Let the set S contain n edges which define the edge map of the profile face depth map image. Let the ith edge, e_i, in S be represented by a point p_i ∈ P, and let CH(e_i) denote the convex hull defined for each edge e_i. The edge connectivity graph G = (V, E), with vertex set V and edge set E, is then defined as

$$V = \{p_i \mid p_i \in P\}$$
$$E = \{(p_i, p_j) \mid CH(e_i) \text{ intersects } CH(e_j)\}$$

An interesting property of the ear part of the edge map is that ear edges are mostly convex in nature; also, if one moves from the outer part of the ear towards the inside, the outer ear edges mostly contain




ear edges inside. Due to this characteristic of the ear edges, the convex hulls of outer edges cut across the convex hulls of the inner edges. Experimentally, it is observed that the convex hull of an ear edge cuts at least one other convex hull of an edge belonging to the ear. Due to this, it is expected that all the vertices belonging to the ear part in G get connected to one another directly or through another vertex. One can also note that the characteristic of an edge containing another does not hold for the edges belonging to non-ear parts of the profile face depth map image; hence vertices belonging to non-ear parts mostly remain isolated in the graph. An example of a synthetic edge map and the convex hulls of its edges is shown in Fig. 3(a) and (b) respectively, and the corresponding edge connectivity graph is given in Fig. 3(c). Since the convex hull of edge A intersects the convex hulls of edges B, C and D in Fig. 3(b), the vertices corresponding to B, C and D are connected to A in Fig. 3(c). Similarly, since the convex hull of edge C intersects the convex hulls of edges B and D, the vertices corresponding to B and D are connected to C. Points E, F and G remain isolated in Fig. 3(c) because the convex hulls of edges E, F and G in Fig. 3(b) do not intersect the convex hull of any other edge.

3.2.2. Ear detection using size of connected components
After constructing the edge connectivity graph G of the edge map, the connected components (maximal connected subgraphs) of this graph are computed. Graph components with a single vertex cannot represent the ear and hence are overlooked; components with more than one vertex are considered for ear detection. The criterion used to localize the ear is as follows. Due to the nature of the outer ear edges containing the inner ones, the ear edges maintain good connectivity with one another in the edge connectivity graph. This does not happen for non-ear edges, and hence the graph component representing the ear turns out to be the largest in size. The component in G which has the largest size (maximum number of edges) is therefore found and claimed to be the ear. The bounding box of the edges participating in this component is considered as the ear boundary, and this boundary is used to crop the ear from the 3D range image.
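The detection step can be sketched end to end: a convex hull per edge (monotone chain), pairwise hull intersection (a separating-axis test, which also reports containment, matching the connectivity criterion), and the largest connected component via union-find. This is our own minimal reconstruction from the paper's description, not the authors' code:

```python
from itertools import combinations

def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hulls_intersect(p, q):
    """Separating-axis test for convex polygons; one hull lying
    inside the other also counts as an intersection."""
    for poly in (p, q):
        n = len(poly)
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            ax, ay = y1 - y2, x2 - x1            # edge normal axis
            p_proj = [ax * x + ay * y for x, y in p]
            q_proj = [ax * x + ay * y for x, y in q]
            if max(p_proj) < min(q_proj) or max(q_proj) < min(p_proj):
                return False                      # separating axis found
    return True

def detect_ear(edges):
    """Return the indices of the largest connected component of the
    edge connectivity graph and the bounding box of its edge points."""
    hulls = [convex_hull(e) for e in edges]
    parent = list(range(len(edges)))
    def find(i):                                  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(edges)), 2):
        if hulls_intersect(hulls[i], hulls[j]):
            parent[find(i)] = find(j)
    comps = {}
    for i in range(len(edges)):
        comps.setdefault(find(i), []).append(i)
    best = max(comps.values(), key=len)
    pts = [p for i in best for p in edges[i]]
    bbox = (min(x for x, _ in pts), min(y for _, y in pts),
            max(x for x, _ in pts), max(y for _, y in pts))
    return best, bbox
```

Because hull intersection is a purely geometric predicate, uniformly scaling or rotating all edge points leaves the adjacency, and hence the detected component, unchanged, which is exactly the invariance argument of Section 4.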

Fig. 3. (a) Edge map, (b) convex hulls of the edges and (c) edge connectivity graph.

Fig. 4. (a) Edge map (different colors used to distinguish edges), (b) edge connectivity graph (the largest connected component enclosed in a rectangle) and (c) detected ear.

Fig. 4 shows an example of detecting the ear for a subject. Fig. 4(a) provides the edge map, while Fig. 4(b) shows the graph connected components, with the largest component enclosed inside a rectangle. It can be observed that all the components except the one representing the ear are of very small size (mostly containing one vertex). The ear detection result is shown in Fig. 4(c), where the rectangle shows the detected ear location.

4. Scale and rotation invariance

An important and useful characteristic of the proposed technique which distinguishes it from other existing techniques is that it is inherently scale and rotation invariant. It can detect the ear at different scales and orientations without adding any extra computational cost or training.

4.1. Invariance to scale

The crux of the proposed ear detection technique is the construction of the edge connectivity graph. Building an efficient graph depends on the criterion used to define connectivity among the vertices; to make ear detection scale independent, this criterion must also be scale independent. As discussed in Section 3.2.1, the strategy used to connect the vertices in the edge connectivity graph G is based on the intersection of convex hulls. It is able to define the connectivity among the vertices irrespective of the scale of the ear image, which makes ear detection in the proposed technique scale invariant. To demonstrate the scale invariance, the synthetic edge map shown in Fig. 3(a) is scaled down. Fig. 5(a) shows the obtained scaled-down edge map; the convex hulls and edge connectivity graph for this edge map are shown in Fig. 5(b) and (c) respectively. One can see that the characteristics of the graph are exactly the same as those shown in Fig. 3(c). Similar results can be shown for a scaled-up version of the edge map of Fig. 3(a).




Fig. 5. (a) Scaled-down edge map of Fig. 3(a), (b) convex hulls of the edges shown in (a), (c) edge connectivity graph for (b), (d) rotated edge map of Fig. 3(a), (e) convex hulls of the edges shown in (d) and (f) edge connectivity graph for (e).


4.2. Invariance to rotation

Since the structural appearance of the ear (and hence its edge map) does not change due to in-plane rotation, the convex hull based criterion can effectively be used to define the connectivity among the vertices in the edge connectivity graph G even in the presence of rotation. This makes the proposed 3D ear detection technique rotation invariant. To demonstrate the rotation invariance, the synthetic edge map shown in Fig. 3(a) is rotated by −45°. Fig. 5(d) shows the rotated edge map; the convex hulls and edge connectivity graph for this edge map are shown in Fig. 5(e) and (f) respectively. One can see that the characteristics of the graph are the same as those shown in Fig. 3(c). Similar results can be shown for any amount of rotation. The technique can also detect the ear in the presence of out-of-plane rotation, provided the rotation does not hide the structural information of the ear completely.

5. Experimental results

The proposed technique is validated on Collection J2 (UND-J2) (University of Notre Dame; Yan and Bowyer, 2007) of the University of Notre Dame public database, which consists of 1780 3D profile face images of 415 subjects. These images are collected using the Minolta Vivid 910 scanner. We have discarded some images of UND-J2 where the camera could not capture the 3D information of the ear correctly, and have considered 1604 3D profile face images for experimentation. These images are affected by scale and pose variations, and there is, on average, a time gap of 17.7 weeks between the earliest and the latest images of a subject. Also, some images are occluded by hair and ear rings. A few sample images from this database are shown in Fig. 2, where scale and rotation variations can

Table 1
Ear detection accuracy of the proposed technique.

Database | # of images used for testing | Ear detection accuracy (%) | Description of the test images including challenges involved
UND-J2   | 1604 | 99.06 | Normal UND-J2 3D range images
         | 1604 | 99.06 | 3D range images rotated by +90°
         | 1604 | 99.06 | 3D range images rotated by −90°
         | 1604 | 99.06 | 3D range images flipped horizontally
         | 194  | 100   | 3D range images of varied scales
         | 149  | 99.32 | 3D range images with out-of-plane variations

be seen. Most of the images present in this database contain such variations. It can be noted that this database also provides a corresponding registered 2D image for each 3D range image. However, since the proposed technique requires only 3D data for performing 3D ear detection and does not need a registered 2D image, it uses only the 3D range images of the database.

5.1. Performance evaluation

An ear detection is considered successful if the detected boundary neither includes more than 15% neighboring pixels nor excludes more than 15% of the ear pixels compared to the true rectangular ear boundary obtained by hand segmentation of the 2D depth map images. The accuracy of any ear detection technique can be measured as follows:

$$\text{Accuracy} = \frac{\text{Number of Correct Ear Detections}}{\text{Total Test Samples}} \times 100\%$$
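The evaluation can be made concrete in code. Note that the 15% rule admits more than one reading; the `detection_successful` sketch below encodes just one plausible interpretation (extra and missed area measured against the true box area), with our own function names:

```python
def detection_accuracy(num_correct, num_total):
    """Accuracy (%) = correct detections / total test samples * 100."""
    return 100.0 * num_correct / num_total

def detection_successful(detected, truth, tol=0.15):
    """One possible reading of the paper's criterion: the detected box
    (x1, y1, x2, y2) must neither add more than `tol` extra (non-ear)
    area nor miss more than `tol` of the true ear box area."""
    ix1, iy1 = max(detected[0], truth[0]), max(detected[1], truth[1])
    ix2, iy2 = min(detected[2], truth[2]), min(detected[3], truth[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    det_area = (detected[2] - detected[0]) * (detected[3] - detected[1])
    true_area = (truth[2] - truth[0]) * (truth[3] - truth[1])
    extra = (det_area - inter) / true_area    # neighboring pixels included
    missed = (true_area - inter) / true_area  # ear pixels excluded
    return extra <= tol and missed <= tol
```

A perfectly aligned box passes, while a box shifted by half its width fails on both counts.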

The accuracy of the proposed technique is shown in Table 1. The technique produces a 99.06% correct 3D ear detection rate for the 1604 images of the UND-J2 database. To show the robustness of the proposed technique against rotation, two new test data sets are generated from the original data, first by performing in-plane rotation of the 3D range images by +90° and second by rotating them by −90°. The ear detection performance on both of these data sets is exactly the same as that obtained on the original data set. Also, to show that the proposed technique can detect left and right ears simultaneously, another test data set is formed by flipping the 3D profile face range images horizontally, making all left profile images right. The same performance is achieved on this data set as well.
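The three derived test sets (±90° in-plane rotation and horizontal flip) are cheap to generate from the depth maps; a sketch with our own helper names (the paper does not give its data-augmentation code):

```python
def rotate_plus90(img):
    """Rotate a depth map (list of rows) clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]

def rotate_minus90(img):
    """Rotate a depth map counter-clockwise by 90 degrees."""
    return [list(row) for row in zip(*img)][::-1]

def flip_horizontal(img):
    """Mirror left/right, turning a left profile into a right profile."""
    return [row[::-1] for row in img]
```

The two rotations are inverses of each other, so round-tripping an image recovers the original.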

To test the robustness against scale, 194 3D range images of different scales are selected from the UND-J2 database and ear detection is performed. The proposed technique achieves 100% detection accuracy in this experiment. A set of 149 range images affected by out-of-plane rotation is also formed from the UND-J2 database to test the technique against out-of-plane rotations. The detection accuracy in this case is found to be 99.32%: the technique fails to detect the ear in only one range image out of 149, due to an acute rotation which made the structural details of the ear invisible. It should be mentioned that the results in all these experiments have been obtained without changing any parameter or performing any specific training. Ear detection accuracies for all these test cases are summarized in Table 1.


Table 2
Ear detection accuracy of the proposed technique in presence of noise.

(a) Salt and pepper noise
Database | # of images used for testing | Density of the noise | Ear detection accuracy (%)
UND-J2   | 1604 | 0.01 | 95.08
         |      | 0.02 | 94.52
         |      | 0.03 | 93.46
         |      | 0.04 | 93.27
         |      | 0.05 | 92.75
         |      | 0.06 | 92.08

(b) Gaussian noise with mean 0
Database | # of images used for testing | Variance of the noise | Ear detection accuracy (%)
UND-J2   | 1604 | 0.0001 | 91.01
         |      | 0.0002 | 88.90
         |      | 0.0003 | 86.29
         |      | 0.0004 | 84.60
         |      | 0.0005 | 83.92
         |      | 0.0006 | 80.93


5.2. Performance in presence of noise

To test the performance of the proposed technique against noise, ear detection has been performed after adding two types of noise, namely salt and pepper noise and Gaussian noise, to the profile face images of the UND-J2 database. Salt and pepper noise is characterized by its density, whereas Gaussian noise is characterized by its mean and variance. Experiments are performed for various density values of salt and pepper noise and variance values of Gaussian noise; the mean of the Gaussian noise is kept at 0. Experimental results are presented in Table 2. It is observed that the technique performs well under the influence of both types of noise. Performance gradually deteriorates as the ear edges get weakened by the addition of more and more noise. It can be noted that the results in this experiment are obtained using the same parameters as used for ear detection from normal profile face images; results can be improved by tuning the parameters for noisy images. From Table 2(a), one can observe that even when the density of salt and pepper noise is about 6%, the accuracy remains above 92%. The situation is a little different for Gaussian noise (Table 2(b)), where the detection accuracy is reduced to about 91% when the noise variance is 0.0001.
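The two corruption models can be reproduced with the standard library alone. A sketch with our own function names (the paper does not give its noise-injection code), assuming intensities normalized to [0, 1]:

```python
import random

def add_salt_pepper(img, density, rng):
    """Set roughly a `density` fraction of pixels to 0 (pepper)
    or 1 (salt), half each, leaving the rest untouched."""
    out = [row[:] for row in img]
    for row in out:
        for j in range(len(row)):
            r = rng.random()
            if r < density / 2:
                row[j] = 0.0
            elif r < density:
                row[j] = 1.0
    return out

def add_gaussian(img, variance, rng, mean=0.0):
    """Add Gaussian noise with the given mean and variance,
    clamping intensities back to [0, 1]."""
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, v + rng.gauss(mean, sigma))) for v in row]
            for row in img]
```

Passing a seeded `random.Random` makes a noisy test set reproducible across runs.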

5.3. Performance in presence of occlusion

To test the performance of the proposed technique against occlusion, profile face images having ears partially occluded by hair have been selected from the UND-J2 database and ear detection has been performed. It is found that the proposed technique detects the ear in all selected images. Some of the obtained results are shown in Fig. 6. However, since the proposed technique relies on strong connectivity among the ear edges in the edge connectivity graph, it may fail if a substantial portion of the outer helix of the ear is occluded.

Fig. 6. Ear detection results for occluded ears from UND-J2 database: profile face range images in the first row and corresponding partially detected ears in the second row.

5.4. Comparison with other techniques

In order to analyze the performance of the proposed technique, we have compared it with state-of-the-art techniques proposed in the literature. In (Chen and Bhanu, 2007), Chen and Bhanu have carried out experiments on the UND Collection G and a part of the Collection F databases (which are subsets of the UND Collection J2 database). This technique is based on template matching and is not rotation and scale invariant. The ear detection accuracy in (Chen and Bhanu, 2007) is reported to be 87.71% on a test data set of 700 images. This performance is achieved when 3D range images are used along with 2D. Its performance when only 2D information is used is not reported in (Chen and Bhanu, 2007); however, it is obvious that the performance obtained using only 2D information cannot be better than that obtained using both 2D and 3D information.

The technique proposed in (Yan and Bowyer, 2007) uses a part of the UND Collection J2 database (415 test images in total) for experiments. It achieves an accuracy of 85.54% for 3D ear detection when it uses only 3D information, and is reported to achieve 100% detection performance when information from both 2D and 3D is used. However, it is important to note that the test set used in this performance evaluation is very small and consists of a selected set of images from the UND-J2 database; as a result, the performance of the technique on the whole UND-J2 database is not reported. Moreover, it is not rotation invariant, as it depends on the correctness of nose tip detection. In case of rotation, the heuristics used to locate the nose tip may fail and result in a wrong ear detection.

In (Islam et al., 2011), Islam et al. have claimed to achieve good accuracy for 3D ear detection. However, this technique is not a 3D ear detection technique in the true sense, as it does not make use of the 3D information of the ear for its detection. Instead, it performs ear detection on a registered 2D image and uses the location of the ear in the 2D image to crop the ear in 3D. This technique cannot be used when only 3D range data of the ear is available. Also, if test ear images are rotated or their appearance differs from the training data, this technique may fail, since the training images may not include such cases. Forming a database



Table 3. Performance comparison on UND-J2 database.

Technique              Database size   Test images considered   Detection accuracy (%)   Remarks
Chen and Bhanu, 2007   700             700                      87.71                    When 3D range images used with 2D (a)
Yan and Bowyer, 2007   1780            415 (b)                  85.54                    When only 3D range images used
                                                                100                      When 3D range images used with 2D
Islam et al., 2011     1780            830 (c)                  99.90                    Based on 2D registered images
Proposed technique     1780            1604 (d)                 99.06                    When only 3D range images used

(a) No results reported in (Chen and Bhanu, 2007) when only 3D range images are used.
(b) Test images selected manually.
(c) Test images selected manually.
(d) All images of UND-J2 except the one where 3D ear information could not be captured.

Table 4. Comparison of the proposed technique with the technique discussed in (Islam et al., 2011).

Parameters                                                Islam et al. (2011)                      Proposed technique
Training overhead                                         More, to train classifiers with 1000s    No training required
                                                          of positive and negative samples
Training time required                                    Few days                                 No training at all
Invariance to rotation and scale                          No                                       Yes
Registered 2D image required for ear detection in 3D      Yes                                      No
Can work directly on 3D data                              No                                       Yes
Can detect left and right ear without specific training   No                                       Yes


of ears with all possible rotations demands a very large amount of storage space as well as a high amount of training time, which may not be practically feasible. Also, to detect ears of different scales, it makes an exhaustive search with filters of various sizes, which is computationally very expensive and makes the technique infeasible for real applications. Moreover, the UND-J2 database contains 1780 images collected from 415 subjects; however, this technique separates out a part of it (830 images) to show the performance. It has not reported the performance on the whole UND-J2 database. Hence, it can be noted that the size of the test data used in all of these techniques (Chen and Bhanu, 2007; Yan and Bowyer, 2007; Islam et al., 2011) is small as compared to the test data used in our experiments.

On the other hand, the technique proposed in this paper is inherently able to handle rotation (pose) and scale changes and does not incur any extra computational overhead to achieve this. Another good property of the proposed technique is that it can detect left as well as right ears simultaneously without any prior information or specific training. Also, it does not make use of a registered 2D image to detect the ear in 3D. Its independence from 2D makes it generic and more applicable to real-life scenarios. This characteristic makes the proposed technique superior to the well-known techniques, since it may not always be feasible to obtain a registered 2D image together with a 3D range image for a profile face. Also, its experimental results are more stable as compared to those reported in (Chen and Bhanu, 2007; Yan and Bowyer, 2007; Islam et al., 2011), as they are obtained on a fairly large data set of 1604 test samples taken from the UND-J2 database.

Table 3 provides the ear detection accuracy of all these techniques. It can be observed that the proposed technique performs much better than the techniques proposed in (Chen and Bhanu, 2007; Yan and Bowyer, 2007) when they use only 3D information. Also, it performs competitively with respect to the techniques proposed in (Yan and Bowyer, 2007) (when it uses 2D information together with 3D) and in (Islam et al., 2011). However, it should be noted that the technique proposed in (Yan and Bowyer, 2007) uses 2D information with 3D and has only been tested on 415 test images. In the case of the technique proposed in (Islam et al., 2011), the detection is done using a 2D registered image and not using the 3D data. Its test data is also smaller than that used by us. The technique proposed in this paper achieves comparable ear detection performance for 3D range images using only 3D information. A detailed comparison of the proposed technique with the most recent technique proposed in (Islam et al., 2011) is given in Table 4.

The proposed technique performs ear detection using the connected components of a graph constructed from ear edges, which represent the structural details of the ear. Since the structural information of the ear does not change due to rotation or scaling, the proposed technique can successfully handle the scale changes and rotations present in the images. Hence, it is able to detect ears in images affected by scale and rotation. Moreover, the technique detects ears in images affected by out-of-plane rotation, provided the rotation is not so severe as to hide the structural details of the ear. This is not the case with the existing techniques.
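The connected-component idea can be illustrated with a minimal sketch. This is not the authors' implementation: the toy edge map, the 8-neighbour connectivity, and all function and variable names are our assumptions; the actual pipeline derives edges from the depth map (e.g. via a Canny detector) and applies further structural criteria to select the ear component.

```python
import numpy as np
from collections import deque

def connected_components(edge_map):
    """Label 8-connected components in a binary edge map.

    Returns a list of components, each a list of (row, col) pixels.
    Components of edge pixels approximate the connected edge
    structures (e.g. helix curves) that serve as ear candidates.
    """
    h, w = edge_map.shape
    seen = np.zeros_like(edge_map, dtype=bool)
    components = []
    for r in range(h):
        for c in range(w):
            if edge_map[r, c] and not seen[r, c]:
                # Breadth-first traversal of one component.
                comp, queue = [], deque([(r, c)])
                seen[r, c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and edge_map[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                queue.append((ny, nx))
                components.append(comp)
    return components

# A toy "edge map" with two separate edge structures.
edges = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
], dtype=bool)

comps = connected_components(edges)
largest = max(comps, key=len)  # candidate ear region
```

Because component membership depends only on pixel adjacency, rotating or scaling the image changes the coordinates of a component but not its connectivity, which is the intuition behind the invariance claimed above.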

The techniques discussed in (Chen and Bhanu, 2004, 2005a) could not be compared with the proposed technique, as they have been tested on the University of California Riverside (UCR) profile face database, which is not publicly available.

6. Conclusions

This paper has proposed a technique for automatic ear detection from 3D profile face range images. The inherent structural details of the ear in 3D range data have been used to design the technique. Unlike other existing techniques, it does not take the help of a registered 2D profile face image for ear detection and relies completely on the 3D range data. Ear detection is carried out by first constructing a graph using the edges obtained from the range data and then analyzing the connected components of the graph. The technique can detect ears of different scales and rotations efficiently without any user intervention. It can also detect left and right ears simultaneously without any prior information or specific training. The University of Notre Dame database (Collection J2), consisting of 3D profile face range images collected from 415 subjects, has been used to evaluate the proposed technique. These images are affected by scale changes and by in-plane and out-of-plane rotations. Experimental results reveal the effectiveness of the technique.

Compared to the well-known ear detection techniques in 3D, the proposed technique is found to be efficient. Unlike existing techniques, it does not need a registered 2D image along with 3D data for detecting the ear in 3D. The proposed technique is inherently scale and rotation invariant. It can detect left and right ears without any prior information, and it does not need any training for ear detection.



Acknowledgments

The authors are thankful to the anonymous reviewers and the Editor for their valuable suggestions, which have helped to improve the quality of the manuscript. The authors would also like to acknowledge the support provided by the Department of Information Technology, Government of India to carry out this work.

References

Bhanu, B., Chen, H., 2003. Human ear recognition in 3D. In: Proc. Workshop on Multimodal User Authentication, pp. 91–98.

Burge, M., Burger, W., 2000. Ear biometrics in computer vision. In: Proc. Internat. Conf. on Pattern Recognition (ICPR’ 00), vol. 02, pp. 822–826.

Bustard, J., Nixon, M., 2008. Robust 2D ear registration and recognition based on SIFT point matching. In: Proc. Internat. Conf. on Biometrics: Theory, Applications and Systems (BTAS’ 08), pp. 1–6.

Bustard, J., Nixon, M., 2010. 3D morphable model construction for robust ear and face recognition. In: Proc. Internat. Conf. on Computer Vision and Pattern Recognition (CVPR’ 10), pp. 2582–2589.

Canny, J., 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell. 8 (6), 679–698.

Chang, K., Bowyer, K.W., Sarkar, S., Victor, B., 2003. Comparison and combination of ear and face images in appearance-based biometrics. IEEE Trans. Pattern Anal. Machine Intell. 25 (9), 1160–1165.

Chen, H., Bhanu, B., 2004. Human ear detection from side face range images. In: Proc. Internat. Conf. on Pattern Recognition (ICPR’ 04), vol. 3, pp. 574–577.

Chen, H., Bhanu, B., 2005a. Shape model based 3D ear detection from side face range images. In: Proc. Internat. Conf. on Computer Vision and Pattern Recognition-Workshops, pp. 122–127.

Chen, H., Bhanu, B., 2005b. Contour matching for 3D ear recognition. In: Proc. IEEE Workshops on Application of Computer Vision (WACV/MOTION’ 05), vol. 1, pp. 123–128.

Chen, H., Bhanu, B., 2007. Human ear recognition in 3D. IEEE Trans. Pattern Anal. Machine Intell. 29 (4), 718–737.

Cummings, A., Nixon, M., Carter, J., 2010. A novel ray analogy for enrolment of ear biometrics. In: Proc. Internat. Conf. on Biometrics: Theory, Applications and Systems (BTAS’ 10), pp. 1–6.

Hurley, D.J., Nixon, M.S., Carter, J.N., 2005. Force field feature extraction for ear biometrics. Comput. Vision and Image Understanding 98 (3), 491–512.

Islam, S., Bennamoun, M., Davies, R., 2008. Fast and fully automatic ear detection using cascaded adaboost. In: Proc. IEEE Workshop on Applications of Computer Vision (WACV’ 08), pp. 1–6.

Islam, S.M., Davies, R., Mian, A.S., Bennamoun, M., 2008. A fast and fully automatic ear recognition approach based on 3D local surface features. In: Proc. 10th Internat. Conf. on Advanced Concepts for Intelligent Vision Systems (ACIVS ’08), pp. 1081–1092.

Islam, S.M.S., Bennamoun, M., Mian, A.S., Davies, R., 2008. A fully automatic approach for human recognition from profile images using 2D and 3D ear data. In: Proc. 4th Internat. Symposium on 3D Data Processing, Visualization and Transmission (3DPVT ’08), pp. 131–135.

Islam, S.M.S., Davies, R., Bennamoun, M., Mian, A.S., 2011. Efficient detection and recognition of 3D ears. Internat. J. Comput. Vision 95 (1), 52–73.

Jain, A.K., Flynn, P., Ross, A.A. (Eds.), 2007. Handbook of Biometrics. Springer-Verlag, Inc., New York (Chapter 7).

Nanni, L., Lumini, A., 2007. A multi-matcher for ear authentication. Pattern Recognition Lett. 28 (16), 2219–2226.

Nanni, L., Lumini, A., 2009. Fusion of color spaces for ear authentication. Pattern Recognition 42 (9), 1906–1913.

Passalis, G., Kakadiaris, I.A., Theoharis, T., Toderici, G., Papaioannou, T., 2007. Towards fast 3D ear recognition for real-life biometric applications. In: Proc. IEEE Conf. on Advanced Video and Signal Based Surveillance (AVSS’ 07), pp. 39–44.

Prakash, S., Gupta, P., 2011. An efficient ear recognition technique invariant to illumination and pose. Telecommun. Syst., 1–14. http://dx.doi.org/10.1007/s11235-011-9621-2.

Prakash, S., Jayaraman, U., Gupta, P., 2009. Ear localization using hierarchical clustering. In: Proc. SPIE International Defence Security and Sensing Conference, Biometric Technology for Human Identification VI, 730620, Orlando, FL.

University of Notre Dame Profile Face Database, Collection J2. <http://www.nd.edu/cvrl/CVRL/DataSets.html>.

Yan, P., Bowyer, K., 2005a. Empirical evaluation of advanced ear biometrics. In: Proc. Internat. Conf. on Computer Vision and Pattern Recognition-Workshops, vol. 3, pp. 41–48.

Yan, P., Bowyer, K., 2005b. Ear biometrics using 2D and 3D images. In: Proc. Internat. Conf. on Computer Vision and Pattern Recognition-Workshops, pp. 121–128.

Yan, P., Bowyer, K., 2007. Biometric recognition using 3D ear shape. IEEE Trans. Pattern Anal. Machine Intell. 29 (8), 1297–1308.

Yuan, L., Mu, Z., 2007. Ear recognition based on 2D images. In: Proc. Internat. Conf. on Biometrics: Theory, Applications and Systems (BTAS’ 07), pp. 1–5.

