
Self-Similarity and Points of Interest

Jasna Maver

Abstract—In this work, we present a new approach to interest point detection. Different types of features in images are detected by using a common computational concept. The proposed approach considers the total variability of local regions. The total sum of squares computed on the intensity values of a local circular region is divided into three components: the between-circumferences sum of squares, the between-radii sum of squares, and the remainder. These three components, normalized by the total sum of squares, represent three new saliency measures, namely, radial, tangential, and residual. The saliency measures are computed for regions with different radii, and scale-spaces are built in this way. Local extrema in the scale-space of each of the saliency measures are located. They represent features with complementary image properties: blob-like features, corner-like features, and highly textured points. Results obtained on image sets of different object classes and on image sets under different types of photometric and geometric transformations show a high robustness of the method to intra-class variations as well as to different photometric transformations and moderate geometric transformations, and compare favourably with the results obtained by the leading interest point detectors from the literature. The proposed approach gives a rich set of highly distinctive local regions that can be used for object recognition and image matching.

Index Terms—interest point detector, self-similarity, visual attention, visual perception, linear predictors, the total sum of squares.


1 INTRODUCTION

In the past few years, many techniques have been proposed to try to locate interest points in images [12], [29], [21], [41], [35], [14], [26], [34], [36], [27], [25], [42], [43], [38], [39], [15], [10]. These locations are then used either for image matching or for the extraction of meaningful entities from images that can further be applied for object or scene representation and recognition. In the papers [26] and [15], different detectors were tested on image sets under different types of photometric and geometric transformations and on image sets of different object classes. The obtained results show that there is not one detector that outperforms the others for all scene types and all types of transformations. A set of complementary detectors is needed to cope with the different phenomena in images.

In this work, we present a new approach to interest point detection. Different types of features in images are detected by using a common computational concept. The proposed approach is motivated in the following way: the number of objects and scenes that an intelligent vision system needs to know and recognize is huge. Hence, every system for object recognition should pay attention to the amount of data that is needed for its internal object representations. It makes sense, then, to find locations in images with a local region structure that can be coded efficiently and to use these locations for object representations. When a local region is self-similar, the data is redundant, and we can expect that less data is needed for its representation. Locations in images with a self-similar structure of the local pattern are also distinguishable from locations in their vicinity, and hence, can also be used for image matching. In this paper, we propose measuring the self-similarity of local regions for interest point detection.

Copyright 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

Symmetry is an example of self-similarity. In [1], [2], [9], the symmetry of plane curves has been analyzed for describing shapes. For this purpose, different descriptive terms were introduced, as, for example: sym-ax, smoothed local symmetry, symmetry set, and midlocus. The aim was to reduce the huge amount of information carried by a shape down to a “skeleton” of crucial information, which can be more readily assimilated. Closer to our work are context-free attentional operators, which tend to use radial symmetry for interest point detection. Two types of approaches can be identified; both rely on the image gradient. In [31], [20], [40], symmetry maps are obtained by computing the contribution a local neighbourhood makes to the symmetry measure of a central pixel. The second approach [33], [16], [24] follows the idea of the circular Hough transform [11], [28]: symmetry maps are obtained by computing the contribution each pixel makes to the symmetry of the pixels around it. The second approach allows a faster computation of symmetry maps. The best runtime, well-suited for real-time vision applications, is achieved by the fast radial symmetry transform of Loy and Zelinsky [24].

Di Gesù and Valenti [3] propose a discrete symmetry transform based on local axial moments. The method has been applied to eye detection [6], to processing astronomical images [4], and as an early vision process [5]. Recursive methods [6], [30] reduce the computational load of the symmetry transform. A drawback of the applied symmetry transform is its tendency to highlight lines and regions of high texture in addition to radially symmetric features.

Kovesi [17] proposes a measure of symmetry that is based on the analysis of local frequency information; local symmetry and anti-symmetry in images can be identified as particular arrangements of phase. Local frequency information is determined via convolution with quadrature log-Gabor filters over the full range of filter orientations and a number of scales. The proposed technique is invariant to the magnitude of the local contrast. The computational cost of the method is high.

Recent papers on symmetry are [13], [32], [7], [8], [23], [19]. In these works, the main motivation is no longer interest point detection. Instead, effort is put into developing efficient methods to detect objects with symmetry, to classify this symmetry, or to extract regions with symmetry from images.

In this work, the self-similarity of a local region is measured on the image intensity values and is quantified by standardized regression coefficients. Three new measures of region saliency are derived. They are invariant to the similarity group of geometric transformations and to photometric shift. Because they are based on standardized regression coefficients, they represent precise measures and allow a scale-space approach. Local extrema in the scale-spaces of the proposed saliency measures represent different types of features: blob-like features, corner-like features, and highly textured points.

The approach proposed in this work enhances the context-free attentional operators based on radial symmetry. Gradient-based image feature detection and extraction approaches are sensitive to variations in image illumination, blurring, and magnification, and the applied threshold needs to be modified appropriately [18]. Our method is not gradient-based; quite the contrary, it introduces new entities: circumferences and radii. Values on circumferences and radii are summed together; hence, the proposed approach integrates information in images at the very first level of data processing, which makes it robust to noise. Results obtained on image sets of different object classes and on image sets under different types of photometric and geometric transformations show a high robustness of the method to intra-class variations as well as to different photometric transformations and moderate geometric transformations, and compare favourably with the results obtained by the leading interest point detectors from the literature. The proposed approach gives a rich set of highly distinctive local regions that can be used for object recognition and image matching.

This paper is organized as follows. Section 2 defines region self-similarity. In Section 3, we describe quantifying region self-similarity by the normalized correlation coefficient. In Section 4, we derive three measures of region saliency which allow the detection of different types of features in images. Section 5 gives implementation details. In Section 6, the proposed method is tested in accordance with two protocols ([26], [15]) for the evaluation of local region detectors. Section 7 provides concluding remarks and possible extensions of the proposed method.

Fig. 1. A reflection maps the location (r, φ) to the location (r, 2ϑ − φ).

2 SELF-SIMILARITY

Imagine that you have an image fragment and an image, and you would like to find a local region in the image which is similar to the image fragment. One possible way of solving this problem is to move the image fragment over the image and, at each location in the image, compute the normalized correlation coefficient between the corresponding intensity values of the image fragment and the local region. By computing the normalized correlation coefficient, it is assumed that when a local region is similar to the image fragment, there exists a linear relationship between the intensity values of the image fragment and the local region. When the value of the normalized correlation coefficient is close enough to one, we believe that the local region is similar to the image fragment.

A self-similar region in an image can be detected in a similar way. At each location in the image, the normalized correlation coefficient is now computed between the intensity values of a local region and the intensity values of the same, geometrically transformed, local region. A local region is self-similar if there exists a linear relationship:

I(T(x)) = a + bI(x),  ∀x ∈ P.  (1)

Here, P denotes a circular region of radius R, and x is a point location in P. The variable I(x) denotes the intensity value at location x, and T denotes a bijective geometric transformation defined on P. Here, we limit T to reflection and rotation. Both reflection and rotation preserve distances, angles, sizes, and shapes. For the sake of simplicity, point locations are represented in polar coordinates; hence, x = (r, φ).

2.1 Reflection

Every reflection has a mirror line. Let the mirror line go through the centre of P and let ϑ ∈ [0, π) denote the mirror line orientation. A reflection maps the location (r, φ) to the location (r, 2ϑ − φ) (see Fig. 1).


2.2 Rotation

Every rotation has a centre and an angle. Let the centre of the rotation be the centre of P and let the rotation angle α be one of the angles 2π/n, where n is an integer. A rotation maps the location (r, φ) to the location (r, φ + α).
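On the discrete polar grid introduced later (N angular samples per circumference, φn = n∆φ), both transformations reduce to permutations of the angular index. The following Python/NumPy sketch is ours, for illustration only; the function names are not from the paper:

import numpy as np

def reflect_indices(k, N):
    # Reflection about the mirror line theta_k = k * pi / N:
    # phi_n -> 2 * theta_k - phi_n = (k - n) * dphi, taken modulo 2*pi.
    n = np.arange(N)
    return (k - n) % N

def rotate_indices(i, N):
    # Rotation by alpha_i = i * 2 * pi / N: phi_n -> phi_n + alpha_i.
    n = np.arange(N)
    return (n + i) % N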

2.3 Parameters a and b

We have two solutions for the parameters a and b of Eq. 1:

a = 0, b = 1;
a = I(x) + I(T(x)), b = −1.  (2)

A reflection T with mirror line orientation ϑ maps the location (r, φ) to (r, 2ϑ − φ) and the location (r, 2ϑ − φ) to (r, φ). Hence,

I(r, 2ϑ − φ) = a + bI(r, φ),
I(r, φ) = a + bI(r, 2ϑ − φ).

Fig. 2. Two lines representing the solutions of (1).

Parameters (2) solve the above system. When T is a rotation, the derivation of the solutions for the parameters a and b is not straightforward; therefore, we give it in the Appendix. The line with parameters a = 0, b = 1 (see Fig. 2) represents regions where I(T(x)) = I(x). These regions have symmetry (see Figs. 3(a) and 3(c) for examples). The line with parameters b = −1 and a = I(x) + I(T(x)) represents regions where the sum of the intensity values at corresponding locations, that is, I(x) + I(T(x)), has the same value for all corresponding pairs. In this case, we say that the region has anti-symmetry (see Figs. 3(b) and 3(d) for examples). Different values of a give different lines. The lines corresponding to symmetry and anti-symmetry are orthogonal and intersect. A point that belongs to both lines represents a region of constant intensity value. In this case, only one pair of values (I(x), I(T(x))) is defined, and plotting I(T(x)) versus I(x) can define only a point, not a line. Equation (1) is then underdetermined.

3 QUANTIFYING REGION SELF-SIMILARITY

On real data, (1) can hardly be fulfilled for all the points of P (see Figs. 3(e) and 4). Nevertheless, we can measure the strength of the linear relationship (1) by the normalized correlation coefficient:

ncc(P, T) = ∑_i (I(x_i) − Ī)(I(T(x_i)) − Ī) / √[ (∑_i (I(x_i) − Ī)²)(∑_i (I(T(x_i)) − Ī)²) ]
          = ∑_i (I(x_i) − Ī)(I(T(x_i)) − Ī) / ∑_i (I(x_i) − Ī)².  (3)

Fig. 3. Examples of self-similar regions: (a) mirror symmetry, ϑ = π/2; (b) mirror anti-symmetry, ϑ = 0; (c) rotational symmetry, α = π/4; (d) rotational anti-symmetry, α = π (only inside the grey circle); (e) symmetry in reality.

Fig. 4. Scatter plots of intensity values at corresponding locations.

Here, i counts all the points of P and Ī represents the average intensity value of the points of P. The value ncc = 1 is obtained for symmetry, while ncc = −1 is obtained for anti-symmetry. Since the sums of squares ∑_i (I(x_i) − Ī)² and ∑_i (I(T(x_i)) − Ī)² are identical, the normalized correlation coefficient computed by (3) is the standardized regression coefficient [37]. It can be used for the quantitative comparison of self-similarities computed at different locations in the image as well as of self-similarities computed for regions with different radii R, that is, for different scale parameters of a multi-scale representation.
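As a concrete sketch (ours; it assumes the local region has already been resampled to an M × N polar rectangle, as described in Section 4, and uses the index maps from the sketch in Section 2), (3) can be computed in a few lines:

import numpy as np

def ncc(P, perm):
    # Normalized correlation coefficient (3) between the polar patch
    # P (M x N) and the same patch with its angular samples permuted
    # by `perm` (a reflection or rotation index map).
    A = P - P.mean()
    B = P[:, perm] - P.mean()
    # The two sums of squares in (3) are identical, so ncc reduces to
    # the standardized regression coefficient.
    return (A * B).sum() / (A * A).sum()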

4 LOCATION SALIENCY

At a given location, the normalized correlation coefficients (3) can be computed for different mirror line orientations or different angles of rotation. All give information about region self-similarity. We propose to use the average normalized correlation coefficient, computed over all orientations of the mirror line at a given location (see Fig. 5), or over all angles of rotation, as a measure of location saliency.

Computation of (3) for different mirror line orientations or angles of rotation is greatly simplified by transforming the local image P to polar coordinates.

Fig. 5. Radial saliency for reflection.

Transformation of data from a Cartesian to a polar coordinate system introduces a new sampling of the data. Let the sampling intervals for r and φ be ∆r = R/M and ∆φ = 2π/N, respectively. A circular region of radius R is transformed to a rectangle P̄(M, N) with M × N values, namely, I(rm, φn), where rm = m∆r for m = 0, …, M − 1 and φn = n∆φ for n = 0, …, N − 1 (see Fig. 7). The area of the region belonging to a sampled point grows with r and is r × ∆r × ∆φ. The transformation of P to P̄ thus corresponds to weighting the points of P by a factor of 1/r. This weighting acts as a low-pass filter, and it normalizes the circumferences to the same length, which makes measures computed on circumferences comparable. It turns out that the proposed saliency measures are represented by values computed on circumferences. We prefer the terms circumferences and radii when referring to the rows and columns of P̄.
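A minimal resampling sketch (ours; scipy's map_coordinates performs the bilinear interpolation) that produces P̄ from an image and a centre location:

import numpy as np
from scipy.ndimage import map_coordinates

def to_polar(img, cy, cx, R, M, N):
    # Row m holds the circumference of radius r_m = m * R / M;
    # column n corresponds to the direction phi_n = n * 2 * pi / N.
    r = (np.arange(M) * R / M)[:, None]
    phi = (np.arange(N) * 2 * np.pi / N)[None, :]
    ys = cy + r * np.sin(phi)
    xs = cx + r * np.cos(phi)
    # One sample per (r_m, phi_n) cell; relative to the cell area
    # r * dr * dphi, this realizes the 1/r weighting discussed above.
    return map_coordinates(img, np.array([ys, xs]), order=1)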

4.1 Three Saliency Measures

4.1.1 Radial Saliency

First, we compute the radial saliency for reflection. Once the transformation from Cartesian to polar coordinates is done, one needs to know how many mirror line orientations are needed for computing the normalized correlation coefficient. It must be computed for all possible different sets of corresponding couples when the mapping T is a reflection. Corresponding couples can only be formed among points lying on the same circumference; hence, the intensity value I(rm, φn) can form corresponding couples only with the values I(rm, φ0), I(rm, φ1), …, I(rm, φN−1). The symmetry couples (I(rm, φn), I(rm, φi)) and (I(rm, φn), I(rm, φi+1)) are formed for the mirror line orientations (φn + φi)/2 and (φn + φi+1)/2, respectively. They are separated by an angle:

((φn + φi+1) − (φn + φi))/2 = (φi+1 − φi)/2 = ∆φ/2.

The mirror line orientation ϑ + π gives the same symmetry couples as ϑ and therefore does not give us any new information. N samples on a circumference require N orientations of the mirror line, separated by an angle ∆ϑ = ∆φ/2 = π/N. Let ϑi = i × ∆ϑ. The set of mirror line orientations is {ϑ0, ϑ1, ϑ2, …, ϑN−1}. The region radial saliency when T is a reflection is computed as

Sr_refl(P) = (1/N) ∑_{i=0}^{N−1} ncc(P, Tϑi)

  = (1/N)(1/VP) [ ∑_{i=0}^{N−1} ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} I(rm, φn) I(rm, 2ϑi − φn) − (1/M) ( ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} I(rm, φn) )² ]

  = (1/N)(1/VP) [ ∑_{m=0}^{M−1} ( ∑_{n=0}^{N−1} I(rm, φn) )² − (1/M) ( ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} I(rm, φn) )² ].  (4)

Here, VP denotes the total sum of squares computed for the intensity values of P̄, that is, VP = ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} (I(rm, φn) − Ī)², with Ī = (1/(N × M)) ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} I(rm, φn). Let Cm = ∑_{n=0}^{N−1} I(rm, φn) be the sum of the intensity values on the circumference with radius rm, and let C̄ = (1/M) ∑_{m=0}^{M−1} Cm. Then,

Sr_refl(P) = (1/N)(1/VP) ∑_{m=0}^{M−1} (Cm − C̄)².  (5)

In the case of rotation, the minimal step between two angles of rotation is ∆α = 2π/N. Let αi = i × ∆α. The possible angles of rotation are α0, α1, …, αN−1. The region radial saliency when T is a rotation is

Sr_rot(P) = (1/N) ∑_{i=0}^{N−1} ncc(P, Tαi)

  = (1/N)(1/VP) [ ∑_{i=0}^{N−1} ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} I(rm, φn + αi) I(rm, φn) − (1/M) ( ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} I(rm, φn) )² ]

  = (1/N)(1/VP) ∑_{m=0}^{M−1} (Cm − C̄)²,  (6)

which is the same equation as (5). The radial saliency computed when T is a reflection is thus equal to the radial saliency computed when T is a rotation. From now on, we will use only one term, i.e., the radial saliency Sr, representing both values.

The triple sum of (6) can also be seen as a sum of autocorrelations of the discrete signals on the circumferences. Similarly, the triple sum of (4) represents a sum of circular convolutions computed for the same discrete signals.
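The equivalence between the averaged coefficient and the closed form (5) is easy to check numerically. A small self-contained test (ours, assuming the ncc and reflect_indices sketches from above):

import numpy as np

rng = np.random.default_rng(0)
M, N = 8, 16
P = rng.random((M, N))                  # a random polar patch

# Average of (3) over all N mirror line orientations ...
avg_ncc = np.mean([ncc(P, reflect_indices(k, N)) for k in range(N)])

# ... equals the closed form (5) computed from the circumference sums.
C = P.sum(axis=1)                       # C_m
V_P = ((P - P.mean()) ** 2).sum()       # total sum of squares
S_r = ((C - C.mean()) ** 2).sum() / (N * V_P)

assert np.isclose(avg_ncc, S_r)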

Let us now consider the circumferences as sets of points with intensity values. It is known from statistics that the total sum of squares, in our case computed on the intensity values of P̄ and denoted as VP, can be decomposed into a between-sets sum of squares and a within-sets sum of squares; in our case, into the between-circumferences sum of squares and the within-circumferences sum of squares. The value of Sr is largest, equal to 1, when (1/N) ∑_{m=0}^{M−1} (Cm − C̄)², i.e., the between-circumferences sum of squares, is equal to VP; hence, when all the variations in P̄ appear between circumferences. Local regions with high values of the between-circumferences sum of squares are blob-like features (see Fig. 12(a)).

Among regions with a high within-circumferences sum of squares, we can identify two special types of regions, which are explained below.

4.1.2 Tangential Saliency

The role of circumferences and radii can be interchanged. By analogy, the tangential saliency St is defined as the normalized between-radii sum of squares:

St(P) = (1/M)(1/VP) ∑_{n=0}^{N−1} (Rn − R̄)²,

where Rn = ∑_{m=0}^{M−1} I(rm, φn) denotes the sum of the values on the nth radius and R̄ = (1/N) ∑_{n=0}^{N−1} Rn. St has its largest value when all the changes in a local region appear between radii (see Fig. 12(b)).

4.1.3 Residual Saliency

The third saliency measure is obtained with the help of the dependent effects model, which we have borrowed from statistics, where it is used for the analysis of variance [37]. Table 1 shows a decomposition of P̄ by the dependent effects model (see also Fig. 6). In this model, there are three kinds of effects: row effects αm, column effects βn, and interaction effects γmn. Each element Imn = I(rm, φn) of P̄ can be represented as

Imn = Ī + αm + βn + γmn.

Fig. 6. A decomposition of P̄ by the dependent effects model.

Here, Ī is the mean computed on the intensity values of P̄, and αm and βn are the deviations of the row and column means from Ī:

αm = (1/N)Cm − Ī  or  αm = (1/N)(Cm − C̄),
βn = (1/M)Rn − Ī  or  βn = (1/M)(Rn − R̄).

The interaction effect is then:

γmn = Imn − (Ī + αm + βn).

The total sum of squares VP can be expressed as

VP = Vr + Vt + Vres,  (7)

where

Vr = N ∑_{m=0}^{M−1} αm²,  Vt = M ∑_{n=0}^{N−1} βn²,  Vres = ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} γmn².

Dividing (7) by VP, we obtain

1 = Vr/VP + Vt/VP + Vres/VP = Sr + St + Sres.

The third saliency measure, namely, the residual saliency Sres, can thus be computed by subtracting Sr and St from 1:

Sres = 1 − Sr − St.
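The whole decomposition fits in a few lines of NumPy. A sketch (ours, not the author's code) that returns the three saliency measures of a polar patch and makes the identity Sr + St + Sres = 1 explicit:

import numpy as np

def saliency(P):
    # Dependent effects decomposition (7) of a polar patch P (M x N).
    M, N = P.shape
    Ibar = P.mean()
    alpha = P.mean(axis=1) - Ibar       # row effects: (1/N) C_m - Ibar
    beta = P.mean(axis=0) - Ibar        # column effects: (1/M) R_n - Ibar
    gamma = P - Ibar - alpha[:, None] - beta[None, :]
    V_P = ((P - Ibar) ** 2).sum()       # total sum of squares
    S_r = N * (alpha ** 2).sum() / V_P      # V_r / V_P
    S_t = M * (beta ** 2).sum() / V_P       # V_t / V_P
    S_res = (gamma ** 2).sum() / V_P        # V_res / V_P
    return S_r, S_t, S_res              # S_r + S_t + S_res == 1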

Fig. 7 summarizes the computation of the saliency measures.

4.2 Interest Points

Saliency maps for Sr and St are computed over a range of different scales, that is, for regions with different radii R (see Fig. 8). In a local region, there can be a mixture of symmetries and anti-symmetries. For example, points that exhibit local scale-space maxima of Sr represent regions with a locally maximal “amount of symmetry”, while points at local scale-space minima represent regions with a local peak of anti-symmetry. Let maxlss() and minlss() denote the local scale-space maximum and minimum, respectively. The interest points are:

maxlss(Sr),
maxlss(St),
maxlss(Sres) = maxlss(1 − Sr − St),
minlss(Sr) = maxlss(St + Sres) = maxlss(1 − Sr),
minlss(St) = maxlss(Sr + Sres) = maxlss(1 − St),
minlss(Sres) = maxlss(St + Sr).


TABLE 1
P̄ as the Dependent Effects Model

I00 = Ī + α0 + β0 + γ00            I01 = Ī + α0 + β1 + γ01            …   I0(N−1) = Ī + α0 + β(N−1) + γ0(N−1)            (1/N)C0 = Ī + α0
I10 = Ī + α1 + β0 + γ10            I11 = Ī + α1 + β1 + γ11            …   I1(N−1) = Ī + α1 + β(N−1) + γ1(N−1)            (1/N)C1 = Ī + α1
I20 = Ī + α2 + β0 + γ20            I21 = Ī + α2 + β1 + γ21            …   I2(N−1) = Ī + α2 + β(N−1) + γ2(N−1)            (1/N)C2 = Ī + α2
⋮
I(M−1)0 = Ī + α(M−1) + β0 + γ(M−1)0            …            I(M−1)(N−1) = Ī + α(M−1) + β(N−1) + γ(M−1)(N−1)            (1/N)C(M−1) = Ī + α(M−1)
(1/M)R0 = Ī + β0            (1/M)R1 = Ī + β1            …            (1/M)R(N−1) = Ī + β(N−1)            Ī

Fig. 7. Computation of saliency measures.


There are also other measures we can take advantage of, namely Sr − St, Sres − St, and Sres − Sr. Additional sets of interest points are indicated by:

maxlss(Sr − St),
maxlss(St − Sr),
maxlss(Sres − St) = maxlss(1 − Sr − 2St),
maxlss(St − Sres) = maxlss(Sr + 2St − 1),
maxlss(Sres − Sr) = maxlss(1 − 2Sr − St),
maxlss(Sr − Sres) = maxlss(2Sr + St − 1).

The interest points represent a rich set of local regions that can be used for image matching and object representation. Figs. 9 and 10 depict a few results.

4.3 Peculiarity of Location

The three saliency measures could also be used for the description of a local region. They represent a point on a unit sphere:

Υ(x, y, R) = [√Sr, √St, √Sres].

Fig. 11 shows Υ as an RGB colour image.

Fig. 12. Features. Each column shows examples of local regions and their polar representations belonging to features with the same properties. (a) All radii of P or, equivalently, all columns of P̄ are equal; Υ = [1, 0, 0]. (b) All circumferences of P or, equivalently, all rows of P̄ are equal; Υ = [0, 1, 0]. (c) All Rn of P̄ are equal and all Cm of P̄ are equal; Υ = [0, 0, 1].



Fig. 8. Saliency maps: (from left to right) original image, Sr, St, and Sres, for the scale parameter (from top to bottom) R = 7, R = 14, and R = 28. The image resolution is 228 × 150 pixels for the face image and 262 × 201 pixels for the motorbike image.

Υ can identify features with different image properties, corresponding to the red, green, and blue colours in Fig. 11. Examples of three extreme situations are depicted in Fig. 12. Fig. 12(a) represents examples where Υ = [1, 0, 0], or Sr = 1, St = 0, and Sres = 0. Here, all changes occur between circumferences. For example, extrema of convex and concave surfaces or blob-like features have a large Sr and small St and Sres. Fig. 12(b) represents examples where Υ = [0, 1, 0], or Sr = 0, St = 1, and Sres = 0. In these examples, all changes occur between radii. Points on borders between regions of different intensity values and corner-like features are examples of this kind of points. Fig. 12(c) represents examples where Υ = [0, 0, 1], or Sr = 0, St = 0, and Sres = 1. In these examples, the sums of the values on the radii are all equal, and the same is true for the sums of the values on the circumferences. Textured regions have high Sres components. Differences between regions as presented in Figs. 12(a), 12(b), and 12(c) do not affect the three saliency measures. Hence, we can expect that Υ is robust to within-category variations. Υ is invariant to the group of similarity transformations and to photometric shift. The first property follows from the fact that geometric transformations in the similarity group keep the sets of corresponding points, on which the normalized correlation coefficients are computed, unchanged. The second property, invariance to photometric shift, can easily be seen from the equations of the dependent effects model (see Table 1): the effects αm, βn, and γmn are independent of the region average value. Υ itself is not powerful enough to be a region descriptor. But it can easily be noticed that Cm and Rn are nothing other than the zeroth Fourier coefficients of the mth row and the nth column of P̄. Higher Fourier coefficients used in a proper way could enrich Υ into a descriptor. The proposed approach could also be used for other purposes, for example, as a compression technique. We can do quite a good image reconstruction by using only one type of effects at the interest points. Fig. 13 shows an example of a reconstructed image. By using Fourier coefficients computed on the different effects of the dependent effects model, a high compression of images could be achieved.

A region of constant intensity value represents a degenerate case. We have assumed that Υ = [0, 0, 0] corresponds to a region of constant intensity value.


Fig. 9. A set of local regions corresponding to local scale-space maxima of Sr obtained on 100 images from the Caltech Motorbikes image set with a clear background. The maxima were searched for only among the 60 highest local maxima in each image. The maxima were obtained for an approximate image resolution of 130 × 80 pixels on a scale from R = 5 to R = 29.

Fig. 10. Regions belonging to the highest 30 local scale-space maxima of Sr obtained for scales between R = 5 and R = 40. The image resolution is approximately 140 × 150 pixels. The image shows all the women from the Caltech Human-Faces image set.

Fig. 11. Υ as an RGB colour image. The red colour component corresponds to Sr, green to St, and blue to Sres. (From left to right) Original image (436 × 228 pixels); Υ for the scale parameter R = 7, R = 15, R = 30, and R = 48.


5 IMPLEMENTATION DETAILS

The algorithm for interest point detection consists of two main parts:

1) the saliency maps Sr and St are first computed for the complete scale-space;

2) scale-space local extrema are determined for each of the saliency measures.

5.1 Computation of Sr and St

Despite the fact that the proposed approach is developed in polar coordinates, we do not need to transform the local regions to polar coordinates to compute Sr and St. Instead, a lookup table of pixel locations is prepared for a local region mask in polar coordinates. The saliency measures are computed by using the following equations for the sums of squares:

∑_{m=0}^{M−1} (Cm − C̄)² = ∑_{m=0}^{M−1} Cm² − ( ∑_{m=0}^{M−1} Cm )² / M,

∑_{n=0}^{N−1} (Rn − R̄)² = ∑_{n=0}^{N−1} Rn² − ( ∑_{n=0}^{N−1} Rn )² / N,

∑_{m=0}^{M−1} ∑_{n=0}^{N−1} (Imn − Ī)² = ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} Imn² − ( ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} Imn )² / (MN).


Fig. 13. Image reconstruction using the patch mean and only one type of effects of the dependent effects model at the interest points in an image. The different effects are represented here on discs, as shown in Fig. 6. The reconstructions are obtained by drawing smaller discs over the larger discs: (from left to right) original image (186 × 219 pixels); maxima of Sr − St (259 locations); reconstructed image obtained by drawing only the row effects at the maxima of Sr − St; maxima of St (228 locations); reconstructed image obtained by drawing only the column effects at the maxima of St.

Here, Imn = I(rm, φn). Notice that

( ∑_{m=0}^{M−1} Cm )² = ( ∑_{n=0}^{N−1} Rn )² = ( ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} Imn )².

For a given maximal scale M, the algorithm produces M − 1 saliency maps for Sr and St. For each pixel location, the saliency measures are computed through scale in an iterative way, where the saliency measures at scale i are computed from the saliency measures (or their parts) at scale i − 1 by adding the points of the ith circumference. Hence, at each scale i, the sums ( ∑_{m=0}^{i−1} Cm )², ∑_{m=0}^{i−1} Cm², ∑_{n=0}^{N−1} Rn², and ∑_{m=0}^{i−1} ∑_{n=0}^{N−1} Imn² need to be updated. The most time-consuming part is the computation of ∑_{n=0}^{N−1} Rn², which requires all the radii to be updated at each scale. For the nth radius at the ith scale we have Rn(i) = Rn(i − 1) + Iin, from which Rn²(i) is recomputed. Hence, it takes MN multiplications to compute ∑_{n=0}^{N−1} Rn² over the complete scale space. Here, M represents the number of circumferences and N the number of radii. By adapting the value N to M (N = 2πM), a time complexity of O(kM²) is achieved, with k representing the number of pixels in the image. Compared to Reisfeld's generalized symmetry transform [31], our algorithm computes two complete scale spaces (2M saliency maps) for the same time complexity. It is always possible to speed up the computation by not using all the information. One possibility is to use a log-polar representation of a local region. Another is to do the computations at a lower image resolution. In this work, we experiment only with a lower image resolution. A parallel approach and special hardware for image processing are also worth considering; this would make it possible to compute the proposed saliency measures in only a few steps.
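A per-pixel sketch of this update scheme (ours; it operates on a precomputed polar patch rather than on the image directly, and uses the shortcut formulas for the sums of squares given above):

import numpy as np

def saliency_scale_space(P):
    # Incremental S_r and S_t over scales for one pixel; scale i uses
    # circumferences 0..i of the full polar patch P (M x N).
    M, N = P.shape
    sum_C = sum_C2 = sum_I2 = 0.0
    R = np.zeros(N)                     # running radius sums R_n(i)
    S_r, S_t = [], []
    for i in range(M):
        C_i = P[i].sum()                # sum on the new circumference
        sum_C += C_i                    # updates sum_m C_m
        sum_C2 += C_i ** 2              # updates sum_m C_m^2
        sum_I2 += (P[i] ** 2).sum()     # updates sum_mn I_mn^2
        R += P[i]                       # R_n(i) = R_n(i-1) + I_in
        m = i + 1                       # circumferences used so far
        V_P = sum_I2 - sum_C ** 2 / (m * N)
        if V_P <= 0:                    # constant region: degenerate case
            S_r.append(0.0); S_t.append(0.0)
            continue
        V_r = (sum_C2 - sum_C ** 2 / m) / N      # between-circumferences SS
        V_t = ((R ** 2).sum() - sum_C ** 2 / N) / m  # between-radii SS
        S_r.append(V_r / V_P); S_t.append(V_t / V_P)
    return np.array(S_r), np.array(S_t)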

To use all the information available in an image, the saliency measures should be computed at subpixel accuracy, e.g., at the pixel centre, the pixel corners, and the mid-points of all four pixel edges. Because this would require a lot of computation, the algorithm computes the saliency measures only at the pixel centres, starting with C0 = N × I_central pixel and Rn(0) = I_central pixel, n = 0, …, N − 1. A slight smoothing of the saliency maps is then applied by replacing the saliency value at a pixel location with the average saliency value computed on the nine neighbouring pixels. This kind of averaging reduces the effects of noise due to discretisation error.

5.2 Computation of Local Extrema

Local maxima and minima for each of the saliency measures are obtained by comparing each pixel value to its eight neighbours in the current saliency map and its nine neighbours in the scale levels above and below. As proposed in [22], a point is selected only if it has an extreme value compared to all of these neighbours.
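A direct (unoptimized) sketch of this test, ours, on a saliency stack S of shape (scales, height, width):

import numpy as np

def scale_space_extrema(S):
    # A point is kept only if it is the unique extremum of its
    # 3 x 3 x 3 neighbourhood (8 + 9 + 9 = 26 neighbours).
    s, h, w = S.shape
    maxima, minima = [], []
    for k in range(1, s - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                cube = S[k - 1:k + 2, y - 1:y + 2, x - 1:x + 2]
                v = S[k, y, x]
                if v == cube.max() and (cube == v).sum() == 1:
                    maxima.append((k, y, x))
                elif v == cube.min() and (cube == v).sum() == 1:
                    minima.append((k, y, x))
    return maxima, minima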

6 EXPERIMENTAL RESULTS

In the experiments reported here, we have followed two protocols for the evaluation of interest point detectors. Twelve self-similarity detectors were tested, namely, the local maxima of Sr, St, and Sres, the local minima of Sr, St, and Sres, and the local maxima of Sr − St, St − Sr, Sres − St, St − Sres, Sres − Sr, and Sr − Sres. The names of the detectors are shortened when they are computed on a difference of saliency maps; for example, the difference Sr − St is denoted as Sr−t. In Table 2, the local minima detectors are labelled with the letter a (anti-symmetry) in front of their names. HS-A, HR-A, ScalSal, MSHar, and DoG stand for the Hessian-Affine detector [27], the Harris-Affine detector ([38], [27]), the similarity Saliency detector [14], the multi-scale Harris detector ([27], without the affine adaptation), and the Difference-of-Gaussian blob detector [22], respectively. For readability of the graphs, only the results for four self-similarity detectors are shown (see Figs. 14, 19-24, 26, and 27).

6.1 Intraclass Variations and Image Perturbations

The aim here is to measure the performance of the self-similarity detectors under intra-class variations and image perturbations in the same way as proposed in [15]. The detectors were tested on 450 images from the Caltech Human-Faces set, 200 images from



Fig. 14. (a) Repeatability results for the Caltech Human-Faces images. (b) Repeatability results for the Caltech Cars images. (c) Repeatability results for the Caltech Motorbikes images with a clear background. (d) Repeatability results for the Caltech Motorbikes images with background clutter.

Fig. 15. The 60 best local maxima of Sr, obtained for the same person on two different backgrounds. Due to the regular structure in the background area of the image on the right, many of the best local maxima belong to the background area, and hence, fewer matches are found on the face.

the Caltech Motorbikes set¹ (84 images have a clear background and 116 images have clutter in the background), and all 126 Caltech Cars (Rear 2) images. Affine transformations between the objects of interest in the images were estimated by using the ground-truth locations given at http://www.robots.ox.ac.uk/~timork/Saliency/AffineInvariantSaliency.html. The tests were performed on images with a lower image resolution². The scores were computed on a scale from R = 2 to R = 24 of the downsampled images. We consider that a region matches if it fulfils the three requirements recommended in [15] with the following conditions: its position matches within 10 pixels (of the original image size); its scale is within 20%; and the normalized mutual information between the appearances, MI(A, B) = 2(H(A) + H(B) − H(A, B))/(H(A) + H(B)), is greater than 0.2. The average correspondence score S proposed in [15] is measured as follows. The N best regions are detected in each of the M images in the dataset. For a particular reference image i, the correspondence score Si is given by the proportion

1. This image set is a part of the Caltech-101 object categories database.

2. Images were downsampled by replacing the pixel value at location (i, j) with (1/4)(I(i, j) + (1/2)(I(i−1, j) + I(i+1, j) + I(i, j−1) + I(i, j+1)) + (1/4)(I(i−1, j−1) + I(i+1, j−1) + I(i+1, j+1) + I(i−1, j+1))) and taking only the pixels with even i and j. Images of the Faces and Cars sets were downsampled twice. Images of the Motorbikes set were downsampled only once.

of correspondences with detected regions for all the other images in the dataset, i.e.,

Si = (total number of matches)/(total number of detected regions) = Ni^M / (N(M − 1)).

Si is computed for M/2 different selections and averaged.
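The smoothing described in footnote 2 is the 3 × 3 binomial kernel. A sketch of the downsampling (ours; the border handling is our assumption):

import numpy as np
from scipy.ndimage import convolve

KERNEL = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]]) / 16.0   # 1/4 centre, 1/8 edges, 1/16 corners

def downsample(img):
    # Smooth with the binomial kernel, then keep pixels with even i, j.
    return convolve(img.astype(float), KERNEL, mode='reflect')[::2, ::2]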

Fig. 14 shows the results obtained. On the Human-Faces images, the best results (Fig. 14(a)) are obtained for the local scale-space maxima of Sr and Sr − St. The local scale-space maxima of St and Sres give low scores. The local scale-space maxima of St represent features at the borders between regions of different intensity and corner-like features. The local scale-space maxima can trace along straight borders. When this happens on the background, they do not have corresponding matches in other images. Corner-like features are points with high curvature on region and object borders, so we can expect larger variations in their position and appearance due to the background, compared to region centres. The best local scale-space maxima of Sres are, for the Faces images, often features of the background area. The low average correspondence scores obtained do not mean that there are no local scale-space maxima in the area of the faces or that they are not repeatable there; they are simply not among the 60 highest local scale-space maxima. The same can be true for other features too. The results obtained depend on the background area. Fig. 15 shows an example of this.


Fig. 16. The 60 best local scale-space maxima of Sr − St on images from the Caltech-Cars image set. Locations of the same car parts can vary by more than 10 pixels, or 1.1% of the picture length. For example, the discs with number plates are not identified as corresponding regions.

Fig. 17. Circles represent the 40 regions of the highest local maxima of (from left to right) Sr, St, and Sres obtained on one of the Motorbikes images with a clear background and on one with clutter in the background.

We notice that the highest local scale-space maxima of Sr can also be found in the background when there is a regular structure there. In such cases, fewer matches are found on the area of the face.

Fig. 14(b) shows the results obtained on the Caltech-Cars image set. The similarity Saliency region detector obtained the best scores for this image set. The lower average correspondence scores obtained for the Caltech-Cars image set are due to the larger variability of the car parts. Locations of parts like number plates, exhausts, front mirrors, back bumpers, and car lights vary and cannot be transformed from one image to another by an affine transformation (see Fig. 16).

The results obtained on the Motorbikes image set (Figs. 14(c) and 14(d)) are better than the results obtained on the Cars and Faces image sets. This is partly due to the lower image resolution of the Motorbikes images, meaning that position matches within 10 pixels can be accomplished more easily. The performance of the local scale-space maxima of St and Sres is here different than on the Faces and Cars image sets. To understand the obtained results better, we show in Fig. 17 two examples of images with marked local regions obtained for the local scale-space maxima of Sr, St, and Sres. One image is from the set of images with a clear background; the other is from the set of images with clutter in the background. The local scale-space maxima of Sres represent large regions. In the case of the clear background, they appear mostly on the background and represent white regions with only a few dark points on the border. JPEG compression moves the circles away from the motorbike border. These regions are similar from image to image and are highly repeatable. In the case of background clutter, the highest local scale-space maxima of Sres can be found in the area of the motorbikes. They are repeatable and give high average correspondence scores. The border between the motorbike and the clear background represents locations with high St. Most of the local scale-space maxima of St are obtained on that border and fewer inside the area of the motorbike, which makes the features highly repeatable. On images with clutter, St for locations on the border is reduced due to the clutter (see also Fig. 8). The local scale-space maxima of St can then be higher on the background, for example, on the border of the shadow on the floor. These locations are not repeatable; therefore, the local scale-space maxima of St obtained on images with clutter give lower average correspondence scores. There are also other reasons that make the results on images with clutter worse: different lighting conditions, as the pictures were taken outside; occlusions by other objects; different viewing directions; and different orientations of some parts of the motorbike, such as the front wheel. We also feel that the images with clutter exhibit greater variability among the different motorbike images. On the image set with a clear background, the local maxima of Sres obtained the best scores. The scores obtained by


the local scale-space maxima of Sr − St are also high. On the image set with background clutter, the similarity Saliency region detector obtained the best results.

6.2 Varying Blur, JPEG Artifacts, Lighting Change, In-Plane Orientation, Scale, and Viewpoint Change

The aim of the experiments reported here is to test the self-similarity detectors in accordance with the protocol suggested in [26]. Detectors are compared on the basis of the number of corresponding regions detected in images under different geometric and photometric transformations. Fig. 18 shows images from the test sets; the reference image is the leftmost image of each set. First, the accuracy and repeatability of the detection are tested.


Fig. 18. Examples from the image data set. (a) and (b) Image blur. (c) JPEG compression. (d) Lighting change. (e) and (f) Zoom + rotation. (g) and (h) Viewpoint change. In the experimental comparisons, the leftmost image of each set is used as the reference image. Only three (the first, third, and sixth) of the six images are shown for each type of transformation.

The ground truth in all cases is provided by mapping the local regions detected on the test image to the reference image using homographies. The basic measure of accuracy is the relative amount of overlap between corresponding regions. The results reported in Figs. 19-27 are obtained for a 40% overlap error, while the results in Table 2 are given for a 20% overlap error. Two measures are reported: the repeatability score and the absolute number of corresponding regions found in the test image. The repeatability score for a given pair of images is computed as the ratio between the number of region-to-region correspondences and the smaller of the numbers of regions obtained in the pair of images. Second, the distinctiveness of the detected regions is tested. Detected regions are represented by the SIFT descriptor [22]. Two measures are reported: the matching score and the number of correct matches. The matching score is the ratio between the number of correct matches and the smaller number of detected regions in the pair of images. A match is the nearest neighbour in the descriptor space (Euclidean distance). A match is deemed correct if the overlap error of the corresponding regions is less than 40%.
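In code, the two scores reduce to simple ratios. A schematic sketch (ours; overlap_ok stands in for the region-overlap test, which is not reproduced here):

import numpy as np

def repeatability(n_correspondences, n_ref, n_test):
    # Region-to-region correspondences divided by the smaller count.
    return n_correspondences / min(n_ref, n_test)

def matching_score(desc_ref, desc_test, overlap_ok):
    # Nearest neighbour in SIFT descriptor space (Euclidean distance),
    # counted as correct when the regions overlap sufficiently.
    d = np.linalg.norm(desc_ref[:, None] - desc_test[None, :], axis=2)
    nearest = d.argmin(axis=1)
    correct = sum(bool(overlap_ok[i, j]) for i, j in enumerate(nearest))
    return correct / min(len(desc_ref), len(desc_test))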

In all the experiments reported here, interest points were computed on images with a lower image resolution: nine neighbouring pixels of the original image were averaged and represented as one pixel. All local extrema for scales from R = 4 to R = 24 (from R = 12 to R = 69 for the original image resolution) were taken into account, and no threshold was used. The obtained coordinates and radii of the detected regions were then transformed back to the original image sizes, and all further computations were done by using the original homographies and the software from http://www.robots.ox.ac.uk/~vgg/research/affine.

6.2.1 Blur

Figs. 19 and 20 show the results obtained for the structured and the textured scene, both undergoing increasing amounts of image blur (see Figs. 18(a) and 18(b)). The best results are obtained for the local scale-space maxima of Sr − St for both scene types. We see two reasons for this. The first reason can be clarified by Fig. 12(a): the topmost example in Fig. 12(a) is represented with the same values of Sr, St, and Sres as a black disc with a white ring around it. Hence, we can expect that the saliency measures of blob-like features are not much affected by blur. The second reason is more general and can also be valid for other types of image transformations. Let a transformation T, applied to an image, change the three saliency measures Sr, St, and Sres at a particular location by ∆r, ∆t, and ∆res, respectively. The value of Sr − St changes by ∆r − ∆t. When the changes ∆r and ∆t are similar, the difference ∆r − ∆t is small, and hence, the value of Sr − St is less affected by T than the values of Sr, St, and Sres. This, of course, is not true for all points in an image, and the difference ∆r − ∆t can also be larger than ∆r, ∆t, and ∆res. But the high scores obtained for the maxima of Sr − St suggest that, at the local scale-space maxima of Sr − St, this might be the case.


Fig. 19. Blur for the structured scene (Bikes sequence, Fig. 18(a)). (a) Repeatability score for blur change. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.


Fig. 20. Blur for the textured scene (Trees sequence, Fig. 18(b)). (a) Repeatability score for blur change. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.


Fig. 21. JPEG compression (UBC sequence, Fig. 18(c)). (a) Repeatability score for different JPEG compressions. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.

The numbers of corresponding regions detected on the structured scene are lower than those on the textured scene for most detectors; exceptions are the local scale-space maxima of St. The matching scores obtained by the self-similarity detectors are high compared to the matching scores obtained by the Hessian-Affine, Harris-Affine, and MSER detectors. For a 20% overlap error (Table 2), the local scale-space maxima of Sres − St obtained the best average repeatability score on the structured scene.

6.2.2 JPEG Artifacts

Fig. 21 shows the repeatability scores for the JPEG compression sequence from Fig. 18(c). The Hessian-Affine and Harris-Affine detectors show the best performance here. The local scale-space maxima of Sr − St gave the highest repeatability scores among the self-similarity detectors.

6.2.3 Lighting Changes (Fig. 18(d))

Results for lighting changes are shown in Fig. 22. The best repeatability scores are obtained by the MSER detector. The local scale-space maxima of Sr − St and Sr give slightly lower repeatability scores, but the number of correspondences obtained by the local scale-space maxima of Sr − St is approximately three times larger than the number of correspondences obtained by the MSER detector. The matching scores of the self-similarity detectors and the MSER detector are high compared to the matching scores obtained by the Hessian-Affine and Harris-Affine detectors.


Fig. 22. Illumination change (Leuven sequence, Fig. 18(d)). (a) Repeatability score for different illuminations. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.


Fig. 23. Scale change for the structured scene (Boat sequence, Fig. 18(e)). (a) Repeatability score for scale change. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.


Fig. 24. Scale change for the textured scene (Bark sequence, Fig. 18(f)). (a) Repeatability score for scale change. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.

6.2.4 Scale Change and In-Plane Rotation

Results for scale changes and in-plane rotations are shown in Figs. 23 and 24. Features in all the test images are detected for a fixed scale range. Because features get smaller from image to image, the relative scale range is different for each test image. This makes the results difficult to interpret. On the structured scene, the local scale-space maxima of Sres give the best repeatability scores (Fig. 23). The local scale-space maxima of Sres have a large percentage of regions with radii of middle size (see Fig. 28), and hence, fewer regions are lost due to the fixed scale range. On the textured scene, the best repeatability scores are obtained by the Hessian-Affine detector; notice that the number of correspondences is small. For a 20% overlap error (Table 2), the best average repeatability score is obtained by the local scale-space maxima of Sr.

6.2.5 Viewpoint Change

Local regions obtained by our detectors are discs (Fig. 25); hence, they are not adapted to a change in viewing angle. Our intention here has been to see how far we can go with the proposed approach. Results for the structured scene (Fig. 18(g)) are shown in Fig. 26. The MSER detector gives the best results here. Still, for a 20° viewing angle change, the local scale-space maxima of Sr − St obtained a slightly higher repeatability score than the MSER detector. For a 30° viewpoint angle change, the repeatability scores of the self-similarity detectors are higher than the repeatability score obtained by the Harris-Affine detector. For a 50° viewpoint angle change, no correspondence is detected that fulfils the default setting,


Fig. 25. The 120 highest maxima of Sr−St obtained on the first and the third image of the Graffiti sequence (Fig. 18(g)).


Fig. 26. Viewpoint change for the structured scene (Graffiti sequence, Fig. 18(g)). (a) Repeatability score for viewpoint change. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.


Fig. 27. Viewpoint change for the textured scene (Wall sequence, Fig. 18(h)). (a) Repeatability score for viewpoint change. (b) Number of corresponding regions. (c) Matching score. (d) Number of correct nearest matches.

i.e., a 40% overlap error. On the textured scene (Fig. 18(h)), the self-similarity detectors obtained surprisingly good results (see Fig. 27). The local scale-space maxima of Sr − St give the best repeatability scores up to 50°.

6.2.6 Region Size

Figure 28 shows histograms of region size for the different self-similarity detectors. The regions belonging to the local scale-space maxima of Sres tend to have larger radii than those of the local scale-space maxima of Sr − St, Sr, and St. We can also notice that the local scale-space maxima of Sr − St have a larger percentage of regions with radii of middle size than the local scale-space maxima of Sr and St.

7 CONCLUSION

In this paper, we have presented a new approach to detecting interest points in images. The total sum of squares computed on the intensity values of a local region is divided into three components: the between-circumferences sum of squares, the between-radii sum of squares, and the remainder. The three components normalized by the total sum of squares determine three new region saliency measures and are computed at different scales; here, scale corresponds to the radius of the local region. Local extrema in scale-space are located for each of the saliency measures. The extrema are features with complementary image properties.



TABLE 2
Average Repeatability Obtained for 20% Overlap Error

Images     Sr     aSr    St     aSt    Sres   aSres  Sr−t   St−r   Sres−t St−res Sres−r Sr−res HS-A   HR-A   MSER
Bikes      54.5   41.7   46.3   49.3   51.1   43.1   54.8   42.1   56.3   47.0   48.8   44.8   48.2   32.8   33.6
Trees      31.0   25.8   26.1   28.4   29.9   25.4   33.7   24.4   29.3   27.1   28.3   29.0   20.0    9.8   11.5
UBC        58.0   50.4   55.7   58.2   57.5   55.4   57.1   50.8   57.7   59.0   54.2   49.7   73.7   69.0   33.0
Leuven     50.2   38.9   45.3   48.3   48.0   46.6   48.6   39.7   49.7   46.6   44.4   46.8   40.0   32.0   66.7
Boats      28.5   23.2   23.6   28.6   28.2   21.3   25.2   19.9   29.0   24.2   25.4   27.7   29.7   22.3   27.5
Bark       18.8   15.2   13.0   17.1   15.8   13.5   16.3   12.9   17.5   13.7   17.3   15.3   11.8   10.6    9.8
Graffiti    8.9    7.6    8.1    9.7    9.2    7.9    9.7    7.3    9.2    8.2    8.5    8.6   17.7   13.0   51.7
Wall       20.7   18.3   15.4   22.8   22.0   14.4   21.3   14.8   22.6   16.1   20.5   18.5   24.5   17.3   31.4
Σ         270.6  221.1  233.5  262.4  261.7  227.6  266.7  211.9  271.3  241.9  247.4  240.4  265.6  206.8  265.2

(HS-A: Hessian-Affine, HR-A: Harris-Affine.)

Fig. 28. Histograms of region radius for different detectors for the reference image of Fig. 18(g). Note that the y axes do not have the same scale in all cases.

The performance of the new approach was demonstrated on a wide variety of image sets. The obtained results compare favourably with the results obtained by the leading interest point detectors from the literature. The proposed approach gives a rich set of highly distinctive local regions that can be used for object recognition and image matching.

Our future research goal is to extend the proposed approach to a local region descriptor by using higher Fourier coefficients computed on the different effects of the dependent effects model. Next, we would like to find criteria for joining local regions into larger structures and then simplify these structures into primitives that can be used for object representation and recognition.

A deficiency of the proposed method is that it does not include an affine adaptation of local regions. Hence, the proposed method is not appropriate for tasks like wide-baseline stereo. However, this is not a fundamental limitation of the method: circumferences can be replaced with ellipses, and this is one of the possible directions for future work.

APPENDIX

Rotation T with α = 2π/n maps locations (r, iα + φ) to (r, (i + 1)α + φ), i = 0, 1, 2, …, n − 1. Because I(r, (i + 1)α + φ) = a + bI(r, iα + φ) and (r, nα + φ) = (r, φ), we can express the relation between I(r, α + φ) and I(r, φ) in two different ways:

  I(r, α + φ) = a + bI(r, φ)

  I(r, φ) = Σ_{i=0}^{n−2} ab^i + b^{n−1} I(r, α + φ).      (8)

By replacing the variable I(r, φ) in the first line of (8) with Σ_{i=0}^{n−2} ab^i + b^{n−1} I(r, α + φ), and in the second line I(r, α + φ) with a + bI(r, φ), we obtain:

  I(r, α + φ) = Σ_{i=0}^{n−1} ab^i + b^n I(r, α + φ)

  I(r, φ) = Σ_{i=0}^{n−1} ab^i + b^n I(r, φ).      (9)

Let us now subtract the second equation of (9) from the first:

  I(r, α + φ) − I(r, φ) = b^n (I(r, α + φ) − I(r, φ)).      (10)

From here, b^n = 1 and Σ_{i=0}^{n−1} ab^i = 0. For even n, the two solutions for b are b = 1 and b = −1. Solution b = 1 gives Σ_{i=0}^{n−1} ab^i = na, and from here a = 0, while solution b = −1 gives Σ_{i=0}^{n−1} ab^i = 0; from the first line of (8) it then follows that a = I(r, φ) + I(r, α + φ). For odd n, parameter b can only have one value, that is, b = 1. In this case Σ_{i=0}^{n−1} ab^i = na and hence a = 0.
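The two admissible parameter choices can also be checked numerically: iterating I_{i+1} = a + bI_i once around the circle must return the initial value. A minimal sketch (the starting values and n are arbitrary) confirming that b = −1 with a = I(r, φ) + I(r, α + φ) closes for even n, while b = 1 closes only with a = 0:

```python
def iterate(I0, a, b, n):
    """Apply I_{i+1} = a + b * I_i n times (once around the circle)."""
    I = I0
    for _ in range(n):
        I = a + b * I
    return I

I0, I1 = 3.7, 5.2   # hypothetical values of I(r, φ) and I(r, α + φ)
n = 6               # even number of rotation steps

# b = -1, a = I0 + I1: the sequence alternates I0, I1, I0, ... and closes
a = I0 + I1
assert abs(iterate(I0, a, -1.0, n) - I0) < 1e-12
assert abs(a + (-1.0) * I0 - I1) < 1e-12   # the first step reproduces I1

# b = 1 requires a = 0; any a != 0 drifts by n*a and cannot close
assert iterate(I0, 0.0, 1.0, n) == I0
assert abs(iterate(I0, 0.5, 1.0, n) - (I0 + n * 0.5)) < 1e-12
```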

ACKNOWLEDGMENT

The author would like to thank Richard I. Hartley, Jiří Matas, and Jan Šochman for reading the paper draft and giving valuable comments, and Michal Perďoch, Dušan Omerčević, and Luka Čehovin for help with the software and experimental work. This research work has been supported by the Research program Computer Vision P2-0214 (RS).

REFERENCES

[1] H. Blum, "Biological shape and visual science I," J. Theoret. Biol., vol. 38, pp. 205-287, 1973.
[2] M. Brady, "Criteria for representations of shape," Human and Machine Vision, Beck and Rosenfeld (eds.), Academic Press, 1983.
[3] V. Di Gesu and C. Valenti, "The Discrete Symmetry Transform in Computer Vision," Technical Report DMA 01195, Palermo Univ., 1995.
[4] V. Di Gesu and C. Valenti, "Symmetry Operators in Computer Vision," Proc. First CCMA Workshop Vision Modelling and Information Coding, Oct. 1995.
[5] A. Chella, V. Di Gesu, I. Infantino, D. Intravaia, and C. Valenti, "A Cooperating Strategy for Object Recognition," Shape, Contour, and Grouping in Computer Vision, pp. 264-276, 1999.
[6] V. Di Gesu and R. Palenichka, "A Fast Recursive Algorithm to Compute Local Axial Moments," Signal Processing, vol. 81, pp. 265-273, 2001.
[7] H. Cornelius and G. Loy, "Detecting Rotational Symmetry Under Affine Projection," Proc. ICPR, II, pp. 292-295, 2006.
[8] H. Cornelius, M. Perdoch, J. Matas, and G. Loy, "Efficient Symmetry Detection Using Local Affine Frames," Proc. SCIA, pp. 152-161, 2007.
[9] P. J. Giblin and S. A. Brassett, "Local Symmetry of Plane Curves," The American Mathematical Monthly, vol. 92, no. 10, pp. 689-707, 1985.
[10] H. Deng, W. Zhang, E. Mortensen, T. Dietterich, and L. Shapiro, "Principal Curvature-Based Region Detector for Object Recognition," Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[11] R. O. Duda and P. E. Hart, "Use of the Hough Transform to Detect Lines and Curves in Pictures," Comm. ACM, vol. 15, no. 1, pp. 11-15, Jan. 1972.
[12] C. Harris and M. Stephens, "A combined corner and edge detector," Proc. of the 4th Alvey Vision Conference, pp. 147-151, 1988.
[13] B. Johansson and G. Granlund, "Fast Selective Detection of Rotational Symmetries using Normalized Inhibition," Proc. ECCV, I, pp. 871-887, 2000.
[14] T. Kadir and M. Brady, "Saliency, Scale and Image Description," International Journal of Computer Vision, vol. 45, no. 2, pp. 83-105, 2001.
[15] T. Kadir, A. Zisserman, and M. Brady, "An affine invariant salient region detector," Proc. Eighth European Conference on Computer Vision, pp. 345-457, 2004.
[16] C. Kimme, D. Ballard, and J. Sklansky, "Finding Circles by an Array of Accumulators," Comm. ACM, vol. 18, no. 2, pp. 120-122, 1975.
[17] P. Kovesi, "Symmetry and Asymmetry from Local Phase," Proc. Tenth Australian Joint Conference on Artificial Intelligence, pp. 185-190, 1997.
[18] P. Kovesi, "Image Features from Phase Congruency," Videre: Journal of Computer Vision Research, vol. 1, no. 3, 1999.
[19] S. Lee, R. T. Collins, and Y. Liu, "Rotation Symmetry Group Detection Via Frequency Analysis of Frieze-Expansions," Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[20] C.-C. Lin and W.-C. Lin, "Extracting Facial Features by an Inhibitory Mechanism Based on Gradient Distributions," Pattern Recognition, vol. 29, no. 12, pp. 2079-2101, 1996.
[21] T. Lindeberg, "Feature Detection with Automatic Scale Selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 77-116, 1998.
[22] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[23] G. Loy and J.-O. Eklundh, "Detecting Symmetry and Symmetric Constellations of Features," Proc. ECCV, Part II, pp. 508-521, 2006.
[24] G. Loy and A. Zelinsky, "Fast Radial Symmetry for Detecting Points of Interest," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, pp. 959-973, 2003.
[25] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide-baseline stereo from maximally stable extremal regions," Proc. of the British Machine Vision Conference, pp. 384-393, 2002.
[26] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, "A Comparison of Affine Region Detectors," International Journal of Computer Vision, vol. 65, pp. 43-72, 2005.
[27] K. Mikolajczyk and C. Schmid, "An affine invariant interest point detector," Proc. of the 7th European Conference on Computer Vision, 2002.
[28] L. G. Minor and J. Sklansky, "Detection and Segmentation of Blobs in Infrared Images," IEEE Trans. Systems, Man, and Cybernetics, vol. 11, no. 3, pp. 194-201, 1981.
[29] H. Moravec, "Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover," Tech. Report CMU-RI-TR-3, Carnegie-Mellon University, Robotics Institute, 1980.
[30] R. M. Palenichka, M. B. Zaremba, and C. Valenti, "A Fast Recursive Algorithm for the Computation of Axial Moments," Proc. 11th Int'l Conf. Image Analysis and Processing, pp. 95-100, 2001.
[31] D. Reisfeld, H. Wolfson, and Y. Yeshurun, "Context Free Attentional Operators: the Generalized Symmetry Transform," Int'l J. Computer Vision, special issue on qualitative vision, vol. 14, pp. 119-130, 1995.
[32] T. Riklin-Raviv, N. Kiryati, and N. Sochen, "Segmentation by Level Sets and Symmetry," Proc. CVPR, pp. 1015-1022, 2006.
[33] G. Sela and M. D. Levine, "Real-Time Attention for Robotic Vision," Real-Time Imaging, vol. 3, pp. 173-194, 1997.
[34] J. Shi and C. Tomasi, "Good Features to Track," Proc. CVPR, pp. 593-600, 1994.
[35] C. Schmid and R. Mohr, "Local gray value invariants for image retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, pp. 530-535, 1997.
[36] S. M. Smith and J. M. Brady, "SUSAN - A New Approach to Low Level Image Processing," International Journal of Computer Vision, vol. 23, no. 1, pp. 45-78, 1997.
[37] M. Smithson, Statistics with Confidence, SAGE Publications, London, Thousand Oaks, New Delhi, 2000.
[38] F. Schaffalitzky and A. Zisserman, "Multi-view matching for unordered image sets, or 'How do I organize my holiday snaps?'," Proc. ECCV, pp. 414-431, 2002.
[39] F. Schaffalitzky and A. Zisserman, "Automated Location Matching in Movies," Computer Vision and Image Understanding, vol. 92, no. 2, pp. 236-264, 2003.
[40] Q. B. Sun, W. M. Huang, and J. K. Wu, "Face Detection Based on Color and Local Symmetry Information," Proc. Third Int'l Conf. Face and Gesture Recognition, pp. 130-135, 1998.
[41] M. Trajkovic and M. Hedley, "Fast corner detection," Image and Vision Computing, vol. 16, pp. 75-87, 1998.
[42] T. Tuytelaars and L. Van Gool, "Content-based image retrieval based on local affinely invariant regions," Int. Conf. on Visual Information Systems, pp. 493-500, 1999.
[43] T. Tuytelaars and L. Van Gool, "Wide baseline stereo matching based on local, affinely invariant regions," Proc. Eleventh British Machine Vision Conference, pp. 412-425, 2000.

Jasna Maver received her PhD degree in computer science from the University of Ljubljana in 1995. From 1990 to 1991, she was with the Department of Computer and Information Science, GRASP Laboratory, University of Pennsylvania, working on the problem of next view planning. She is an Associate Professor of computer and information science at the University of Ljubljana. She holds a position at the Department of Library and Information Science and Book Studies of the Faculty of Arts. She is also a research associate of the Computer Vision Laboratory of the Faculty of Computer and Information Science. Her research interests include methods for learning and object recognition.

