
Hierarchical Matching of Deformable Shapes

Pedro F. Felzenszwalb, University of Chicago, [email protected]

Joshua D. Schwartz, University of Chicago, [email protected]

Abstract

We describe a new hierarchical representation for two-dimensional objects that captures shape information at multiple levels of resolution. This representation is based on a hierarchical description of an object's boundary and can be used in an elastic matching framework, both for comparing pairs of objects and for detecting objects in cluttered images. In contrast to classical elastic models, our representation explicitly captures global shape information. This leads to richer geometric models and more accurate recognition results. Our experiments demonstrate classification results that are significantly better than the current state of the art on several shape datasets. We also show initial experiments in matching shapes to cluttered images.

1. Introduction

Humans can often recognize objects using shape information alone. This has proven to be a challenging task for computer vision systems. One of the main difficulties is in developing representations that can effectively capture important shape variations. We want to be able to compare different objects, and to detect objects in cluttered images. The computational complexity of these tasks and the recognition accuracy obtained are highly dependent on the choice of a shape representation.

This paper describes an approach for matching shapes based on a hierarchical description of their boundaries. This approach can be used both for determining the similarity between two shapes and for matching a deformable shape model to a cluttered image. By using a hierarchical model, we are able to develop simple elastic matching algorithms that can take global geometric information into account.

Our matching algorithms are based on a compositional procedure. We combine matchings between adjacent segments on two curves to form matchings between longer segments. This approach makes it possible to consider the geometric arrangement among the endpoints of the matchings being combined. For long matchings, the endpoints are far away, which means that our measure of deformation captures global geometric properties. Figure 1 illustrates the procedure, where we combine a matching from A_1 to B_1 with a matching from A_2 to B_2 to obtain a longer matching between two curves. The quality of the combination depends on both the quality of the matchings being combined and the similarity between the geometric arrangements of points (p_1, p_2, p_3) and (q_1, q_2, q_3).

Figure 1. The composition of matchings between adjacent segments on two curves to form a matching between longer segments.

We have tested the hierarchical representation and compositional matching procedure in a variety of situations and obtained excellent performance. In classification tasks, we obtain better recognition results than other methods on several shape datasets, including the MPEG-7 shape dataset [15], a Swedish leaf dataset [26], and a silhouette dataset from Brown University [24]. We have also used the ETHZ dataset [12] to demonstrate how hierarchical matching can be used for matching shapes to real, cluttered images. These experiments illustrate how the approach is not restricted to matching pre-segmented shapes. Instead, we can match a model shape directly to an unorganized set of contours extracted from natural images.

Most of the previous elastic matching methods look for maps between two curves while minimizing a measure of local bending and stretching (see [2], [23] and references within). The methods in [6] and [13] use a similar idea to match a curve to cluttered images. Local deformation models are appealing from an algorithmic perspective. Usually dynamic programming can be used to find optimal matchings. However, as described in [2, 23], these methods can only address some aspects of shape similarity. Consider the curves in Figure 2(a). While they represent different characters (6 and U), they can be transformed into each other without much bending and stretching. The two shapes are essentially indistinguishable if we focus on local properties alone. On the other hand, while the objects in Figure 2(b) are perceptually similar, they have completely different local boundary properties.

Figure 2. (a) Two curves that are almost indistinguishable by local properties alone. (b) Two objects that are similar at a coarse level but quite dissimilar at a local level.

Our hierarchical representation captures geometric properties at different levels of resolution. At the finest level, these properties are related to standard local descriptions (capturing local curvature, for example). At coarser levels, the properties capture global shape aspects. As in classic elastic matching approaches, we use a dynamic programming algorithm for matching. But, as opposed to these other methods, ours does not solve a shortest path problem, due to its compositional nature. Our compositional approach is related to the work in [4].

Hierarchical representations have proven to be useful in a variety of situations. The arc-tree in [14] gives a hierarchical description of a curve based on recursive selection of midpoints. This representation was used to perform geometric queries such as detecting intersections between two curves. Our representation can be thought of as a modified arc-tree in which the only information kept at each node is the relative position of the selected midpoint. Recursive midpoint selection is also a standard method used for polygon simplification in computer graphics [22].

In vision, multiscale representations such as the curvature scale-space (CSS) have been previously used for shape recognition [21, 20, 28]. The CSS captures critical curvature points of a contour at different levels of smoothing. Our representation is also based on a multiresolution approach, but we rely only on subsampling to define coarse geometric properties. The method in [28] uses dynamic programming for matching multiscale descriptions, but this method is not compositional, in contrast to ours. Other hierarchical methods include the hierarchical graphical models in [8] and hierarchical procrustes matching [19].

The methods in [1] and [9] use triangulated graphs to represent shapes and to model deformations of objects. Our work is related since we use the geometric arrangement of sets of three points to capture shape information. Our algorithm for matching shapes to cluttered images, like that of [12], works by linking edge contours.

There are many other methods for representing, matching and recognizing shapes. These include methods based on the medial axis transform and the shock graph [5], [25], [24], procrustes analysis [7], shape contexts [3] and the inner-distance [16]. We experimentally compare our algorithm to several of these approaches in Section 5.

2. The Shape-Tree

We start by describing our hierarchical representation for open curves. Let A be an open curve specified by a sequence of sample points (a_1, ..., a_n). Let a_i be a midpoint on A. For example, we usually take i = ⌊n/2⌋. Another option is to choose the sample point such that the coarse curve (a_1, a_i, a_n) approximates A as well as possible. Let L(a_i | a_1, a_n) denote the location of a_i relative to a_1 and a_n. The locations of the first and last sample points can be used to define a coordinate frame where we measure the location of the midpoint. The first and last sample points define a canonical scale and orientation, so the relative location L(a_i | a_1, a_n) is invariant to similarity transformations.

The choice of a midpoint, a_i, breaks the original curve into two halves, A_1 = (a_1, ..., a_i) and A_2 = (a_i, ..., a_n). The hierarchical description of A is defined recursively: we keep track of L(a_i | a_1, a_n) and the hierarchical descriptions of A_1 and A_2. This hierarchical description can be represented by a binary tree, as illustrated in Figure 3. We call this representation the shape-tree of a curve. Each node in the shape-tree stores the relative location of a midpoint with respect to the start and end point of a subcurve. The left child of a node describes the subcurve from the start to the midpoint, while the right child describes the subcurve from the midpoint to the end. The leaves of this tree represent locations of sample points, a_i, relative to their neighboring points, a_{i-1} and a_{i+1}. Note that a subtree rooted at a node corresponds to the shape-tree of a subcurve.
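To make the construction concrete, here is a minimal Python sketch of a shape-tree built from a sampled open curve. It is an illustration under stated assumptions rather than the authors' implementation: the (n, 2) NumPy array input, the complex-arithmetic helper relative_location, and the (loc, left, right) tuple layout for nodes are choices made for this sketch.

```python
import numpy as np

def relative_location(mid, start, end):
    """L(mid | start, end): the position of `mid` in the frame that maps
    `start` to (-0.5, 0) and `end` to (0.5, 0), so the value is invariant
    to similarity transformations of the curve."""
    z_mid, z_start, z_end = (complex(p[0], p[1]) for p in (mid, start, end))
    w = (z_mid - z_start) / (z_end - z_start) - 0.5
    return np.array([w.real, w.imag])

def build_shape_tree(points):
    """Shape-tree of an open curve given as an (n, 2) array of samples.
    Each node is a tuple (relative midpoint location, left subtree,
    right subtree); a two-point subcurve (a line segment) is None."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    if n <= 2:
        return None
    i = n // 2                                   # midpoint index, i = floor(n/2)
    loc = relative_location(points[i], points[0], points[-1])
    left = build_shape_tree(points[: i + 1])     # (a_1, ..., a_i)
    right = build_shape_tree(points[i:])         # (a_i, ..., a_n)
    return (loc, left, right)
```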

Nodes in the bottom of the shape-tree represent relative locations of three sequential points along the curve. These nodes capture local geometric properties such as the angle formed at a point (which is a measure of curvature) and the relative distance between adjacent sample points. On the other hand, nodes near the root of the tree capture more global information, encoded by the relative locations of points that are far from each other. This is a local property of a subsampled version of the original curve. The shape-tree contains only the locations of points relative to two other points. This makes the representation invariant to similarity transformations.

Given the tree representation for A, along with the location of its start and end points a_1 and a_n, the curve can be recursively reconstructed. First, the start and end points of the curve are placed. Because the location of a midpoint of A relative to the start and the end is known, it can be placed. This process continues down the shape-tree until we have placed every sample point of A. By placing the initial points a_1 and a_n at arbitrary locations, a translated, rotated and scaled version of A can be obtained.

Figure 3. A shape-tree. Filled circles represent endpoints of subcurves and unfilled circles represent midpoints. Each node stores the location of a midpoint relative to the endpoints. The midpoint becomes an endpoint when a subcurve is divided.
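The recursive reconstruction can be sketched in the same spirit. This again is only an illustrative sketch: it assumes the (loc, left, right) tuple nodes produced by the construction sketch above, and place_midpoint is simply the inverse of that sketch's relative_location helper.

```python
import numpy as np

def place_midpoint(loc, start, end):
    """Inverse of the relative_location helper above: recover the absolute
    position of a midpoint from its stored relative location once the
    endpoints of the subcurve have been placed."""
    z_start, z_end = complex(start[0], start[1]), complex(end[0], end[1])
    w = complex(loc[0], loc[1])
    z = (w + 0.5) * (z_end - z_start) + z_start
    return np.array([z.real, z.imag])

def reconstruct(tree, start, end):
    """Recursively rebuild the sample points of a curve from a shape-tree
    (the (loc, left, right) tuples of the construction sketch) and placed
    start/end points.  Choosing arbitrary start/end positions yields a
    translated, rotated and scaled copy of the original curve."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    if tree is None:                    # line segment: no interior samples
        return [start, end]
    loc, left, right = tree
    mid = place_midpoint(loc, start, end)
    left_pts = reconstruct(left, start, mid)
    right_pts = reconstruct(right, mid, end)
    return left_pts[:-1] + right_pts    # the midpoint appears only once

# Hypothetical usage (with build_shape_tree from the sketch above):
# curve = np.column_stack([np.linspace(0, 1, 65), np.sin(np.linspace(0, 3, 65))])
# tree = build_shape_tree(curve)
# copy = reconstruct(tree, start=(0.0, 0.0), end=(2.0, 1.0))
```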

A closed curve can be represented in a similar fashion. Let B be a closed curve, specified by a sequence of sample points (b_1, ..., b_n), where b_n = b_1. Now let b_i be a midpoint on B. The open curves B_1 = (b_1, ..., b_i) and B_2 = (b_i, ..., b_n) can each be represented by a shape-tree. Given a shape-tree representation of each side of a closed curve and a location for the start/end point and the first midpoint, the curve can be reconstructed at any location, orientation, and size. We simply reconstruct each side using the procedure outlined above.

We note that for a continuous curve it is possible to define an infinite shape-tree. This infinite tree gives a dense sampling of the points in the curve, fully capturing its geometry up to similarity transformations.

2.1. Deformations

We can deform a curve by perturbing the relative locations stored in its shape-tree representation. To explore this idea we need to pick a particular representation for the relative locations of the midpoints in a curve.

Bookstein coordinates [7] encode the relative locations of three points as a point in the plane. They give a simple way to represent the relative location, L(a_i | a_1, a_n), of a midpoint in the shape-tree. Let v_1, v_2 and v_3 be three distinct points. There exists a unique similarity transformation that maps v_1 to (−0.5, 0) and v_2 to (0.5, 0). This transformation maps v_3 to a location that we call the Bookstein coordinate of v_3 with respect to v_1 and v_2.

Figure 4. Random deformations obtained by adding independent noise to the nodes in a shape-tree representation of an object. The deformed squares illustrate how the method preserves important global properties while generating a wide range of variation.

Figure 4 shows some examples where we added independent noise to the Bookstein coordinates of each midpoint in a shape-tree before reconstructing a curve. The results are curves that are perceptually similar to the originals. Note that in the case of the square the deformed objects still seem to have four sides that meet at a right angle, even though the sides are quite deformed.
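A hedged sketch of this deformation process is given below. It reuses the tuple node layout from the construction sketch in Section 2 (whose stored relative locations play the role of the Bookstein coordinates above, up to the choice of baseline), and the noise scale sigma is an arbitrary illustrative parameter, not a value from the paper.

```python
import numpy as np

def deform_shape_tree(tree, sigma=0.05, rng=None):
    """Return a copy of a shape-tree (the (loc, left, right) tuples of the
    construction sketch) with independent Gaussian noise added to every
    stored relative location.  Reconstructing the perturbed tree, as in the
    reconstruction sketch, produces a random deformation of the curve in
    the spirit of Figure 4."""
    if rng is None:
        rng = np.random.default_rng()
    if tree is None:
        return None
    loc, left, right = tree
    return (loc + rng.normal(scale=sigma, size=2),
            deform_shape_tree(left, sigma, rng),
            deform_shape_tree(right, sigma, rng))
```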

3. Elastic Matching

Let A and B be two open curves. When matching these curves, we build a shape-tree for A and look for a mapping from points in A to points in B such that the shape-tree of A is deformed as little as possible.¹ Here, we measure the total amount of deformation as a sum over deformations applied to each node in the shape-tree of A. The hierarchical nature of the shape-tree ensures that both local and global geometric properties are preserved by a good matching. In practice, we use a non-uniform weighting over deformations applied to different nodes in the shape-tree. We allow larger deformations near the bottom of a shape-tree, as these do not change the global appearance of an object.

¹The method described here is not symmetric. It is possible to define a symmetric method, but that leads to a less efficient algorithm.

Suppose A = (a_1, ..., a_n) and B = (b_1, ..., b_m). We assume that a_1 maps to b_1 while a_n maps to b_m. The shape-tree of A defines a midpoint a_i dividing the curve into two halves A_1 and A_2. The best match from A to B can be defined by a search for a point b_j on B where a_i maps to. This point is used to divide B into two halves B_1 and B_2, where A_1 and A_2 map to respectively. We say A and B are similar if we can find a midpoint on B such that A_1 is similar to B_1, A_2 is similar to B_2, and the relative locations of the midpoints L(a_i | a_1, a_n) and L(b_j | b_1, b_m) are similar. The similarity between subcurves is defined in the same manner. The cost of matching A to B can be expressed by a recursive equation,

ψ(A, B) = min_{b_j ∈ B} [ ψ(A_1, B_1) + ψ(A_2, B_2) + λ_A · dif(L(a_i | a_1, a_n), L(b_j | b_1, b_m)) ],   (1)

where dif measures the difference between the relative locations of the midpoints on A and B, and λ_A is a weighting factor. For the experiments in this paper we used a weighting proportional to the length of A (the curve being deformed), giving higher weights to deforming the relative locations of points that are far away. We used the squared Procrustes distance [7] between (a_1, a_i, a_n) and (b_1, b_j, b_m) for defining dif.²
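As one concrete reading of this choice of dif, the sketch below computes the squared full Procrustes distance between two planar point triples. The paper does not spell out its exact normalization, so treat the details as assumptions rather than the authors' code.

```python
import numpy as np

def squared_procrustes_distance(tri_a, tri_b):
    """Squared full Procrustes distance between two planar point triples,
    e.g. (a_1, a_i, a_n) and (b_1, b_j, b_m): translation, scale and
    rotation are factored out before comparing the configurations."""
    z = np.array([complex(x, y) for x, y in tri_a])
    w = np.array([complex(x, y) for x, y in tri_b])
    z -= z.mean()                       # remove translation
    w -= w.mean()
    z /= np.linalg.norm(z)              # remove scale
    w /= np.linalg.norm(w)
    # After optimally rotating one configuration onto the other, the
    # residual for unit-norm planar pre-shapes is 1 - |<w, z>|^2.
    return 1.0 - abs(np.vdot(w, z)) ** 2
```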

For the base case we need to define ψ(A, B) when either A or B has two sample points. A curve with two sample points is just a line segment. We let the cost of matching one line segment with another be zero, while the cost of matching a line segment with a curve is exactly what it would be if the line segment were further subdivided to have the same number of sample points as the curve.

The recursive equation (1) can be solved using dynamic programming over the shape-tree of A. Let v be a node in the shape-tree of A. Consider the subcurve A′ corresponding to the subtree rooted at v. Let T(v) be a table of costs where T(v)[s, e] is the cost of matching A′ to the subcurve of B given by (b_s, ..., b_e). The table T(v) can be computed using equation (1) once the tables for the children of v have been computed. The algorithm computes all tables by starting at the leaves of the shape-tree and working in order of decreasing depth. The cost of matching A to B is T(r)[1, m], where r is the root of the shape-tree.

There are O(n) tables to be computed, and each table has O(m²) entries. To compute an entry, we have to search for an ideal midpoint on B. So, the dynamic programming procedure takes O(nm³) time overall. After all tables are computed, we can find the best matching from A to B by tracing back from the root of the shape-tree to the leaves, as in standard dynamic programming procedures.
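One hedged way to realize this dynamic program is the memoized recursion over index ranges sketched below, where the memo table plays the role of the tables T(v). It assumes the squared_procrustes_distance helper from the sketch above, always splits at ⌊n/2⌋, uses the number of samples spanned as a stand-in for the length-proportional weight λ_{A′}, and simplifies the line-segment base case to zero cost; it is an illustration, not the authors' implementation.

```python
import numpy as np
from functools import lru_cache

def shape_tree_match_cost(model_pts, target_pts, lam=1.0):
    """Cost of matching open curve A (model_pts) to open curve B
    (target_pts) by minimizing equation (1).  The memoized recursion over
    index ranges plays the role of the tables T(v); runtime is O(n m^3)."""
    A = np.asarray(model_pts, dtype=float)
    B = np.asarray(target_pts, dtype=float)

    @lru_cache(maxsize=None)
    def psi(a0, a1, b0, b1):
        # Simplified base case: line segments match anything at zero cost
        # (the paper instead subdivides the segment to match the curve).
        if a1 - a0 <= 1 or b1 - b0 <= 1:
            return 0.0
        am = (a0 + a1) // 2                      # shape-tree midpoint of A'
        weight = lam * (a1 - a0)                 # stand-in for lambda_{A'}
        best = float("inf")
        for bm in range(b0 + 1, b1):             # search for the midpoint on B
            d = squared_procrustes_distance(
                (A[a0], A[am], A[a1]), (B[b0], B[bm], B[b1]))
            best = min(best,
                       psi(a0, am, b0, bm) + psi(am, a1, bm, b1) + weight * d)
        return best

    return psi(0, len(A) - 1, 0, len(B) - 1)
```

To recover the matching itself rather than just its cost, one would also record, for each memoized state, the midpoint b_m that attains the minimum and trace back from the root state.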

When A and B are closed curves, we first break A into two halves, A_1 = (a_1, ..., a_i) and A_2 = (a_i, ..., a_n), where, as before, a_1 equals a_n. We match each node in the shape-trees of A_1 and A_2 to each subcurve of B. The cost of matching A to B, as a function of where a_1 and a_i map to, is given by T_1(r_1)[s, e] + T_2(r_2)[e, s]. Here r_1 and r_2 are the roots of the shape-trees of A_1 and A_2, while s and e are locations in B which a_1 and a_i map to. This leads to an O(nm³) algorithm for matching closed curves. In practice, we use between 50 and 100 sample points in each curve. Our current implementation takes about 0.5 seconds to compute a matching on a 3 GHz computer.

²The Euclidean distance between Bookstein coordinates is not a very good measure of difference between relative locations of midpoints. Bookstein coordinates are better seen as points in the Poincaré plane, where geodesic distance corresponds to a natural deformation measure.

Figure 5. Detecting a bottle in an image. The input image is shown in (a). The soft edge map is shown in (b). In (c), we have the image contours extracted from (b). Our final detection is shown in (d).

The formulation above assumes that each part of A has a corresponding part on B. In many situations two curves are similar except that one of them has a missing or extra part. To make the matching robust to these transformations we bound ψ(A′, B′) from above using a cost proportional to λ_{A′} · (|A′| + |B′|). This models a process where we replace a subcurve of A with a subcurve of B. Since the shape-tree of A is fixed in advance, this process can only replace certain parts of A. To allow for more flexibility in dealing with occlusions, we usually compute matchings using 2 to 4 different shape-trees and pick the best one. It is also possible to give a dynamic programming algorithm that allows arbitrary parts of A and B to be replaced, but that algorithm runs in O(n³m³) time.

4. Matching to Cluttered Images

Generalizing the ideas from the last section, we can also match a model curve to a cluttered image. This algorithm proceeds in four stages. First, given a color image, we compute an edge strength map. Then, we extract a set of image contours from the edge map. After this, we match each image contour to all subcurves of our model using dynamic programming. Finally, we use a second dynamic programming procedure to compose these matches together, forming an optimal matching between the model and a subset of the image contours. These stages are illustrated in Figure 5.

For the first stage, we use the Pb edge operator [18] to compute an edge strength map. For the second stage, we trace smooth contours in the edge map using the method from [10]. The result is a set of salient contours in the image. An example can be found in Figure 5(c).

Let M be a model curve, C be the set of contours extracted from an image, and P denote the set of endpoints of contours in C. Our goal is to find a matching between M and a subset of C. Let a and b be sample points in M, while p and q are points in P. We use Match(a, b, p, q) to denote a matching from the subcurve of M from a to b to a subset of the contours in C, such that a maps to p and b maps to q.

In the third stage of the algorithm, we compute the best matching between each contour in C and each subcurve of M. This is done using the method from the last section. It takes O(nm³) time to compute a table giving the cost of deforming an image contour with n sample points to every possible subcurve in a model with m sample points. Thus, the overall running time of the third stage is linear in the total length of the contours in C and cubic in the length of the model. This stage generates a set of matchings Match(a, b, p, q) that are stitched together to form larger matchings in the last stage.

We use the following compositional rule to stitch partial matchings together. Let q and r be two points in P such that ||q − r|| ≤ τ, for some small threshold τ. If we have two matchings Match(a, b, p, q) and Match(b, c, r, s), then we can compose them to get a matching Match(a, c, p, s). We allow q and r to be different so that we can compose adjacent contours in the image even if their endpoints do not exactly align. Mismatches between endpoint locations can be caused by the edge detection or edge tracing procedure. In analogy to the expression in equation (1), the cost of the composed matching is the sum of the costs of the matchings being composed plus a measure of the differences between the relative locations of the midpoints in the model and the image. Here we take the "midpoint" in the image to be the average (q + r)/2.
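A sketch of this composition rule follows. The Match container, the parameters lam and tau, and the reuse of the squared_procrustes_distance helper from Section 3 are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import namedtuple

# Hypothetical container for a partial matching: the model subcurve from
# sample index a to sample index b is matched to image contours, with its
# endpoints mapped to image points p and q, at the given cost.
Match = namedtuple("Match", "a b p q cost")

def compose(m1, m2, model_pts, lam=1.0, tau=5.0):
    """Stitch Match(a, b, p, q) and Match(b, c, r, s) into Match(a, c, p, s)
    when the shared model endpoint agrees and the image endpoints q and r
    are within tau.  The added cost compares the model midpoint with the
    image "midpoint" (q + r) / 2, here via the squared Procrustes distance
    sketch from Section 3."""
    q, r = np.asarray(m1.q, dtype=float), np.asarray(m2.p, dtype=float)
    if m1.b != m2.a or np.linalg.norm(q - r) > tau:
        return None                      # the two matchings cannot be stitched
    A = np.asarray(model_pts, dtype=float)
    image_mid = (q + r) / 2.0
    d = squared_procrustes_distance((A[m1.a], A[m1.b], A[m2.b]),
                                    (m1.p, image_mid, m2.q))
    return Match(m1.a, m2.b, m1.p, m2.q, m1.cost + m2.cost + lam * d)
```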

Because of occlusions and missing edges we would like to allow a subcurve of the model to be left unmatched even though regions around it are matched. This is captured by considering "gap matches" Match(a, b, p, q) for every pair of sample points a and b in the model and points p and q in P. In these matchings a is mapped to p and b is mapped to q, while the subcurve between a and b is left unmatched. The cost of a gap match is proportional to the arclength of the subcurve from a to b.

A complete match between M and a subset of the contours is given by a pair of matchings Match(a, b, p, q) and Match(b, a, q′, p′), where both ||p′ − p|| and ||q′ − q|| are at most τ. Figure 6 illustrates the stitching procedure. We can find the best complete matching using a second dynamic programming step. We sequentially compute the cheapest matching of type Match(a, b, p, q) in order of increasing arclength of subcurves in the model. This stage of the algorithm runs in O(m³k³) time, where m is the number of sample points in the model and k is the number of endpoints in P. In the future we plan to use the algorithms in [11] to compute optimal matches even faster. Those algorithms would compose matchings in order of their quality to avoid considering many possibilities that are considered by the dynamic programming procedure.

(a) Model (b) Image contours (c) Final result

Figure 6. The initial matching Match(a, b, p, q) can be composed with the gap match Match(b, c, q, r) to form a matching Match(a, c, p, r). Because s and t are close, the initial matchings Match(c, d, r, s) and Match(d, e, t, u) can be composed to form a matching Match(c, e, r, u). At this point, matchings Match(a, c, p, r) and Match(c, e, r, u) could be composed. Continuing in this way, we stitch together the boundary of the object.

5. Experiments

5.1. Shape Classification

MPEG-7 Shape Database

The MPEG-7 shape database [15] is a widely used dataset for testing shape recognition methods. The database has 1400 silhouette images, with 20 images per object class from a total of 70 different classes. Figure 7 shows some of the images in the database. The standard method for measuring the recognition rate of an algorithm on this dataset is as follows. For every image in the database, we look at the 40 most similar images and count how many of those are in the same class as the query image. The final score of the test is the ratio of the overall number of correct hits obtained to the best possible number of correct hits. The best possible number is 1400 * 20, since there are 1400 query images and 20 images per class. This is a hard dataset due to the large intraclass variability in each category. Table 1 lists the recognition rate we obtained using the shape-tree deformation method, together with results from other algorithms. Note that our method outperforms all previous systems.
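For reference, the bull's-eye protocol just described can be written in a few lines of Python; the dissimilarity matrix dist and the label array are hypothetical inputs standing in for the output of the matcher.

```python
import numpy as np

def bullseye_score(dist, labels, top_k=40, per_class=20):
    """MPEG-7 bull's-eye retrieval rate.  `dist` is an (N, N) matrix of
    pairwise shape dissimilarities and `labels[i]` is the class of shape i.
    For each query we count how many of its `top_k` most similar shapes
    (the query itself included, as is standard) share its class, and divide
    by the best possible total N * per_class."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    hits = 0
    for i in range(n):
        nearest = np.argsort(dist[i])[:top_k]
        hits += int(np.sum(labels[nearest] == labels[i]))
    return hits / (n * per_class)
```

With the 1400-image, 20-per-class setup above, a perfect matcher would score 1.0.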

Figure 7. Some of the objects in the MPEG-7 dataset. One image per class for the first 40 classes (the database has 70 classes).

Method                         Recognition rate
Shape-tree                     87.70%
IDSC + DP + EMD [17]           86.56%
Hierarchical Procrustes [19]   86.35%
IDSC + DP [16]                 85.40%
Generative Models [27]         80.03%
Curve Edit [23]                78.14%
SC + TPS [3]                   76.51%
Visual Parts [15]              76.45%
CSS [20]                       75.44%

Table 1. Classification results on the MPEG-7 dataset.

Swedish Leaf Database

The Swedish leaf dataset [26] has pictures of 15 species of leaves, with 75 images per species for a total of 1125 images. Figure 8 shows some example images from this dataset. Note that some species are indistinguishable to the untrained eye. Similar to the methods in [26] and [16], we randomly select 25 training images from each species and classify the remaining images using a nearest neighbor approach. Table 2 compares our classification rate to the other methods that have been tested on this dataset. The shape-tree matching algorithm outperforms the other methods by a significant amount.

Figure 8. Leaves from the Swedish leaf dataset, one leaf per species. Note the similarity among some species.

Method                     Recognition rate
Shape-tree                 96.28%
IDSC + DP [16]             94.13%
SC + DP [16]               88.12%
Fourier descriptors [16]   89.60%
Soderkvist [26]            82.40%

Table 2. Classification results on the Swedish leaf dataset.

Brown Database

We also tested the shape-tree matching algorithm on the silhouette database from [24]. The dataset has 11 examples from 9 different object categories for a total of 99 images. One interesting aspect of this dataset is that many of the shapes have missing parts and added clutter. Figure 9 shows some of the images. The recognition results in this dataset are measured as follows. For each shape in the database, we check if the 10 closest matches are in the same category as the query shape. Table 3 summarizes the results of different methods. With our method, all of the 7 best matches for each shape are in the correct category.

Figure 9. Images from the Brown dataset. Two per category.

Figure 10. The models used for matching in the ETHZ dataset.

5.2. Matching in Cluttered Images

To test our matching algorithm on cluttered images, we ran experiments on a set of 80 images of swans and bottles from the ETHZ dataset [12]. Matching for each class is done with a single hand-drawn model, shown in Figure 10. This makes this dataset a good test for elastic matching. The objects in each image often have a substantially different shape from the model. Interestingly, several images in the dataset are paintings, drawings, or computerized renderings of scenes. Our algorithm performs very well on these images. A sampling of our results can be found in Figures 11 and 12. Note that our current implementation simply finds the best match in each picture.

Figure 11. Some example results of matching a bottle to images in the ETHZ dataset. Only the best match in each image is shown. Most of the gaps in each matching are due to missing edges.

Figure 12. Some example results of matching a swan to images in the ETHZ dataset. Only the best match in each image is shown. The third image on the top shows a mistake, due to missing edges on the swan and extra edges on the water.

Method                   1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th
Shape-tree                99   99   99   99   99   99   99   97   93    86
IDSC + DP [16]            99   99   99   98   98   97   97   98   94    79
Shock-Graph Edit [24]     99   99   99   98   98   97   96   95   93    82
Generative Models [27]    99   97   99   98   96   96   94   83   75    48

Table 3. Retrieval results on the dataset from [24]. Ideally the top 10 matches of each of the 99 shapes would be a shape in the same category. The table summarizes the number of correct matches in each rank.

6. Summary

We introduced a hierarchical shape representation with the goal of explicitly capturing both global and local geometric properties of an object. This representation is captured by a tree, which we term the shape-tree of an object. We can define deformations of an object in terms of independent deformations applied to each node in its shape-tree. Since some of the nodes in the shape-tree capture global geometric information, the process of applying a small deformation to each node preserves perceptually important aspects of the object's shape.

We have used the shape-tree deformation model to develop a simple and efficient algorithm for matching curves. Our experimental results show that this method is very accurate when used for classifying objects from several large databases. Moreover, the matching algorithm can be extended to detecting deformable objects in cluttered images. Our future work will be directed towards refining and evaluating this process.

Acknowledgment This material is based upon work supported by the National Science Foundation under Grant No. 0534820.

References

[1] Y. Amit and A. Kong. Graphical templates for model registration. PAMI, 18(3):225–236, 1996.
[2] R. Basri, L. Costa, D. Geiger, and D. Jacobs. Determining the similarity of deformable shapes. Vision Research, 38:2365–2385, 1998.
[3] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509–522, April 2002.
[4] E. Bienenstock, S. Geman, and D. Potter. Compositionality, MDL priors, and object recognition. In NIPS, 1997.
[5] H. Blum. Biological shape and visual science. Theoretical Biology, 38:205–287, 1973.
[6] J. Coughlan, A. Yuille, C. English, and D. Snow. Efficient deformable template detection and localization without user initialization. CVIU, 78(3):303–319, June 2000.
[7] I. Dryden and K. Mardia. Statistical Shape Analysis. John Wiley and Sons, 1998.
[8] X. Fan, C. Qi, D. Liang, and H. Huang. Probabilistic contour extraction using hierarchical shape representation. In ICCV, pages I: 302–308, 2005.
[9] P. Felzenszwalb. Representation and detection of deformable shapes. PAMI, 27(2):208–220, February 2005.
[10] P. Felzenszwalb and D. McAllester. A min-cover approach for finding salient curves. In IEEE Workshop on Perceptual Organization, 2006.
[11] P. Felzenszwalb and D. McAllester. The generalized A* architecture. Journal of Artificial Intelligence Research, to appear, 2007.
[12] V. Ferrari, T. Tuytelaars, and L. Van Gool. Object detection by contour segment networks. In ECCV, 2006.
[13] U. Grenander, Y. Chow, and D. Keenan. Hands: A Pattern Theoretic Study of Biological Shapes. Springer-Verlag, 1991.
[14] O. Gunther and E. Wong. The arc tree: An approximation scheme to represent arbitrary curved shapes. Computer Vision, Graphics, and Image Processing, 51:313–337, 1990.
[15] L. Latecki, R. Lakamper, and U. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In CVPR, 2000.
[16] H. Ling and D. Jacobs. Using the inner-distance for classification of articulated shapes. In CVPR, 2005.
[17] H. Ling and K. Okada. An efficient earth mover's distance algorithm for robust histogram comparison. PAMI, 29(5):840–853, May 2007.
[18] D. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, 26(5):530–549, May 2004.
[19] G. McNeill and S. Vijayakumar. Hierarchical procrustes matching for shape retrieval. In CVPR, 2006.
[20] F. Mokhtarian, S. Abbasi, and J. Kittler. Efficient and robust retrieval by shape content through curvature scale space. In A. Smeulders and R. Jain, editors, Image Databases and Multi-Media Search, pages 51–58. World Scientific, 1997.
[21] F. Mokhtarian and A. Mackworth. A theory of multi-scale curvature-based shape representations for planar curves. PAMI, 14(8):789–805, 1992.
[22] U. Ramer. An iterative procedure for the polygonal approximation of plane curves. Computer Graphics and Image Processing, 1:244–256, 1972.
[23] T. Sebastian, P. Klein, and B. Kimia. On aligning curves. PAMI, 25(1):116–124, January 2003.
[24] T. Sebastian, P. Klein, and B. Kimia. Recognition of shapes by editing their shock graphs. PAMI, 25(5):550–571, 2004.
[25] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. Zucker. Shock graphs and shape matching. IJCV, 35(1):13–32, 1999.
[26] O. Soderkvist. Computer vision classification of leaves from Swedish trees. Master's thesis, Linkoping University, 2001.
[27] Z. Tu and A. Yuille. Shape matching and recognition: using generative models and informative features. In ECCV, 2004.
[28] N. Ueda and S. Suzuki. Learning visual models from shape contours using multiscale convex/concave structure matching. PAMI, 15(4):337–352, 1993.


