
Image Segmentation Using Higher-Order Correlation Clustering

Sungwoong Kim, Member, IEEE, Chang D. Yoo, Senior Member, IEEE, Sebastian Nowozin, and Pushmeet Kohli

Abstract—In this paper, a hypergraph-based image segmentation framework is formulated in a supervised manner for many high-level computer vision tasks. To consider short- and long-range dependency among various regions of an image and also to incorporate wider selection of features, a higher-order correlation clustering (HO-CC) is incorporated in the framework. Correlation clustering (CC), which is a graph-partitioning algorithm, was recently shown to be effective in a number of applications such as natural language processing, document clustering, and image segmentation. It derives its partitioning result from a pairwise graph by optimizing a global objective function such that it simultaneously maximizes both intra-cluster similarity and inter-cluster dissimilarity. In the HO-CC, the pairwise graph which is used in the CC is generalized to a hypergraph which can alleviate local boundary ambiguities that can occur in the CC. Fast inference is possible by linear programming relaxation, and effective parameter learning by structured support vector machine is also possible by incorporating a decomposable structured loss function. Experimental results on various data sets show that the proposed HO-CC outperforms other state-of-the-art image segmentation algorithms. The HO-CC framework is therefore an efficient and flexible image segmentation framework.

Index Terms—Image segmentation, correlation clustering, structural learning


1 INTRODUCTION

Image segmentation, which can be defined as a clustering of image pixels into disjoint coherent regions, is currently used in many state-of-the-art high-level image/scene understanding tasks such as object class segmentation, scene segmentation, surface layout labeling, and single-view 3D reconstruction [1], [2], [3], [4], [5]. Its use provides the following three benefits: (1) coherent support regions, commonly assumed to be of a single label, serve as a good prior for many labeling tasks; (2) these coherent regions allow extraction of a more consistent feature that provides surrounding contextual information through pooling many feature responses over the region; and (3) a small number of larger coherent regions, compared to a large number of pixels, significantly reduces the computational cost of a labeling task.

Many segmentation algorithms have been proposed in the literature; they can be broadly classified into two groups: graph-based (examples include min-cuts [9], normalized cuts [10], and the Felzenszwalb-Huttenlocher (FH) segmentation algorithm [11]) and non-graph-based (examples include K-means [6], mean-shift [7], and EM [8]). Compared to non-graph-based segmentations, graph-based segmentations have been shown to produce more consistent segmentations by adaptively balancing local judgements of similarity [12]. Graph-based image segmentation algorithms can be further categorized into either node-labeling or edge-labeling algorithms. In contrast to the node-labeling framework of the min-cuts and normalized cuts, the edge-labeling framework of the FH algorithm does not require a pre-specified number of segments in an image.

Correlation clustering (CC) is a graph-partitioning algorithm [13] that infers the edge labels of the graph by simultaneously maximizing intra-cluster similarity and inter-cluster dissimilarity by optimization of a global objective (discriminant) function. Furthermore, the CC can be formulated as a linear discriminant function which allows for approximate polynomial-time inference by linear programming (LP) and also allows large margin training based on structured support vector machine (S-SVM) [14]. Finley and Joachims [15] consider a framework that uses the S-SVM for training the parameters in the CC for noun-phrase clustering and news article clustering. Taskar derived a max-margin formulation, different from the S-SVM, for learning the edge scores in the CC [16] for applications involving two different segmentations of a single image. No experimental comparisons or quantitative results are provided in [16].

We have recently explored a supervised CC over a pairwise superpixel graph for task-specific image segmentation [17], and it has been shown to perform better than other state-of-the-art image segmentation algorithms. Although it derives its segmentation result by optimizing a global objective function, which leads to a discriminatively-trained discriminant function, the pairwise CC (PW-CC) is restricted to resolving segment boundary ambiguities corresponding to only local pairwise edge labels of a graph. Therefore, to capture long-range dependencies of distant nodes in a global context, this paper proposes higher-order correlation clustering (HO-CC) to incorporate higher-order relations. Generalizing the PW-CC over a pairwise superpixel graph, we develop a HO-CC over a hypergraph that considers higher-order relations among superpixels. An edge in the hypergraph of the proposed HO-CC can connect two or more nodes representing the superpixels as in [18].

• S. Kim is with Qualcomm Research Korea, 119 Nonhyeon Dong, Gangnam Gu, Seoul 135-820, South Korea. E-mail: [email protected].

• C.D. Yoo is with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong Dong, Yuseong Gu, Daejeon 305-701, South Korea. E-mail: [email protected].

• S. Nowozin and P. Kohli are with Machine Learning and Perception, Microsoft Research Cambridge, 7 JJ Thomson Ave, Cambridge, Cambridgeshire CB30FB, United Kingdom. E-mail: {Sebastian.Nowozin, pkohli}@microsoft.com.

Manuscript received 14 Mar. 2012; revised 16 Dec. 2013; accepted 17 Jan. 2014. Date of publication 27 Jan. 2014; date of current version 29 July 2014. Recommended for acceptance by R. Yang. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TPAMI.2014.2303095

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 36, NO. 9, SEPTEMBER 2014 1761

0162-8828 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Hypergraphs have been previously used to lift certain limitations of conventional pairwise graphs [19], [20], [21]. However, previously proposed hypergraphs for image segmentation are restricted to partitioning based on the generalization of a normalized-cut framework, which suffers from the following three difficulties. First, inference is slow and difficult, especially with increasing graph size. To approximate the inference process, a number of algorithms have been introduced based on the coarsening algorithm [20] and the hypergraph Laplacian matrices [19]. These are heuristic approaches and therefore sub-optimal. Second, incorporating a supervised learning algorithm for parameter estimation under the spectral hypergraph partitioning framework is difficult. This is in line with the difficulties in learning spectral graph partitioning, which requires a complex and unstable eigenvector approximation that must be differentiable [22], [23]. Third, region-based features are utilized in a restricted manner. Almost all previous hypergraph-based image segmentation algorithms have been restricted to color variances as region features.

The proposed HO-CC framework alleviates all of the above difficulties by generalizing the PW-CC and making use of the hypergraph. The hypergraph, which is constructed based on the correlation information of the superpixels, can be equivalently formulated as a linear discriminant function. A richer feature vector involving higher-order relations among visual cues of the superpixels can be utilized. For fast inference, an LP relaxation is used, and for tractable S-SVM training of the parameters with unbalanced class-labeled data, a decomposable structured-loss function is defined, which allows the efficient use of the cutting-plane algorithm to approximately solve the constrained optimization. Experimental results on various data sets show that the proposed HO-CC outperforms other state-of-the-art image segmentation algorithms.

An earlier version of this paper appeared as Kim et al. [24]. This paper provides a more detailed description of the proposed HO-CC, additional empirical results, and in-depth analysis of the performances on image segmentation tasks.

Our main contributions can be summarized as follows: (1) the hypergraph-based HO-CC approach that takes into account higher-order relationships between superpixels; (2) inference using an LP relaxation of the problem; (3) using supervised learning for discriminative clustering via a cutting plane algorithm that can handle a decomposable loss function; and (4) the demonstration of segmentation results that improve on those obtained by state-of-the-art segmentation methods.

The rest of the paper is organized as follows. Section 2 describes the PW-CC in [17], and Section 3 presents the proposed HO-CC. Section 4 describes structural learning for supervised image segmentation based on the S-SVM and cutting plane algorithm. A number of experimental and comparative results are presented and discussed in Section 5, followed by a conclusion in Section 6.

2 PAIRWISE CORRELATION CLUSTERING

As alluded to earlier, the CC is basically an algorithm to partition a pairwise graph into disjoint groups of coherent nodes [13], and it has been used in natural language processing and document clustering [15], [25], [26]. This section presents the PW-CC that has been developed to solve an image segmentation task by partitioning a pairwise superpixel graph [17].

2.1 Superpixels

The proposed image segmentation is based on superpixels, which are small coherent regions preserving almost all boundaries between different regions. This is an advantage since superpixels significantly reduce computational cost and allow feature extraction to be conducted from a larger coherent region. Both the pairwise and higher-order CC merge superpixels into disjoint coherent regions over a superpixel graph. Therefore, the proposed CC is not a replacement for existing superpixel algorithms, and its performance may be influenced by the baseline superpixels.

2.2 Pairwise Correlation Clustering over a Pairwise Superpixel Graph

Define a pairwise undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ where a node corresponds to a superpixel and a link between adjacent superpixels corresponds to an edge (see Fig. 1a). A binary label $y_{jk}$ for an edge $(j,k) \in \mathcal{E}$ between nodes $j$ and $k$ is defined such that

$$y_{jk} = \begin{cases} 1, & \text{if } j \text{ and } k \text{ belong to the same region}, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$

Fig. 1. Illustrations of a part of (a) the pairwise graph and (b) the triplet graph built on superpixels.


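A discriminant function is defined over image $x$ and label $\mathbf{y}$ of all edges as

\begin{align}
F(x, \mathbf{y}; \mathbf{w}) &= \sum_{(j,k) \in \mathcal{E}} \mathrm{Sim}_{\mathbf{w}}(x, j, k)\, y_{jk} \tag{2} \\
&= \sum_{(j,k) \in \mathcal{E}} \langle \mathbf{w}, \phi_{jk}(x) \rangle\, y_{jk} \tag{3} \\
&= \Big\langle \mathbf{w}, \sum_{(j,k) \in \mathcal{E}} \phi_{jk}(x)\, y_{jk} \Big\rangle \tag{4} \\
&= \langle \mathbf{w}, \Phi(x, \mathbf{y}) \rangle, \tag{5}
\end{align}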

where the similarity measure between nodes $j$ and $k$, $\mathrm{Sim}_{\mathbf{w}}(x, j, k)$, is parameterized by $\mathbf{w}$ and takes values of both signs such that a large positive value indicates strong similarity while a large negative value indicates strong dissimilarity. Note that the discriminant function $F(x, \mathbf{y}; \mathbf{w})$ is assumed to be linear in both the parameter vector $\mathbf{w}$ and the joint feature map $\Phi(x, \mathbf{y})$, and $\phi_{jk}(x)$ is a pairwise feature vector which reflects the correspondence between the $j$th and the $k$th superpixels. Image segmentation amounts to inferring the edge label $\mathbf{y}$ over the pairwise superpixel graph $\mathcal{G}$ by maximizing $F$ such that

$$\hat{\mathbf{y}} = \operatorname*{argmax}_{\mathbf{y} \in \mathcal{Y}(\mathcal{G})} F(x, \mathbf{y}; \mathbf{w}), \tag{6}$$

where $\mathcal{Y}(\mathcal{G})$ is a subset of $\{0, 1\}^{\mathcal{E}}$ that corresponds to a valid segmentation and is the set of multicuts [27] of the graph $\mathcal{G}$. However, solving (6) over $\mathcal{Y}(\mathcal{G})$ is generally NP-hard.
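The linearity stated in (2) to (5) can be checked numerically. The following is a minimal sketch on a three-node toy graph; the feature values, the weight vector, and the helper names `dot`, `joint_feature_map`, and `discriminant` are illustrative assumptions, not part of the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def dot(w: List[float], phi: List[float]) -> float:
    """Inner product <w, phi>."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def joint_feature_map(features: Dict[Edge, List[float]],
                      labels: Dict[Edge, int]) -> List[float]:
    """Phi(x, y): sum of phi_jk(x) * y_jk over all edges, as in Eq. (4)."""
    dim = len(next(iter(features.values())))
    total = [0.0] * dim
    for edge, phi in features.items():
        for d in range(dim):
            total[d] += phi[d] * labels[edge]
    return total


def discriminant(w: List[float],
                 features: Dict[Edge, List[float]],
                 labels: Dict[Edge, int]) -> float:
    """F(x, y; w) = <w, Phi(x, y)>, as in Eq. (5)."""
    return dot(w, joint_feature_map(features, labels))


# Hypothetical toy data: three superpixels, two-dimensional pairwise features.
features = {(0, 1): [1.0, 0.5], (1, 2): [-2.0, 1.0], (0, 2): [0.5, -1.0]}
w = [1.0, 2.0]
labels = {(0, 1): 1, (1, 2): 0, (0, 2): 0}  # nodes 0 and 1 merged, node 2 apart

score = discriminant(w, features, labels)
# Edge-wise form of Eq. (3); it must agree with the joint-feature form.
edge_sum = sum(dot(w, features[e]) * labels[e] for e in features)
```

Both forms give the same score because the label of each edge only scales its feature vector, which is what makes the model usable with a structured SVM.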

2.3 LP Relaxation for Pairwise Correlation Clustering

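We approximate $\mathcal{Y}(\mathcal{G})$ by means of a common multicut LP relaxation [27], [28] with the following two constraints: (1) cycle inequality and (2) odd-wheel inequality. The LP relaxation to approximately solve (6) can be formulated as

$$\operatorname*{argmax}_{\mathbf{y}} \sum_{(j,k) \in \mathcal{E}} \langle \mathbf{w}, \phi_{jk}(x) \rangle\, y_{jk} \quad \text{s.t.} \quad \mathbf{y} \in \mathcal{Z}(\mathcal{G}), \tag{7}$$

where $\mathcal{Z}(\mathcal{G}) \supset \mathcal{Y}(\mathcal{G})$ is a relaxed polytope defined by the following two linear inequalities.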

1. Cycle inequality: Let $\mathrm{Path}(j,k)$ be the set of paths between nodes $j$ and $k$. The cycle inequality is a generalization of the triangle inequality [27] and is defined as

$$(1 - y_{jk}) \le \sum_{(s,t) \in p} (1 - y_{st}), \quad p \in \mathrm{Path}(j,k). \tag{8}$$

2. Odd-wheel inequality: Let a $q$-wheel be a connected subgraph $S = (\mathcal{V}_s, \mathcal{E}_s)$ with a central vertex $j \in \mathcal{V}_s$ and a cycle of $q$ vertices in $C = \mathcal{V}_s \setminus \{j\}$. For every odd $q\,(\ge 3)$-wheel, a valid segmentation $\mathbf{y}$ satisfies

$$\sum_{(s,t) \in \mathcal{E}(C)} (1 - y_{st}) - \sum_{k \in C} (1 - y_{jk}) \le \left\lfloor \tfrac{1}{2} q \right\rfloor, \tag{9}$$

where $\mathcal{E}(C)$ denotes the set of all edges in the outer cycle $C$.

Although the number of inequalities (8) and (9) is exponentially large in the size of the graph, it is nevertheless possible to optimize (7) in polynomial time. The identification of a violated inequality (the so-called separation problem) from both sets (8) and (9) is possible in polynomial time [29], [30]. A famous result in combinatorial optimization states the equivalence between optimization and separation [31]. Thus, the polynomial-time solvability of (7) is guaranteed.
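For the cycle inequalities (8), one common way to solve the separation problem is a shortest-path search: give every edge the cost $1 - y_{st}$ and check, for each edge $(j,k)$, whether the cheapest path from $j$ to $k$ costs less than $1 - y_{jk}$. The sketch below assumes this shortest-path formulation (the paper only cites [29], [30] for separation), does not cover the odd-wheel inequalities (9), and uses hypothetical function names.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def shortest_path(adj: Dict[int, List[Tuple[int, float]]],
                  source: int, target: int) -> Tuple[float, List[int]]:
    """Dijkstra's algorithm; returns (path cost, list of nodes on the path)."""
    dist = {source: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adj[u]:
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


def violated_cycle_inequalities(y: Dict[Edge, float],
                                eps: float = 1e-9) -> List[Tuple[Edge, List[int]]]:
    """Return (edge, path) pairs for which inequality (8) is violated."""
    adj: Dict[int, List[Tuple[int, float]]] = {}
    for (s, t), value in y.items():
        cost = 1.0 - value  # an edge labeled "same region" costs nothing
        adj.setdefault(s, []).append((t, cost))
        adj.setdefault(t, []).append((s, cost))
    violated = []
    for (j, k), value in y.items():
        length, path = shortest_path(adj, j, k)
        if length < (1.0 - value) - eps:
            violated.append(((j, k), path))
    return violated


# Toy triangle: 0-1 and 1-2 are merged but 0-2 is cut, which is inconsistent.
inconsistent = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 0.0}
violations = violated_cycle_inequalities(inconsistent)

# A consistent labeling (only nodes 0 and 1 merged) violates nothing.
consistent = {(0, 1): 1.0, (1, 2): 0.0, (0, 2): 0.0}
no_violations = violated_cycle_inequalities(consistent)
```

Because all edge costs are non-negative for labels in $[0, 1]$, Dijkstra's algorithm applies and each check runs in polynomial time.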

The relation between the solutions of (6) and (7) is as follows: if the LP solution to (7) is integral, that is, $y_{jk} \in \{0, 1\}$ for all $(j,k) \in \mathcal{E}$, then the solution $\mathbf{y}$ is the exact solution to (6). If instead it is fractional, we take the floor of the fractionally-predicted label of each edge independently, which yields a feasible but potentially suboptimal solution to (6).
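The rounding step can be sketched as follows. The small tolerance `tol`, the union-find grouping, and the function names are assumptions added for illustration; the paper only states that each fractional label is floored independently.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def round_down(y: Dict[Edge, float], tol: float = 1e-9) -> Dict[Edge, int]:
    """Floor every edge label; `tol` absorbs solver noise just below 1."""
    return {edge: int(math.floor(value + tol)) for edge, value in y.items()}


def segments_from_labels(num_nodes: int, labels: Dict[Edge, int]) -> List[int]:
    """Group nodes connected by edges labeled 1 (union-find)."""
    parent = list(range(num_nodes))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (j, k), value in labels.items():
        if value == 1:
            parent[find(j)] = find(k)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(n), len(ids)) for n in range(num_nodes)]


# Toy chain of four superpixels with one fractional edge label.
fractional = {(0, 1): 1.0, (1, 2): 0.6, (2, 3): 1.0}
rounded = round_down(fractional)
segments = segments_from_labels(4, rounded)
```

Flooring keeps an edge merged only when its relaxed label is 1, so any edge that the cycle inequalities force to be cut in the relaxed solution stays cut after rounding.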

2.4 The Need for Higher-Order Models

Even though the PW-CC described above can use a rich pairwise feature vector with an optimized parameter vector (which will be presented later), it often produces incorrectly predicted segments due to segment boundary ambiguities caused by limited pairwise relations of neighboring superpixels (see Fig. 2). Therefore, to incorporate higher-order relations of distant superpixels, we develop a HO-CC by generalizing the CC over a hypergraph.

3 HIGHER-ORDER CORRELATION CLUSTERING

This section describes the proposed HO-CC for image segmentation in three steps. First, we define the hypergraph representation. Second, we generalize the LP relaxation (7) for hypergraphs. Finally, we present a feature vector, consisting of pairwise and higher-order feature vectors, that characterizes the relationships among superpixels over a hypergraph.

Fig. 2. Example of segmentation result by PW-CC. (a) Original image.(b) Ground-truth. (c) Superpixels. (d) Segments obtained by PW-CC.

KIM ET AL.: IMAGE SEGMENTATION USING HIGHER-ORDER CORRELATION CLUSTERING 1763


3.1 Hypergraph

The proposed HO-CC is defined over a hypergraph in which an edge, referred to as a hyperedge, can connect two or more nodes. For example, as shown in Fig. 1b, one can introduce binary labels for each set of adjacent vertices forming a triplet such that $y_{ijk} = 1$ if all vertices in $\{i, j, k\}$ are in the same cluster; otherwise, $y_{ijk} = 0$. Define a hypergraph $\mathcal{HG} = (\mathcal{V}, \mathcal{E})$ where $\mathcal{V}$ is the set of all nodes (superpixels) and $\mathcal{E}$ is the set of all hyperedges (subsets of $\mathcal{V}$) such that $\bigcup_{e \in \mathcal{E}} e = \mathcal{V}$. Here, a hyperedge $e$ has at least two nodes, i.e., $|e| \ge 2$. Therefore, the hyperedge set $\mathcal{E}$ can be divided into two disjoint subsets: pairwise edge set $\mathcal{E}_p = \{e \in \mathcal{E} \mid |e| = 2\}$ and higher-order edge set $\mathcal{E}_h = \{e \in \mathcal{E} \mid |e| > 2\}$ such that $\mathcal{E}_p \cup \mathcal{E}_h = \mathcal{E}$. Note that in the proposed hypergraph for HO-CC all hyperedges containing just two nodes ($\forall e_p \in \mathcal{E}_p$) are linked between adjacent superpixels. The pairwise superpixel graph is a special hypergraph where all hyperedges contain just two (neighboring) superpixels: $\mathcal{E}_p = \mathcal{E}$. A binary label $y_e$ for a hyperedge $e \in \mathcal{E}$ is defined such that

$$y_e = \begin{cases} 1, & \text{if all nodes in } e \text{ belong to the same region}, \\ 0, & \text{otherwise}. \end{cases} \tag{10}$$
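The label definition (10) and the split into $\mathcal{E}_p$ and $\mathcal{E}_h$ can be illustrated with a small sketch. Representing hyperedges as `frozenset` objects and regions as a node-to-name dictionary is an assumption made here for illustration; the region names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Hyperedge = FrozenSet[int]


def hyperedge_label(hyperedge: Hyperedge, region_of: Dict[int, str]) -> int:
    """Eq. (10): 1 if every node in the hyperedge lies in one region, else 0."""
    regions = {region_of[node] for node in hyperedge}
    return 1 if len(regions) == 1 else 0


def split_hyperedges(hyperedges: List[Hyperedge]) -> Tuple[List[Hyperedge], List[Hyperedge]]:
    """Split E into the pairwise set E_p (|e| = 2) and higher-order set E_h (|e| > 2)."""
    pairwise = [e for e in hyperedges if len(e) == 2]
    higher_order = [e for e in hyperedges if len(e) > 2]
    return pairwise, higher_order


# Toy ground truth: superpixels 0 and 1 are sky, superpixel 2 is tree.
region_of = {0: "sky", 1: "sky", 2: "tree"}
hyperedges = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 1, 2})]

labels = [hyperedge_label(e, region_of) for e in hyperedges]
pairwise, higher_order = split_hyperedges(hyperedges)
```

The triplet gets label 0 even though two of its three nodes agree, which matches the all-or-nothing definition in (10).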

3.2 Higher-Order Correlation Clustering over a Hypergraph

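Similar to the PW-CC, a linear discriminant function is defined over image $x$ and label $\mathbf{y}$ of all hyperedges as

\begin{align}
F(x, \mathbf{y}; \mathbf{w}) &= \sum_{e \in \mathcal{E}} \mathrm{Hom}_{\mathbf{w}}(x, e)\, y_e \tag{11} \\
&= \sum_{e \in \mathcal{E}} \langle \mathbf{w}, \phi_e(x) \rangle\, y_e \tag{12} \\
&= \sum_{e_p \in \mathcal{E}_p} \langle \mathbf{w}_p, \phi_{e_p}(x) \rangle\, y_{e_p} + \sum_{e_h \in \mathcal{E}_h} \langle \mathbf{w}_h, \phi_{e_h}(x) \rangle\, y_{e_h} \tag{13} \\
&= \langle \mathbf{w}, \Phi(x, \mathbf{y}) \rangle, \tag{14}
\end{align}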

where the homogeneity measure among nodes in $e$, $\mathrm{Hom}_{\mathbf{w}}(x, e)$, is also the inner product of the parameter vector $\mathbf{w}$ and the feature vector $\phi_e(x)$ and takes values of both signs such that a large positive value indicates strong homogeneity while a large negative value indicates a high degree of non-homogeneity. Note that the proposed discriminant function for the HO-CC is decomposed into two terms by assigning different parameter vectors to the pairwise edge set $\mathcal{E}_p$ and the higher-order edge set $\mathcal{E}_h$ such that $\mathbf{w} = [\mathbf{w}_p, \mathbf{w}_h]$. Thus, in addition to the pairwise similarity between neighboring superpixels, the proposed HO-CC considers a broad homogeneous region reflecting higher-order relations among superpixels.

From a given image, a hypergraph is constructed as follows. First, unsupervised multiple partitionings are obtained by merging superpixels (rather than pixels) with different image quantizations using the ultrametric contour maps [32]. Then, the obtained regions are used to define hyperedges of the hypergraph. For example, in Fig. 3, there are three region layers: one superpixel (pairwise) layer and two higher-order layers. All edges (black line) in the pairwise superpixel graph from the first layer are incorporated into the pairwise edge set $\mathcal{E}_p$. Hyperedges (yellow line) corresponding to regions (groups of superpixels) in the second and third layers are included in the higher-order edge set $\mathcal{E}_h$. Note that we can further decompose the higher-order term in (13) into two terms associated with the second and third layers, respectively, by assigning different parameter vectors; however, for simplicity, this paper aggregates all higher-order edges from all higher-order layers into a single higher-order edge set and assigns them the same parameter vector.
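The construction can be sketched as below, assuming each higher-order layer is given as a list that maps every superpixel index to a region id. Skipping regions with only two superpixels and removing regions repeated across layers are assumptions of this sketch, as is the function name `build_hypergraph`; computing the partitionings themselves (for example with ultrametric contour maps) is outside its scope.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def build_hypergraph(adjacent_pairs: List[Tuple[int, int]],
                     layers: List[List[int]]
                     ) -> Tuple[List[Tuple[int, int]], List[FrozenSet[int]]]:
    """Return (E_p, E_h) from superpixel adjacency and coarser partitionings."""
    # Pairwise edges: one per pair of adjacent superpixels (first layer).
    pairwise = sorted({(min(a, b), max(a, b)) for a, b in adjacent_pairs})

    higher_order: Set[FrozenSet[int]] = set()
    for layer in layers:
        regions: Dict[int, Set[int]] = {}
        for superpixel, region in enumerate(layer):
            regions.setdefault(region, set()).add(superpixel)
        for members in regions.values():
            if len(members) > 2:  # assumption: higher-order means |e| > 2
                higher_order.add(frozenset(members))

    ordered = sorted(higher_order, key=lambda e: (len(e), sorted(e)))
    return pairwise, ordered


# Toy example: five superpixels in a chain and two coarser layers.
adjacent_pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]
layers = [
    [0, 0, 0, 1, 1],  # second layer: regions {0, 1, 2} and {3, 4}
    [0, 0, 0, 0, 0],  # third layer: one region covering everything
]
pairwise_edges, higher_order_edges = build_hypergraph(adjacent_pairs, layers)
```

In this toy case the two-superpixel region {3, 4} contributes no higher-order edge, because its only relation is already covered by the pairwise edge (3, 4).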

The use of unsupervised multiple partitionings makes it possible to obtain reasonable candidate regions for defining higher-order edges. Other methods to define higher-order edges are also possible. For instance, from the baseline pairwise superpixel graph, the fully connected subgraphs referred to as cliques which have more than two nodes can be obtained, and these cliques can be associated with the higher-order edges. However, with cliques it is empirically hard in the proposed framework to obtain broad regions that consist of more than four fully connected superpixels.

Fig. 3. Hypergraph construction from multiple partitionings. (a) Multiple partitionings from baseline superpixels. (b) Hyperedge (yellow line) corresponding to a region in the second layer. (c) Hyperedge (yellow line) corresponding to a region in the third layer.

3.3 LP Relaxation for Higher-Order Correlation Clustering

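Image segmentation amounts to inferring the hyperedge label $\mathbf{y}$ over the hypergraph $\mathcal{HG}$ by maximizing the discriminant function $F$ such that

$$\hat{\mathbf{y}} = \operatorname*{argmax}_{\mathbf{y} \in \mathcal{Y}(\mathcal{HG})} F(x, \mathbf{y}; \mathbf{w}), \tag{15}$$

where $\mathcal{Y}(\mathcal{HG})$ is also the subset of $\{0, 1\}^{\mathcal{E}}$ that corresponds to a valid segmentation.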

In order to define the state of the higher-order edge variables in relation to the pairwise edge variables, we introduce two types of inequalities: the first enforces that when a pairwise edge places (or labels) adjacent superpixels belonging to a certain higher-order edge as being in different clusters, the higher-order edge cannot place the two in the same cluster; and the second enforces that when all pairwise edges of a set of superpixels agree that all superpixels in the set are in the same cluster, then the higher-order edge of the set must place all the superpixels as belonging to one cluster (see Table 1). We define novel constraints for labels on pairwise and higher-order edges, referred to as higher-order inequalities, to formalize this intuition as follows:

\begin{align}
y_{e_h} &\le y_{e_p}, \quad \forall e_p \in \mathcal{E}_p \mid e_p \subset e_h, \notag \\
(1 - y_{e_h}) &\le \sum_{e_p \in \mathcal{E}_p \mid e_p \subset e_h} (1 - y_{e_p}). \tag{16}
\end{align}
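To see what the higher-order inequalities (16) allow, the sketch below enumerates all binary labelings of a single triplet whose three pairs are all adjacent, keeping those that satisfy both the triangle case of the cycle inequality (8) and (16). The fully connected triplet and the function names are assumptions for illustration; this is not a reproduction of Table 1.

```python
from itertools import product
from typing import List, Sequence, Tuple


def satisfies_higher_order(y_h: int, y_pairs: Sequence[int]) -> bool:
    """Both inequalities of (16) for one higher-order edge."""
    upper = all(y_h <= y_p for y_p in y_pairs)            # a cut pair forces y_h = 0
    lower = (1 - y_h) <= sum(1 - y_p for y_p in y_pairs)  # all pairs merged forces y_h = 1
    return upper and lower


def satisfies_triangle(y_ij: int, y_jk: int, y_ik: int) -> bool:
    """Cycle inequality (8) on a triangle: no edge may be cut alone."""
    return ((1 - y_ij) <= (1 - y_jk) + (1 - y_ik)
            and (1 - y_jk) <= (1 - y_ij) + (1 - y_ik)
            and (1 - y_ik) <= (1 - y_ij) + (1 - y_jk))


# Each labeling is (y_ij, y_jk, y_ik, y_ijk).
valid: List[Tuple[int, int, int, int]] = [
    (a, b, c, h)
    for a, b, c, h in product((0, 1), repeat=4)
    if satisfies_triangle(a, b, c) and satisfies_higher_order(h, (a, b, c))
]
```

Of the 16 binary labelings, five remain: the all-cut labeling, the three labelings that merge exactly one pair, and the all-merged labeling with $y_{ijk} = 1$. In every case the higher-order label is fully determined by the pairwise labels.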

Proposition. The set of all binary solutions satisfying the inequalities (8), (9), and (16), which forms the HO-CC problem, represents exactly the set of consistent cluster assignments.

Proof. All solutions satisfying the pairwise inequalities (8) and (9) lead to consistent pairwise edge label assignments. Inclusion of (16) does not make any pairwise solution inconsistent. Also, by formalizing the above intuitive reasoning as (16), for binary variables, all higher-order edge assignments are consistent with all pairwise edge assignments. □

The LP relaxation to approximately solve (15) is formulated as

$$\operatorname*{argmax}_{\mathbf{y}} \sum_{e_p \in \mathcal{E}_p} \langle \mathbf{w}_p, \phi_{e_p}(x) \rangle\, y_{e_p} + \sum_{e_h \in \mathcal{E}_h} \langle \mathbf{w}_h, \phi_{e_h}(x) \rangle\, y_{e_h} \quad \text{s.t.} \quad \mathbf{y} \in \mathcal{Z}(\mathcal{HG}), \tag{17}$$

where $\mathcal{Z}(\mathcal{HG}) \supset \mathcal{Y}(\mathcal{HG})$ is the relaxed polytope defined by the cycle inequality of (8), odd-wheel inequality of (9), and higher-order inequality of (16).

Due to the exponentially large number of constraints, we use the cutting plane algorithm [33], which is summarized in Algorithm 1, to solve (17) efficiently. The algorithm works with a small set of constraints that defines a loose relaxation $S$ to the feasible set. It iteratively tightens $S$ by means of violated inequalities. In each iteration, the optimal $\mathbf{y}$ on the current set of constraints is found, then violated inequalities are searched. When a violated inequality is found, it is added to the current constraint set to reduce $S$, and (17) is re-solved with the tightened relaxation (reduced $S$). Here, the search for a violated inequality runs in polynomial time.
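The loop structure described above can be sketched as follows. This is not the paper's Algorithm 1: to stay dependency-free, the master problem is solved by brute-force enumeration of binary labelings on a three-edge toy triangle rather than by an LP solver, and the separation routine only checks the triangle case of (8). The edge scores and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int]
Labels = Dict[Edge, int]
# A cut (edge, path) encodes (1 - y[edge]) <= sum(1 - y[e] for e in path).
Cut = Tuple[Edge, Tuple[Edge, ...]]


def satisfies(y: Labels, cut: Cut) -> bool:
    edge, path = cut
    return (1 - y[edge]) <= sum(1 - y[e] for e in path) + 1e-9


def cutting_plane(solve_master: Callable[[List[Cut]], Labels],
                  find_violated: Callable[[Labels], List[Cut]],
                  max_iters: int = 50) -> Tuple[Labels, List[Cut]]:
    """Solve, search for violated inequalities, add them, and repeat."""
    active: List[Cut] = []  # start from the loosest relaxation
    for _ in range(max_iters):
        y = solve_master(active)
        violated = find_violated(y)
        if not violated:
            return y, active
        active.extend(violated)  # tighten the relaxation
    raise RuntimeError("no consistent labeling found within max_iters")


# Hypothetical edge scores <w, phi_e(x)> on a triangle of superpixels.
scores = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): -0.5}
edges = list(scores)


def solve_master(active: List[Cut]) -> Labels:
    """Toy stand-in for the LP: best binary labeling under the active cuts."""
    candidates = (dict(zip(edges, bits)) for bits in product((0, 1), repeat=len(edges)))
    feasible = [y for y in candidates if all(satisfies(y, c) for c in active)]
    return max(feasible, key=lambda y: sum(scores[e] * y[e] for e in edges))


def find_violated(y: Labels) -> List[Cut]:
    """On a triangle, the only path around an edge uses the other two edges."""
    cuts = [(e, tuple(o for o in edges if o != e)) for e in edges]
    return [c for c in cuts if not satisfies(y, c)]


y_hat, cuts = cutting_plane(solve_master, find_violated)
```

On this toy problem the unconstrained optimum merges (0, 1) and (1, 2) while cutting (0, 2), which is inconsistent; one added inequality is enough to reach the consistent all-merged labeling.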

TABLE 1. Label Validity for Segmentation from the Hypergraph (Triplet Graph) in Fig. 1b


Note that the proposed HO-CC follows the concept of soft constraints: superpixels belonging to a hyperedge are not forced but encouraged to merge if a hyperedge is highly homogeneous. This is in line with recent higher-order models for high-level image understanding [1], [34], [35].

3.4 Feature Vector

We construct a 481-dimensional feature vector $\phi_e(x) = [\phi_{e_p}(x), \phi_{e_h}(x)]$ by concatenating several visual cues with different quantization levels and thresholds. The pairwise feature vector $\phi_{e_p}(x)$ reflects the correspondence between neighboring superpixels, and the higher-order feature vector $\phi_{e_h}(x)$ characterizes more complex relations among superpixels in a broader region to measure homogeneity. The magnitude of $\mathbf{w}$ determines the importance of each feature, and this importance is task-dependent. Thus, $\mathbf{w}$ is estimated by supervised training described in Section 4.

3.4.1 Pairwise Feature Vector

We extract several visual cues from a superpixel, including brightness (intensity), color, texture, and shape. Based on these visual cues, we construct a 321-dimensional pairwise feature vector $\phi_{e_p}$ by concatenating a color difference feature $\phi^{c}$, texture difference feature $\phi^{t}$, shape/location difference feature $\phi^{s}$, edge strength feature $\phi^{e}$, joint visual word posterior feature $\phi^{v}$, and bias as follows:

$$\phi_{e_p} = \left[ \phi^{c}_{e_p}, \phi^{t}_{e_p}, \phi^{s}_{e_p}, \phi^{e}_{e_p}, \phi^{v}_{e_p}, 1 \right]. \tag{18}$$

• Color difference feature $\phi^{c}_{e_p}$: The color difference feature $\phi^{c}_{e_p}$ is composed of 26 color distances between two adjacent superpixels based on RGB and HSV channels. Specifically, we calculate 18 earth mover's distances (EMDs) [36] between two color histograms extracted from each superpixel with various numbers of bins and thresholds for ground distance. In addition, six absolute differences (one for each color channel) between the means of the two superpixels and two $\chi^2$-distances between hue/saturation histograms of the two superpixels are concatenated in $\phi^{c}_{e_p}$.

• Texture difference feature $\phi^{t}_{e_p}$: The 64-dimensional texture difference feature $\phi^{t}_{e_p}$ is composed of 15 absolute differences (one for each texture response) between the means of two superpixels using 15 Leung-Malik (LM) filter banks [37], one $\chi^2$-distance, and 48 EMDs (from various numbers of bins and thresholds for ground distance) between texture histograms of the two superpixels.

• Shape/location difference feature $\phi^{s}_{e_p}$: The five-dimensional shape/location difference feature $\phi^{s}_{e_p}$ is composed of two absolute differences between the normalized (x/y) center positions of the two superpixels, the ratio of the size of the smaller superpixel to that of the larger superpixel, the percentage of boundary with respect to the smaller superpixel, and the straightness of boundary [4].

• Edge strength feature $\phi^{e}_{e_p}$: The 15-dimensional edge strength feature $\phi^{e}_{e_p}$ is a 1-of-15 coding of the quantized edge strength proposed by Arbeláez et al. [32].

• Joint visual word posterior feature $f^v_{e_p}$: The 210-dimensional joint visual word posterior feature $f^v_{e_p}$ is defined as the vector holding the joint visual word posteriors for a pair of neighboring superpixels using 20 visual words [38] as follows. First, a 52-dimensional raw feature vector $x_j$ based on color, texture, location, and shape features described in [4] is extracted from the $j$th superpixel. Then, the visual word posterior distribution $P(v_i \mid x_j)$ is computed using the Gaussian RBF kernel, where $v_i$ denotes the $i$th visual word. Let $V_{jk}(x)$ be a 20-by-20 matrix whose elements are the joint visual word posteriors between nodes $j$ and $k$, defined such that

$$ V_{jk}(x) = \begin{bmatrix} P(v_1 \mid x_j)P(v_1 \mid x_k) & \cdots & P(v_1 \mid x_j)P(v_{20} \mid x_k) \\ P(v_2 \mid x_j)P(v_1 \mid x_k) & \cdots & P(v_2 \mid x_j)P(v_{20} \mid x_k) \\ \vdots & \ddots & \vdots \\ P(v_{20} \mid x_j)P(v_1 \mid x_k) & \cdots & P(v_{20} \mid x_j)P(v_{20} \mid x_k) \end{bmatrix}. \tag{19} $$

The joint visual word posterior feature between nodes $j$ and $k$, $f^v_{jk}(x)$, is defined as

$$ f^v_{jk}(x) = \mathrm{vec}\left(V_{jk}(x)\right) + \mathrm{vec}\left(V^T_{jk}(x)\right), \tag{20} $$

where $\mathrm{vec}(V)$ is the 210 ($= 20 \times 21/2$)-dimensional vector whose elements are taken from the upper triangular part of $V$. This joint visual word posterior feature could overcome the weakness of class-agnostic features and incorporate contextual information.
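Equations (19) and (20) can be illustrated with a minimal Python sketch. The function names, the normalization of the RBF responses into a posterior, and the bandwidth `gamma` are assumptions made for illustration; the text only states that a Gaussian RBF kernel is used.

```python
import math

def visual_word_posterior(x, centers, gamma=1.0):
    """Hypothetical posterior P(v_i | x): normalized Gaussian RBF responses.
    The normalization and gamma are assumptions, not taken from the paper."""
    responses = [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, c)))
                 for c in centers]
    total = sum(responses)
    return [r / total for r in responses]

def joint_visual_word_feature(p_j, p_k):
    """p_j[i] = P(v_i | x_j), p_k[i] = P(v_i | x_k), both of length K."""
    K = len(p_j)
    # V[a][b] = P(v_a | x_j) * P(v_b | x_k), as in Eq. (19)
    V = [[p_j[a] * p_k[b] for b in range(K)] for a in range(K)]
    # vec(V) + vec(V^T), reading the upper triangle (diagonal included) row by row
    return [V[a][b] + V[b][a] for a in range(K) for b in range(a, K)]
```

With K = 20 visual words the result has 20 x 21 / 2 = 210 entries, and swapping the two superpixels leaves the feature unchanged, which is what the symmetrization in (20) is for.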

3.4.2 Higher-Order Feature Vector

We construct a 160-dimensional higher-order feature vector $f_{e_h}$ by concatenating the variance feature $f^{va}_{e_h}$, edge strength feature $f^e_{e_h}$, template matching feature $f^{tm}_{e_h}$, and bias as follows:

$$ f_{e_h} = \left[ f^{va}_{e_h};\ f^e_{e_h};\ f^{tm}_{e_h};\ 1 \right]. \tag{21} $$

• Variance feature $f^{va}_{e_h}$: The 44-dimensional variance feature is a generalized version of the color/texture difference feature used in the pairwise graph. We calculate 14 color variances among superpixels in a hyperedge based on the average RGB and HSV values and the hue/saturation histograms with eight bins. In addition, 30 texture variances from 15 mean texture responses and a texture response histogram with 15 bins are incorporated into the variance feature vector.

• Edge strength feature $f^e_{e_h}$: The 15-dimensional edge strength feature $f^e_{e_h}$ is an $\ell_1$-normalized histogram of the quantized edge strengths of neighboring superpixels in $e_h$.

• Template matching feature $f^{tm}_{e_h}$: The 44-dimensional color/texture features and five-dimensional shape/location features of all (task-specific ground-truth) regions in the training images are clustered using k-means with $k = 100$ to obtain 100 representative templates of distinct regions. The 100-dimensional template matching feature vector is composed of the matching scores between a region defined by a hyperedge and these templates using the Gaussian RBF kernel.

Note that in each feature vector, the bias (= 1) is augmented in order to obtain a proper similarity/homogeneity measure which can be either positive or negative.
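As a rough illustration of how the 160-dimensional vector in (21) could be assembled, the following Python sketch concatenates a precomputed 44-dimensional variance feature, the $\ell_1$-normalized edge strength histogram, the RBF template matching scores, and the bias. All function names are hypothetical, the RBF bandwidth `gamma` is an assumption, quantized edge strengths are assumed to be integers in 0..14, and the k-means templates are assumed to be given.

```python
import math

def l1_normalized_histogram(quantized_strengths, n_bins=15):
    """Histogram of quantized edge strengths (integers in 0..n_bins-1), scaled to sum to 1."""
    hist = [0.0] * n_bins
    for q in quantized_strengths:
        hist[q] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def template_matching_scores(region_descriptor, templates, gamma=1.0):
    """Gaussian RBF similarity between a region descriptor and each template."""
    scores = []
    for tmpl in templates:
        sq_dist = sum((a - b) ** 2 for a, b in zip(region_descriptor, tmpl))
        scores.append(math.exp(-gamma * sq_dist))
    return scores

def higher_order_feature(variance_feature, quantized_strengths,
                         region_descriptor, templates):
    """Concatenate [variance; edge strength; template matching; bias]."""
    return (list(variance_feature)
            + l1_normalized_histogram(quantized_strengths)
            + template_matching_scores(region_descriptor, templates)
            + [1.0])
```

With a 44-dimensional variance feature, 15 histogram bins, and 100 templates, the output length is 44 + 15 + 100 + 1 = 160.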

4 STRUCTURAL LEARNING

The proposed discriminant function is defined over the superpixel graph, and therefore, the ground-truth segmentation needs to be transformed to the ground-truth edge labels in the superpixel graph. For this, we first assign a single dominant segment label to each superpixel by majority voting over the superpixel’s constituent pixels and then obtain the ground-truth edge labels according to whether the dominant labels of the superpixels in a hyperedge are equal or not.
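The conversion from a pixel-wise ground truth to edge labels can be sketched as follows. This is a minimal Python illustration, assuming pixels are given as two parallel flat lists (superpixel id and ground-truth segment id per pixel), that hyperedges are tuples of superpixel ids, and that label 1 means "all superpixels share the same dominant label"; the function names are hypothetical.

```python
from collections import Counter

def dominant_labels(superpixel_of_pixel, segment_of_pixel):
    """Majority vote of ground-truth segment labels within each superpixel."""
    votes = {}
    for sp, seg in zip(superpixel_of_pixel, segment_of_pixel):
        votes.setdefault(sp, Counter())[seg] += 1
    return {sp: counts.most_common(1)[0][0] for sp, counts in votes.items()}

def edge_labels(hyperedges, dominant):
    """1 if every superpixel in the edge has the same dominant label, else 0."""
    return [1 if len({dominant[sp] for sp in edge}) == 1 else 0
            for edge in hyperedges]
```

Pairwise edges are simply hyperedges with two members, so the same function covers both edge types.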

Using these ground-truth edge labels of the training data, we use the S-SVM to estimate the parameter vector for task-specific CC. We use the cutting plane algorithm with LP relaxation (17) for loss-augmented inference to solve the optimization problem of the S-SVM, since the fast convergence and high robustness of the cutting plane algorithm in handling a large number of margin constraints are well known [14].

4.1 Structured Support Vector Machine

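Given $N$ training samples $\{(x_n, y_n)\}_{n=1}^{N}$, where $y_n$ denotes the ground-truth edge labels for the $n$th training image $x_n$, the S-SVM [14] optimizes $w$ by minimizing a quadratic objective function subject to a set of linear margin constraints:

$$
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\|w\|^2 + C\sum_{n=1}^{N}\xi_n \\
\text{s.t.}\quad & \forall n,\ y \in \mathcal{Z}(\mathcal{HG}):\ \langle w, \Delta\Phi(x_n, y)\rangle \ge \Delta(y_n, y) - \xi_n, \\
& \forall n:\ \xi_n \ge 0,
\end{aligned}
\tag{22}
$$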

where $\Delta\Phi(x_n, y) = \Phi(x_n, y_n) - \Phi(x_n, y)$, and $C > 0$ is a constant that controls the tradeoff between margin maximization and training error minimization. In the S-SVM, the margin is scaled with a loss $\Delta(y_n, y)$, which is the difference measure between the prediction $y$ and the ground-truth label $y_n$ of the $n$th image. The S-SVM offers good generalization ability as well as the flexibility to choose any loss function [14].

4.2 Cutting Plane Algorithm

The exponentially large number of margin constraints and the intractability of the loss-augmented inference problem make it difficult to solve the constrained optimization problem of (22). Therefore, we apply the cutting plane algorithm [14] to approximately solve the constrained optimization problem. The cutting plane algorithm is summarized in Algorithm 2. In each iteration, the most violated constraint for each training sample is approximately found by performing the loss-augmented inference using the LP relaxation. The computational cost for inference can be greatly reduced when a decomposable loss such as the Hamming loss is used. When a loss function can be decomposed in the same manner as the joint feature map, it can be added to each edge score in the inference. It is then checked whether the constraint found tightens the feasible set of (22); when it does, the parameter vector $w$ and the slack variables $\xi$ are updated by solving the restricted problem of (22) on the current set of active constraints, which now includes it. The theoretical convergence and robustness of the cutting plane algorithm were studied by Tsochantaridis et al. [14]. The LP relaxations for loss-augmented inference are considered to be well suited to structured learning [39], [40], [41].
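The "check, then add" step of the cutting plane loop can be sketched in Python as below. This is only the working-set update under stated assumptions, not Algorithm 2 itself: the loss-augmented inference and the restricted QP solver are omitted, the tolerance `eps` follows the usual convention in [14] but its value here is an assumption, and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slack_needed(w, delta_phi, loss):
    """Smallest xi >= 0 satisfying <w, delta_phi> >= loss - xi."""
    return max(0.0, loss - dot(w, delta_phi))

def update_working_set(working_set, w, xi_n, delta_phi, loss, eps=1e-3):
    """Add the constraint found by loss-augmented inference if it is violated
    by more than eps under the current (w, xi_n). Returns True when added,
    in which case the restricted problem of (22) would be re-solved."""
    if slack_needed(w, delta_phi, loss) > xi_n + eps:
        working_set.append((delta_phi, loss))
        return True
    return False
```

Here `delta_phi` stands for $\Delta\Phi(x_n, \hat{y})$ and `loss` for $\Delta(y_n, \hat{y})$, where $\hat{y}$ is the (possibly fractional) output of the LP-relaxed loss-augmented inference. Training stops when no sample contributes a new constraint.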

4.3 Label Loss

A non-negative and decomposable loss function $\Delta : \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}^{+}$ enables efficient loss-augmented inference in the cutting plane algorithm. The loss can be absorbed into the edge homogeneity, and the loss-augmented inference can be performed by the same LP relaxation that is used in the original inference.

The most popular loss function that is non-negative and decomposable is the Hamming loss, which is equivalent to the number of mismatches between $y_n$ and $y$ at the edge level in this CC. In the proposed CC for image segmentation, however, the number of edges which are labeled as 1 is considerably higher than that of edges which are labeled as 0. This imbalance leads to the clustering of the whole image as one segment when we use the Hamming loss in the S-SVM. Therefore, we use the following modified Hamming loss function:


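$$
\begin{aligned}
\Delta(y_n, y) &= \sum_{e \in \mathcal{E}} \Delta_e\left(y^n_e, y_e\right) && (23) \\
&= \sum_{e_p \in \mathcal{E}_p} \left( R_p\, y^n_{e_p} + y_{e_p} - (R_p + 1)\, y^n_{e_p} y_{e_p} \right) + D_n \sum_{e_h \in \mathcal{E}_h} \left( R_h\, y^n_{e_h} + y_{e_h} - (R_h + 1)\, y^n_{e_h} y_{e_h} \right), && (24)
\end{aligned}
$$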

where $D_n$ is the relative weight of the loss at the higher-order edge level to that of the loss at the pairwise edge level. In addition, $R_p$ and $R_h$ control the relative importance between the incorrect merging of the superpixels and the incorrect separation of the superpixels by imposing different weights on the false negative and the false positive, as shown in Table 2. Here, we set $D_n = |\mathcal{E}_p| / |\mathcal{E}_h|$, and both $R_p$ and $R_h$ are set to be less than 1 to overcome the imbalance problem mentioned above.
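The per-edge cases of (24) and the way the loss is absorbed into the edge scores can be checked with a short Python sketch. The function names are hypothetical; edge labels are assumed to be 0/1, and at least one higher-order edge is assumed so that $D_n$ is defined.

```python
def edge_loss(y_true, y_pred, R):
    """Per-edge term of Eq. (24): R*y_true + y_pred - (R + 1)*y_true*y_pred."""
    return R * y_true + y_pred - (R + 1) * y_true * y_pred

def modified_hamming_loss(yp_true, yp_pred, yh_true, yh_pred, Rp, Rh):
    """Eq. (24) with D_n = |E_p| / |E_h| (assumes at least one higher-order edge)."""
    Dn = len(yp_true) / len(yh_true)
    pairwise = sum(edge_loss(t, p, Rp) for t, p in zip(yp_true, yp_pred))
    higher = sum(edge_loss(t, p, Rh) for t, p in zip(yh_true, yh_pred))
    return pairwise + Dn * higher

def loss_coefficient(y_true, R):
    """Since edge_loss = R*y_true + (1 - (R + 1)*y_true) * y_pred is linear in
    y_pred, this coefficient can be added to the edge score during
    loss-augmented inference; the constant R*y_true does not affect the argmax."""
    return 1 - (R + 1) * y_true
```

Evaluating `edge_loss` on the four label combinations gives 0 for both correct cases, R for an incorrect separation (truth 1, prediction 0), and 1 for an incorrect merge (truth 0, prediction 1), so choosing R < 1 penalizes incorrect merging more heavily.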

5 EXPERIMENTS

To evaluate segmentations obtained by various algorithms against the ground-truth segmentation, we conducted image segmentations on three benchmark data sets: Stanford background data set (SBD) [2], Berkeley segmentation data set (BSDS) [42], and MSRC data set [43]. For image segmentation based on CC, we initially obtain baseline superpixels (438 superpixels per image on average) by the gPb contour detector and the oriented watershed transform [32] and then construct a hypergraph. The function parameters are initially set to zero, and then based on the S-SVM, the structured output learning is used to estimate the parameter vectors. Note that the relaxed solutions in loss-augmented inference are used during training, while in testing, our simple rounding method is used to produce valid segmentation results. Rounding is only necessary when the LP relaxation fails to be exact, that is, when fractional solutions from LP-relaxed CC are obtained.

We compared the proposed HO-CC to the following three unsupervised and three supervised image segmentation algorithms:

• Mean-shift: Comaniciu and Meer [7] devised a mode-seeking algorithm to locate points of locally maximal density in a feature space.

• Multiscale NCut: Cour et al. [44] devised a multiscale spectral image segmentation algorithm by decomposing an image partitioning graph into different scales in the normalized cut framework.

• gPb-owt-ucm: The oriented watershed transform and ultrametric contour map algorithm [32] produces hierarchical regions of superpixels obtained by using the gPb contour detector.

• gPb-Hoiem: Hoiem et al. [4] grouped superpixels based on pairwise same-label likelihoods. The superpixels were obtained by the gPb contour detector, and the pairwise same-label likelihoods estimated by a boosted decision tree were independently learnt from the training data, where the same 321-dimensional pairwise feature vector was used as an input to the boosted decision tree.

• Supervised NCut: A supervised learning algorithm for parameter estimation under the normalized cut framework is applied. For this, the affinity matrix on the same pairwise superpixel graph is defined as

$$ A_{jk} = \begin{cases} \min\left(1, \exp\{-\langle w, f_{jk}\rangle\}\right), & \text{if } (j, k) \in \mathcal{E}, \\ 0, & \text{otherwise}, \end{cases} $$

where the same 321-dimensional pairwise feature vector $f_{jk}$ was used. Afterwards, the standard pairwise affinity learning with the square-square loss function and the gradient descent algorithm [45] is used for supervised training.

• PW-CC: The PW-CC is described in Section 2. A pairwise superpixel graph is obtained with the same 321-dimensional pairwise feature vector.
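The affinity used for the supervised NCut baseline can be written as a small Python sketch. The function names and the dense list-of-lists layout are assumptions for illustration; the early return for non-positive scores only avoids floating-point overflow and yields the same value as the min in the definition.

```python
import math

def ncut_affinity(w, f_jk):
    """A_jk = min(1, exp(-<w, f_jk>)) for an edge (j, k)."""
    score = sum(wi * fi for wi, fi in zip(w, f_jk))
    if score <= 0:
        return 1.0  # exp(-score) >= 1, so the min is 1
    return math.exp(-score)

def affinity_matrix(n_nodes, edge_features, w):
    """edge_features maps (j, k) -> pairwise feature vector; non-edges get 0."""
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (j, k), f in edge_features.items():
        A[j][k] = A[k][j] = ncut_affinity(w, f)
    return A
```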

Note that we used the code publicly released by the authors for Mean-shift, (multiscale) NCut, gPb-owt-ucm, and gPb-Hoiem. Specifically, for the supervised image segmentation algorithms, namely gPb-Hoiem and supervised NCut, we modified each code to use the same pairwise feature vector as our method.

We consider four performance measures: probabilistic Rand index (PRI) [46], segmentation covering (SCO) [32], variation of information (VOI) [47], and boundary displacement error (BDE) [48]. When the predicted segmentation is close to the ground-truth segmentation, the PRI and SCO increase while the VOI and BDE decrease.
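Two of these measures are easy to state in code. The Python sketch below assumes each segmentation is a flat list of per-pixel region labels, takes the PRI as the mean Rand index over the available ground-truth segmentations, and uses the natural logarithm for the VOI (the log base is an assumption). SCO and BDE are not covered, and the function names are hypothetical.

```python
import math
from collections import Counter

def _comb2(c):
    return c * (c - 1) // 2

def rand_index(seg, gt):
    """Fraction of pixel pairs on which seg and gt agree (same/different region)."""
    n = len(seg)
    pairs = _comb2(n)
    both = sum(_comb2(c) for c in Counter(zip(seg, gt)).values())
    in_seg = sum(_comb2(c) for c in Counter(seg).values())
    in_gt = sum(_comb2(c) for c in Counter(gt).values())
    # agreements = pairs together in both + pairs apart in both
    return (pairs + 2 * both - in_seg - in_gt) / pairs

def probabilistic_rand_index(seg, ground_truths):
    return sum(rand_index(seg, gt) for gt in ground_truths) / len(ground_truths)

def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts)

def variation_of_information(seg, gt):
    """VOI = H(seg) + H(gt) - 2 I(seg; gt) = 2 H(seg, gt) - H(seg) - H(gt)."""
    n = len(seg)
    h_joint = _entropy(Counter(zip(seg, gt)).values(), n)
    return 2 * h_joint - _entropy(Counter(seg).values(), n) - _entropy(Counter(gt).values(), n)
```

Both measures depend only on the partition, not on the label values, so relabeling regions does not change them.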

An implementation of the HO-CC is available at http://slsp.kaist.ac.kr/xe/?mid=software.

5.1 Stanford Background Data Set

The SBD consists of 715 outdoor images with corresponding pixel-wise annotations such that each pixel is labeled with either one of seven background classes or a generic foreground class. From the given pixel-wise ground-truth annotations, we obtain a ground-truth segmentation for each image. We employed five-fold cross-validation with the data set randomly split into 572 training images and 143 test images for each fold.
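The split sizes follow from 715 / 5 = 143 test images per fold and 715 - 143 = 572 training images. A minimal Python sketch of such a split is given below, assuming disjoint test folds over a single random permutation; the seed and the function name are hypothetical, and the paper does not state how the random split was generated.

```python
import random

def five_fold_splits(n_images=715, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each fold."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    fold_size = n_images // n_folds
    for f in range(n_folds):
        test = indices[f * fold_size:(f + 1) * fold_size]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test
```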

Fig. 4 shows the four measures obtained from segmentation results according to the average number of regions. Note that the performance varies with different numbers of regions, and for this reason, we designed each algorithm to produce multiple segmentations (20 to 40 regions). Specifically, multiple segmentations in the proposed algorithm were obtained by varying $R_p$ (0.01-0.15) and $R_h$ (0.4-0.6) in the loss function during training. When $R_h$ is fixed, as $R_p$ increases, the number of segmented regions of a test image tends to decrease, since the false negative error is penalized more compared to the false positive error. The same observation is also verified when $R_p$ is fixed and $R_h$ increases. Irrespective of the measure, the proposed HO-CC performed better than other algorithms including the PW-CC.

TABLE 2. Label Loss at the Edge Level

Fig. 5 shows some examples of segmentations. The proposed HO-CC yielded the best segmentation results. Incorrectly predicted segments by the PW-CC were reduced in the segmentation results obtained by the HO-CC owing to the higher-order relations in broad regions. The gPb-Hoiem and the supervised NCut treat each edge as an independent pairwise instance during training; therefore, their segmentation results are not stable (producing inconsistent local regions) even though they use the same pairwise features.

Fig. 4. Obtained evaluation measures from segmentation results on the SBD.

Fig. 5. Results of image segmentation on the SBD.

Fig. 6. Obtained evaluation measures from segmentation results according to the different set of features on the SBD.

Regarding the runtime of our algorithm, we observed that test-time inference took on average around 15 seconds per image (graph construction and feature extraction: 14 s, LP: 1 s) on a 2.67 GHz processor, whereas the overall training took 20 hours on the training set. In terms of the LP runtime, HO-CC took about four times longer than PW-CC on average.

The performance improvement is obtained from both higher-order features and higher-order constraints. Segmentation results obtained by HO-CC without higher-order features were observed to be very similar to those obtained by PW-CC: without higher-order features, higher-order constraints did not tighten the relaxation for PW-CC. However, as shown in Fig. 6, we observed that the performance gap between the HO-CC with the full higher-order feature vector (160-dim) and the HO-CC with the simple higher-order feature vector (45-dim, variances only) was smaller than that between the HO-CC with the simple higher-order feature vector and the PW-CC.

TABLE 3. Average Ranks by Friedman Test on the SBD

Fig. 7. Examples of partitionings by multiple human subjects and single probabilistic (real-valued) ground-truth partitioning.

TABLE 4. Quantitative Results on the BSDS300 Test Set

Fig. 8. Boundary precision-recall curve on the BSDS300 test set.

Fig. 9. Obtained evaluation measures from segmentation results of gPb-owt-ucm, PW-CC, and HO-CC on the BSDS300 test set according to the average number of regions.

In order to confirm that the improvements obtained by HO-CC are statistically significant, we performed statistical hypothesis tests for each performance measure. The Friedman test [49], [50] was used to evaluate the null hypothesis that all the algorithms perform equally well. Table 3 shows the obtained average ranks. Under the null hypothesis, all average ranks should be equal. However, as shown in Table 3, the ranks are different, and the null hypothesis is rejected for all four measures. This is also verified by the obtained p-values, which are numerically equal to zero for all four measures. Furthermore, we performed a post-hoc test, the Nemenyi test [50], [51], for pairwise comparison of algorithms, testing the null hypothesis of pairwise equal performance. The Nemenyi test is based on the difference of the average performance ranks achieved by the algorithms; if the difference between two ranks exceeds a critical value, the null hypothesis is rejected. As a result, at the level $\alpha = 0.05$, with the PRI and BDE measures, the HO-CC is statistically significantly superior to all other algorithms except PW-CC; with the VOI measure, the HO-CC is statistically significantly superior to all other algorithms except Mean-shift; and with the SCO measure, the HO-CC is statistically significantly superior to all other algorithms.
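The rank-based quantities behind these tests can be sketched in Python using the standard formulas summarized in [50]: the Friedman statistic $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ for $k$ algorithms and $N$ test cases, and the Nemenyi critical difference $CD = q_\alpha \sqrt{k(k+1)/(6N)}$. The function names are hypothetical, and the critical value `q_alpha` is left as an input to be looked up for the chosen $\alpha$ and $k$.

```python
import math

def average_ranks(scores, higher_is_better=True):
    """scores[i][j]: measure of algorithm j on test case i.
    Returns the average rank per algorithm (1 = best), with ties sharing ranks."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j], reverse=higher_is_better)
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the tied rank positions
            for t in range(pos, end + 1):
                totals[order[t]] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]

def friedman_statistic(avg_ranks, n):
    k = len(avg_ranks)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)

def nemenyi_critical_difference(k, n, q_alpha):
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))
```

Two algorithms are then declared significantly different when their average ranks differ by more than the critical difference. For measures where lower is better (VOI, BDE), `higher_is_better=False` would be used.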

5.2 Berkeley Segmentation Data Set

The BSDS300 contains 300 natural images split into 200 training images and 100 test images. Since each image is segmented by multiple human subjects, we defined a single probabilistic (real-valued) ground-truth segmentation of each image for training in the proposed HO-CC (see Fig. 7). The gPb-Hoiem and the supervised NCut used a different ground truth for training on the BSDS: two superpixels are declared to lie in the same segment only if all human subjects declare them to lie in the same segment.
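The two kinds of training labels can be contrasted with a small Python sketch. The text does not spell out how the probabilistic ground truth is computed; the version below assumes it is the fraction of human subjects whose labeling places all superpixels of an edge in one segment, which is an illustrative assumption. Each subject's labeling is assumed to be a dictionary from superpixel id to dominant segment label, and the function names are hypothetical.

```python
def _agrees(edge, labeling):
    """True if the subject places every superpixel of the edge in one segment."""
    return len({labeling[sp] for sp in edge}) == 1

def probabilistic_edge_label(edge, subject_labelings):
    """Assumed definition: fraction of subjects who keep the edge in one segment."""
    agree = sum(1 for lab in subject_labelings if _agrees(edge, lab))
    return agree / len(subject_labelings)

def unanimous_edge_label(edge, subject_labelings):
    """Rule stated for gPb-Hoiem and supervised NCut: 1 only if all subjects agree."""
    return 1 if all(_agrees(edge, lab) for lab in subject_labelings) else 0
```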

TABLE 5. Quantitative Results on the BSDS500 Test Set

Fig. 10. Boundary precision-recall curve on the BSDS500 test set.

TABLE 6. Quantitative Results on the BSDS500 Test Set According to the Number of Layers

TABLE 7. Quantitative Results on the BSDS500 Test Set According to Different Superpixel-Groupings for Hypergraph Construction

Fig. 11. Results of image segmentation on the BSDS test set.


Table 4 and Fig. 8 show the results obtained at a universal fixed scale (ODS) in terms of various performance measures, including the boundary F-measure and the boundary precision-recall curve. Note that for each algorithm, the parameters which produce the best F-measure were also used for all other performance measures. For example, the level-threshold of 0.12 for gPb-owt-ucm, $R_p$ of 0.15 for the PW-CC, and $(R_p, R_h)$ of (0.01, 0.1) for the HO-CC were used for producing the segmentation results at ODS listed in Table 4, since these values gave the best results with regard to the F-measure. Irrespective of the measure, the proposed HO-CC gave the best results, which are similar to or even better than the best results reported so far on the BSDS300 [32], [52], [53].

We changed the level-threshold for the gPb-owt-ucm and $R_p$ and $R_h$ for the PW-CC and HO-CC to produce different numbers of regions per image, on average, and observed that the HO-CC always performed better than the PW-CC and the gPb-owt-ucm (see Fig. 9), as on the SBD. The improvement of 1 percent in PRI, 1.5 percent in SCO, 0.1 in VOI, and 1 pixel in BDE on the BSDS test set is comparable to the improvements reported in [32], [52] (1 percent in PRI, 2 percent in SCO, 0.05 in VOI, and 1 pixel in BDE). We observed that, in comparison to the PW-CC, the proposed HO-CC improved 78 segmentation results, left 9 results unchanged, and made the remaining 13 results worse on the BSDS test set.

We also performed experiments on the BSDS500 data set and obtained the results at ODS. As shown in Table 5 and Fig. 10, the HO-CC performed the best on the BSDS500.

We increased the number of layers from two to three by splitting the original higher-order layer into two layers according to the edge strengths obtained from the gPb-owt, and then assigned different parameter vectors to each layer. The obtained performance is shown in Table 6. The performance of the hypergraph with three layers (HO-CC-Layer3) was slightly better than that of the hypergraph with two layers (HO-CC-Layer2). The improvement is small because only a small number of hyperedges are associated with the third layer.

The performance obtained by HO-CC might be influenced by the candidate regions used for defining higher-order edges. Therefore, we also used a different superpixel-grouping method, category independent object proposals (CIOP) [54]. As shown in Table 7, the hypergraph based on the gPb-owt performed slightly better than that based on the gPb-CIOP, but the gap is not critical.

Fig. 11 shows some example segmentations on BSDS test images obtained by various segmentation algorithms. The proposed HO-CC yielded the best segmentation results.

5.3 MSRC Data Set

The MSRC data set is composed of 591 natural images. We split the data into 45 percent training, 10 percent validation, and 45 percent test sets, following [43]. We used the ground-truth object instance labeling of [55], which does not contain void regions and is more precise than the original ground truth, for both training and testing (including the performance evaluation) on the MSRC. On average, all partitioning algorithms were set to produce approximately 15 disjoint regions per image on the MSRC data set. Regarding the performance according to the number of regions, we observed the same tendency on the MSRC data set as on the BSDS data set. As shown in Table 8 and Fig. 12, the proposed HO-CC gave the best results on the test set.

TABLE 8. Quantitative Results on the MSRC Test Set

Fig. 12. Results of image segmentation on the MSRC test set.

We also trained on the MSRC data set and tested on the BSDS data set. This decreases the performance relative to training and testing on the BSDS data set. The same holds in the reverse direction, i.e., when training on the BSDS data set and testing on the MSRC data set. Overall, this suggests that the two data sets have different statistics, and that the proposed framework allows the segmentation to be tuned to the particular data set at hand.

6 CONCLUSION

This paper proposed the HO-CC over a hypergraph to merge superpixels into homogeneous regions. The LP relaxation was used to approximately solve the inference problem over a hypergraph where a rich feature vector was defined based on several visual cues involving higher-order relations among superpixels. The S-SVM was used for supervised training of parameters in CC, and the cutting plane algorithm with LP-relaxed inference was applied to solve the optimization problem of S-SVM. Experimental results showed that the proposed HO-CC outperformed other image segmentation algorithms on various data sets. The proposed framework is applicable to a variety of other tasks.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2011-0017202 and No. NRF-2010-0028680).

REFERENCES

[1] L. Ladický, C. Russell, P. Kohli, and P.H.S. Torr, “Associative Hierarchical CRFs for Object Class Image Segmentation,” Proc. IEEE Int’l Conf. Computer Vision, 2009.

[2] S. Gould, R. Fulton, and D. Koller, “Decomposing a Scene into Geometric and Semantically Consistent Regions,” Proc. IEEE Int’l Conf. Computer Vision, 2009.

[3] M.P. Kumar and D. Koller, “Efficiently Selecting Regions for Scene Understanding,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.

[4] D. Hoiem, A.A. Efros, and M. Hebert, “Recovering Surface Layout from an Image,” Int’l J. Computer Vision, vol. 75, pp. 151-172, 2007.

[5] B. Liu, S. Gould, and D. Koller, “Single Image Depth Estimation from Predicted Semantic Labels,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.

[6] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, “An Efficient K-Means Clustering Algorithm: Analysis and Implementation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, July 2002.

[7] D. Comaniciu and P. Meer, “Mean Shift: A Robust Approach Toward Feature Space Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, May 2002.

[8] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1026-1038, Aug. 2002.

[9] F. Estrada and A. Jepson, “Spectral Embedding and Mincut for Image Segmentation,” Proc. British Machine Vision Conf. (BMVC), 2004.

[10] J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.

[11] P. Felzenszwalb and D. Huttenlocher, “Efficient Graph-Based Image Segmentation,” Int’l J. Computer Vision, vol. 59, pp. 167-181, 2004.

[12] F. Estrada and A. Jepson, “Benchmarking Image Segmentation Algorithms,” Int’l J. Computer Vision, vol. 85, pp. 167-181, 2009.

[13] N. Bansal, A. Blum, and S. Chawla, “Correlation Clustering,” Machine Learning, vol. 56, pp. 89-113, 2004.

[14] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large Margin Methods for Structured and Interdependent Output Variables,” J. Machine Learning Research, vol. 6, pp. 1453-1484, 2005.

[15] T. Finley and T. Joachims, “Supervised Clustering with Support Vector Machines,” Proc. Int’l Conf. Machine Learning, 2005.

[16] B. Taskar, “Learning Structured Prediction Models: A Large Margin Approach,” PhD dissertation, Stanford Univ., 2004.

[17] S. Kim, S. Nowozin, P. Kohli, and C.D. Yoo, “Task-Specific Image Partitioning,” IEEE Trans. Image Processing, vol. 22, no. 2, pp. 488-500, Feb. 2013.

[18] C. Berge, Hypergraphs. North-Holland, 1989.

[19] L. Ding and A. Yilmaz, “Image Segmentation as Learning on Hypergraphs,” Proc. Int’l Conf. Machine Learning and Applications, 2008.

[20] S. Rital, “Hypergraph Cuts and Unsupervised Representation for Image Segmentation,” Fundamenta Informaticae, vol. 96, pp. 153-179, 2009.

[21] A. Ducournau, S. Rital, A. Bretto, and B. Laget, “A Multilevel Spectral Hypergraph Partitioning Approach for Color Image Segmentation,” Proc. IEEE Int’l Conf. Signal and Image Processing Applications, 2009.

[22] F. Bach and M.I. Jordan, “Learning Spectral Clustering,” Proc. Neural Information Processing Systems, 2003.

[23] T. Cour, N. Gogin, and J. Shi, “Learning Spectral Graph Segmentation,” Proc. Int’l Conf. Artificial Intelligence and Statistics, 2005.

[24] S. Kim, S. Nowozin, P. Kohli, and C.D. Yoo, “Higher-Order Correlation Clustering for Image Segmentation,” Proc. Neural Information Processing Systems, 2011.

[25] T. Joachims and J.E. Hopcroft, “Error Bounds for Correlation Clustering,” Proc. Int’l Conf. Machine Learning, 2005.

[26] A. McCallum and B. Wellner, “Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference,” Proc. IJCAI Workshop Information Integration on the Web, 2003.

[27] S. Chopra and M.R. Rao, “The Partition Problem,” Math. Programming, vol. 59, pp. 87-115, 1993.

[28] S. Nowozin and S. Jegelka, “Solution Stability in Linear Programming Relaxations: Graph Partitioning and Unsupervised Learning,” Proc. Int’l Conf. Machine Learning, 2009.

[29] M.M. Deza, M. Grötschel, and M. Laurent, “Clique-Web Facets for Multicut Polytopes,” Math. Operations Research, vol. 17, no. 4, pp. 981-1000, 1992.

[30] M.M. Deza and M. Laurent, “Geometry of Cuts and Metrics,” ser. Algorithms and Combinatorics, vol. 15, Springer, 1997.

[31] M. Grötschel, L. Lovász, and A. Schrijver, “The Ellipsoid Method and Its Consequences in Combinatorial Optimization,” Combinatorica, vol. 1, pp. 169-197, 1981.

[32] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik, “Contour Detection and Hierarchical Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 898-916, May 2011.

[33] L. Wolsey, Integer Programming. John Wiley & Sons, 1998.

[34] P. Kohli, L. Ladický, and P.H.S. Torr, “Robust Higher Order Potentials for Enforcing Label Consistency,” Int’l J. Computer Vision, vol. 82, pp. 302-324, 2009.

[35] L. Ding and A. Yilmaz, “Interactive Image Segmentation Using Probabilistic Hypergraphs,” Pattern Recognition, vol. 43, pp. 1863-1873, 2010.

[36] O. Pele and M. Werman, “Fast and Robust Earth Mover’s Distances,” Proc. IEEE Int’l Conf. Computer Vision, 2009.

[37] T. Leung and J. Malik, “Representing and Recognizing the Visual Appearance of Materials Using Three-Dimensional Textons,” Int’l J. Computer Vision, vol. 43, pp. 29-44, 2001.

[38] D. Batra, R. Sukthankar, and T. Chen, “Learning Class-Specific Affinities for Image Labelling,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.

[39] T. Finley and T. Joachims, “Training Structural SVMs When Exact Inference Is Intractable,” Proc. Int’l Conf. Machine Learning, 2008.

[40] A. Kulesza and F. Pereira, “Structured Learning with Approximate Inference,” Proc. Neural Information Processing Systems, 2007.

[41] A.F.T. Martins, N.A. Smith, and E.P. Xing, “Polyhedral Outer Approximations with Application to Natural Language Parsing,” Proc. Int’l Conf. Machine Learning, 2009.


[42] C. Fowlkes, D. Martin, and J. Malik, The Berkeley Segmentation Data Set and Benchmark (BSDB), http://www.cs.berkeley.edu/projects/vision/grouping/, 2014.

[43] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation,” Proc. European Conf. Computer Vision (ECCV), 2006.

[44] T. Cour, F. Benezit, and J. Shi, “Spectral Segmentation with Multiscale Graph Decomposition,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.

[45] S. Turaga, K. Briggman, M. Helmstaedter, W. Denk, and H. Seung, “Maximin Affinity Learning of Image Segmentation,” Proc. Neural Information Processing Systems, 2009.

[46] W.M. Rand, “Objective Criteria for the Evaluation of Clustering Methods,” J. Am. Statistical Assoc., vol. 66, pp. 846-850, 1971.

[47] M. Meila, “Comparing Clusterings: An Axiomatic View,” Proc. Int’l Conf. Machine Learning, 2005.

[48] J. Freixenet, X. Munoz, D. Raba, J. Marti, and X. Cufi, “Yet Another Survey on Image Segmentation: Region and Boundary Information Integration,” Proc. European Conf. Computer Vision (ECCV), 2002.

[49] M. Friedman, “A Comparison of Alternative Tests of Significance for the Problem of M Rankings,” Annals of Math. Statistics, vol. 11, pp. 86-92, 1940.

[50] J. Demsar, “Statistical Comparisons of Classifiers over Multiple Data Sets,” J. Machine Learning Research, vol. 7, pp. 1-30, 2006.

[51] P.B. Nemenyi, “Distribution-Free Multiple Comparisons,” PhD dissertation, Princeton Univ., 1963.

[52] T. Kim, K. Lee, and S. Lee, “Learning Full Pairwise Affinities for Spectral Segmentation,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.

[53] S.R. Rao, H. Mobahi, A.Y. Yang, S.S. Sastry, and Y. Ma, “Natural Image Segmentation with Adaptive Texture and Boundary Encoding,” Proc. Asian Conf. Computer Vision (ACCV), 2009.

[54] I. Endres and D. Hoiem, “Category Independent Object Proposals,” Proc. European Conf. Computer Vision (ECCV), 2010.

[55] T. Malisiewicz and A.A. Efros, “Improving Spatial Support for Objects via Multiple Segmentations,” Proc. British Machine Vision Conf. (BMVC), 2007.

Sungwoong Kim (S’07-M’12) received the BSand PhD degrees in electrical engineering fromthe Korea Advanced Institute of Science andTechnology (KAIST), Daejeon, Korea, in 2004and 2011, respectively. Since 2012, he has beenwith Qualcomm Research Korea where he is asenior engineer. His research interests includemachine learning for multimedia signal process-ing, discriminative training, and graphicalmodeling.

Chang D. Yoo (S’92-M’96-SM’11) received the BS degree in engineering and applied science from the California Institute of Technology in 1986, the MS degree in electrical engineering from Cornell University in 1988, and the PhD degree in electrical engineering from the Massachusetts Institute of Technology (MIT) in 1996. From January 1997 to March 1999, he worked at Korea Telecom as a senior researcher. He joined the Department of Electrical Engineering at the Korea Advanced Institute of Science and Technology in April 1999. From March 2005 to March 2006, he was with the Research Laboratory of Electronics at MIT. His current research interests include the application of machine learning and digital signal processing in multimedia.

Sebastian Nowozin is a researcher in the Machine Learning and Perception group at Microsoft Research Cambridge. He received his Master of Engineering degree from the Shanghai Jiaotong University (SJTU) and his diploma degree in computer science with distinction from the Technical University of Berlin in 2006. He received his PhD degree summa cum laude in 2009 for his thesis on learning with structured data in computer vision, completed at the Max Planck Institute for Biological Cybernetics, Tübingen and the Technical University of Berlin. His research interest is at the intersection of computer vision and machine learning. He regularly serves as PC member and reviewer for machine learning (NIPS, ICML, AISTATS, UAI, ECML, JMLR) and computer vision (CVPR, ICCV, ECCV, PAMI, IJCV) conferences and journals.

Pushmeet Kohli is a senior research scientist in the Machine Learning and Perception group at Microsoft Research Cambridge, and is a part of the Association for Computing Machinery’s (ACM) Distinguished Speaker Program. His research has appeared in conferences and journals in Computer Vision, Machine Learning, Robotics, AI, Computer Graphics, and HCI. He has won best paper awards at ICVGIP 2006, 2010, ECCV 2010 and ISMAR 2011. His PhD thesis, titled “Minimizing Dynamic and Higher Order Energy Functions using Graph Cuts”, was the winner of the British Machine Vision Association’s “Sullivan Doctoral Thesis Award”, and was a runner-up for the British Computer Society’s “Distinguished Dissertation Award”. Dr. Kohli’s research has also been featured in popular media outlets such as Forbes, The Economic Times, New Scientist and MIT Technology Review.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.

1774 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 36, NO. 9, SEPTEMBER 2014

