STATISTICAL SHAPE AND APPEARANCE
MODELS FOR SEGMENTATION AND
CLASSIFICATION
ANDREY LITVIN
Dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
BOSTON UNIVERSITY
COLLEGE OF ENGINEERING
Dissertation
STATISTICAL SHAPE AND APPEARANCE MODELS FOR
SEGMENTATION AND CLASSIFICATION
by
ANDREY LITVIN
B.S., Saint-Petersburg State University (Russia), 1996
M.S., Boston University, 2000
Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
2006
Approved by
First Reader
William C. Karl, Ph.D.
Professor of Electrical and Computer Engineering and Professor of Biomedical Engineering
Second Reader
Janusz Konrad, Ph.D.
Associate Professor of Electrical and Computer Engineering
Third Reader
Stan Sclaroff, Ph.D.
Associate Professor of Computer Science
Fourth Reader
Jayant Shah, Ph.D.
Professor of Mathematics
Acknowledgments
First, I would like to express my profound gratitude to my advisor Professor Clem Karl,
whose expertise, encouragement, patience, and help in organizing and directing my own
ideas were of immeasurable importance at all stages of my work. I am particularly thankful
for my advisor’s support in publishing our results, presenting them at conferences, and
writing reports. His mentoring helped me focus my work, develop key ideas, and ultimately
made this dissertation possible. I am thankful to other committee members who provided
valuable advice at different stages of my PhD work. I would like to thank Professor Stan
Sclaroff for important suggestions in my prospectus preparation as well as for valuable
discussions. I also would like to thank Professor Janusz Konrad for support throughout
my PhD study. Professor Konrad’s advice regarding my research was very helpful on
countless occasions. I am particularly thankful to Professor Konrad for advising me
in a separate research project that did not become a part of this dissertation. I would
like to express particular gratitude to Professor Jayant Shah for the many discussions we had
at early stages of my PhD study. His expertise motivated me in several directions that I
have taken, especially in the shape classification focus area of this thesis. I am also thankful
to Professor David Castanon for valuable advice throughout my PhD study. I am thankful
to all Boston University faculty members for support and high quality teaching. I am
particularly thankful to all ISS group faculty members, whose classes provided me with
important foundations and interesting ideas for my research. I recognize that this work
would not be possible without financial support provided to me by my advisor, Professor
Clem Karl and the department of Electrical and Computer Engineering.
I would like to thank other colleagues from our lab who were part of my life throughout
my time at Boston University and made my experience enjoyable in many aspects. Yong-
gang Shi has been a great motivation for my own research. Our many discussions helped
me to formulate my ideas. Our collaboration on travel arrangements made our conference
trips productive and enjoyable. I would like to thank Robert Weisenseel for encourage-
ment and help in choosing the right strategy in my research, as well as for great help in
maintaining computational resources in our lab. I appreciate John Kaufhold’s suggestions
regarding medical imaging research trends and industry opportunities. Zhengrong Ying has been
my teammate in class projects and a resourceful person who could offer advice in all as-
pects of life. I am happy to have many other friends in our ISS lab: Shuchin Aeron, Julia
Pavlovich, Karen Jenkins, George Atia. I thankfully remember Mirko Ristivojevic’s and
Nikola Bozinovic’s companionship during our research visit to I3S lab at Sophia-Antipolis,
France in June 2004. Without them, that experience would not have been as productive. I would
like to thank Professor Michel Barlaud from I3S lab for his assistance during our visit. I
would like to thank many other students from our lab who helped to enrich my experience,
among them Zhuangli Liang, Serdar Ince, Mujdat Cetin, Lingmin Meng and others.
I would like to thank Shen Hong and Shuping Qing at Siemens Corporate Research,
who provided me with guidance and shared valuable experience during my internship. I
am grateful to my other Siemens colleagues and friends, including Mikael Rousson, Jian
Li, Jie Shao, and Vassilis Athitsos for valuable ideas and support.
I would like to thank Dr. David Kennedy, at the Center for Morphometric Analysis of
Harvard Medical School and Massachusetts General Hospital, for providing the brain
MRI data used in this dissertation. The knee MRI data used in this work were provided
by Paul Debevec at ICT Graphics lab.
I would also like to thank my family for the support they provided me through my
entire life and, in particular, I must acknowledge my wife Dan Xie, without whose love,
support, and help, I would not have finished this thesis.
STATISTICAL SHAPE AND APPEARANCE MODELS FOR
SEGMENTATION AND CLASSIFICATION
(Order No. )
ANDREY LITVIN
Boston University, College of Engineering, 2006
Major Professor: William C. Karl, Ph.D.
Professor of Electrical and Computer Engineering,
Professor of Biomedical Engineering
ABSTRACT
In this dissertation we develop and apply models of shape and models of image intensities
(appearance models) in object-based image processing tasks. We make contributions in
three areas of interest: constructing novel flexible models of shape and of image intensities,
using these models to extract object boundaries from images, and analyzing differences
between groups of shapes from given, extracted object boundaries.
In the shape and appearance model construction and application areas of focus we
are motivated by the task of extracting the object boundaries from images by an evolving
closed curve technique known as curve evolution. We develop and apply novel models of shape
and models of appearance for incorporation in such curve-evolution-based object boundary
extraction. In our first major contribution, we start with a statistical shape model based
on the maximum entropy principle, designed to capture the perceptual shape similarity of
training shape samples. In sampling experiments, this statistical shape model has been
shown to generate new shape samples with prominent visual features of the original training
shapes used to construct the model. For the first time, we develop methods to incorporate
this maximum entropy model into object boundary extraction tasks. We show that indeed
incorporation of such a prior can have a dramatic effect in object boundary extraction
problems, favoring solutions similar to the training shapes.
In our next major contribution, we develop a new model of shape based on the notion of
shape distributions. Shape distributions have been introduced as cumulative distribution
functions of parameters continuously defined on contours. Shape distributions have been
used before for shape classification tasks, but our work is their first use for object bound-
ary extraction. The resulting shape models show an excellent ability to preserve prominent
visual object structures during boundary extraction in challenging segmentation problems
involving high noise, object occlusion, and weak or even missing intensity edges. Fur-
ther, these models exhibit robustness to limited training data. These models eliminate the
need for shape alignment at the model construction and estimation steps, often a difficult
and critical task. We further extend these models to capture information on the relative
configurations of multiple contours, which helps to extract multiple boundaries more effi-
ciently. We also extend the shape distribution concept to model image intensities. This
allows us to achieve superior results on images where the desired object boundaries do not
coincide with visible edges and where image regions cannot be identified based on their
homogeneity.
In another major contribution, we focus on the identification and analysis of the dif-
ferences in extracted Corpus Callosum shapes of the brain. Historically, such analysis has
been based solely on area or volume measures. In contrast, we use medial axis based rep-
resentations of the shape to capture a far richer understanding of the underlying shape.
We develop statistically-based feature ranking metrics in order to reduce the dimension
of the original feature space, construct shape classifiers, and visualize inter-class shape
differences.
Preface
The work presented in this dissertation was carried out during my time as a graduate
student at Boston University. This period was from the Spring semester of 1999 through
the Spring semester of 2006. The publications directly resulting from this research, on
which I am listed as the first author, can be found in the references and are cited below.
(Litvin and Karl, 2002)
(Litvin and Karl, 2003)
(Litvin and Karl, 2004a)
(Litvin and Karl, 2004b)
(Litvin and Karl, 2005a)
(Litvin et al., 2006)
Additionally, several papers on unrelated research, on which I am listed as an author,
were published during my stay at Boston University.
(Litvin et al., 2000a)
(Litvin et al., 2000b)
(Alcayde et al., 2001)
(Litvin et al., 2003)
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Major contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background 8
2.1 Object based image processing . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Curve evolution framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Energy minimization based curve evolution . . . . . . . . . . . . . . 10
2.2.2 Probabilistic formulation for curve evolution . . . . . . . . . . . . . 12
2.3 Shape parameterizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Parametric non-curve object representations . . . . . . . . . . . . . . 14
2.3.2 Explicit curve-based parameterization . . . . . . . . . . . . . . . . . 15
2.3.3 Implicit curve parameterization by level sets . . . . . . . . . . . . . . 15
2.4 Shape modeling approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1 Constructing distance measures on shapes . . . . . . . . . . . . . . . 18
2.4.2 Constructing shape variability model . . . . . . . . . . . . . . . . . . 19
2.5 Shape distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.6 Shape model prior work summary and motivation . . . . . . . . . . . . . . . 29
2.7 Appearance models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7.1 Role of distribution in appearance modeling and histogram equalization 33
2.8 Distribution difference measures . . . . . . . . . . . . . . . . . . . . . . . . 34
2.9 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 Maximum entropy shape model as a curve evolution prior 42
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 Constructing a pdf on the space of shapes . . . . . . . . . . . . . . . . . . . 43
3.3 Application to curve-evolution based segmentation . . . . . . . . . . . . . . 49
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Shape-distribution-based prior shape model 56
4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Our Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.1 A prior energy based on shape distributions . . . . . . . . . . . . . . 59
4.3 Minimizing flow computation . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3.1 Exact solution using variational framework . . . . . . . . . . . . . . 62
4.3.2 Numerical solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.4 Feature function choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.5 Intensity histogram equalization connection . . . . . . . . . . . . . . . . . . 70
4.6 Extension to 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.6.1 Formulation 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.6.2 Surface flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.6.3 Implementation issues . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.7 Extensions and additional issues . . . . . . . . . . . . . . . . . . . . . . . . 75
4.7.1 Weighted distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.7.2 Geodesic distance between distributions . . . . . . . . . . . . . . . . 75
4.7.3 Shape distribution uniqueness issues . . . . . . . . . . . . . . . . . . 77
4.7.4 Computational complexity . . . . . . . . . . . . . . . . . . . . . . . . 78
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5 Applications of shape distribution based shape priors 80
5.1 Shape focusing by shape term guided evolution . . . . . . . . . . . . . . . . 80
5.2 Image segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.3 Image segmentation with occlusion . . . . . . . . . . . . . . . . . . . . . . . 98
5.4 Average shape computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6 Joint segmentation of multiple objects using shape distribution based
shape prior 113
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.2 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2.1 Flow computation for inter-object distance feature function . . . . . 118
6.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7 Shape and appearance modeling with feature distributions for image seg-
mentation 125
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.2 Shape distribution principles . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.3 Extension to combined intensity and shape priors . . . . . . . . . . . . . . . 127
7.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.5 Multivariate distributions extension . . . . . . . . . . . . . . . . . . . . . . 138
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8 Shape-Based Classification and Morphological Analysis using Medial Axes
and Feature Selection 141
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.2 Data and skeleton-based feature extraction . . . . . . . . . . . . . . . . . . 143
8.2.1 Fixed topology skeleton . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.2.2 Nested local symmetry sets method . . . . . . . . . . . . . . . . . . 146
8.2.3 Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.3 Inter-class shape differences. Detection and visualization. . . . . . . . . . . 151
8.4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9 Conclusions and future research 165
9.1 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
A Variational solution for the curve flow minimizing shape distribution
based prior energy 169
A.1 Inter-point distance function . . . . . . . . . . . . . . . . . . . . . . . . . . 170
A.2 Boundary curvature feature function . . . . . . . . . . . . . . . . . . . . . . 174
A.3 Multiscale curvatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
A.3.1 Computation of feature function . . . . . . . . . . . . . . . . . . . . 176
A.3.2 Curve flow computation . . . . . . . . . . . . . . . . . . . . . . . . . 179
A.4 Feature classes with weighting function . . . . . . . . . . . . . . . . . . . . . 184
A.4.1 Inter-point distance feature function . . . . . . . . . . . . . . . . . . 185
A.5 Relative inter-object distances. . . . . . . . . . . . . . . . . . . . . . . . . . 190
B Curve flow for intensity based feature function 193
C Multidimensional CDF based shape prior 196
References 199
Curriculum Vitae 208
List of Tables
5.1 Experiment 1: Segmentation errors computed using different error measures.
The first error measure is computed as the symmetric area difference (Hamming
distance in eq. 5.13) between the final segmented region and the true shape. The
second measure is given by our prior energy Eshape in eq. 4.2. . . . . . . . . 91
5.2 Experiment 2: Segmentation errors computed using different error measures.
The first error measure is computed as the symmetric area difference (Hamming
distance in eq. 5.13) between the final segmented region and the true shape. The
second measure is given by our prior energy Eshape in eq. 4.2. . . . . . . . . 93
5.3 Experiment 3: Segmentation errors computed using different error measures.
The first error measure is computed as the symmetric area difference (Hamming
distance in eq. 5.13) between the final segmented region and the true shape. The
second measure is given by our prior energy Eshape in eq. 4.2. . . . . . . . . 95
6.1 Symmetric difference (area based) segmentation error. For each object the
error measure is computed as the symmetric difference between the final segmented
region and the true segmented region. The values in the table are computed as
the sum of the error measures for the individual objects. . . . . . . . . . . . 123
List of Figures
2·1 Level set curve embedding. Curve Γ is given by the set of points Γ : Φ = 0;
Φ is called the level set function. . . . . . . . . . . . . . . . . . . . . . . . . 18
2·2 An example of constructing a shape distribution for a curve (left) based on
curvature κ(s) measured along the boundary (second graph). Third and
fourth graphs show the sketches of pdf(κ) and cumulative distribution func-
tion H(κ) of the samples of curvature respectively. Note the invariance of
H(κ) with respect to the choice of the initial point of arc-length parameter-
ization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2·3 L1 difference measure computed on PDFs and CDFs. Difference value is
given by shaded area. Panels (A) and (C): modes of p1 and p2 are very
close; L1 difference on CDFs produces small value. Panels (B) and (D):
modes of p1 and p2 are further from each other. L1 difference on CDFs gives
larger value, while L1 difference on PDFs does not change. . . . . . . . . . . 36
3·1 MCMC move proposal in (Zhu, 1999). Point i in configuration A is moved
into one of the 8 positions under the constraints on the length of linelets
connecting node i with its neighbors. . . . . . . . . . . . . . . . . . . . . . . 45
3·2 Our new scheme of proposed MCMC move (3 nodes, including the start-
ing node are shown). Black circles represent the initial node configuration.
White circles represent 2 candidate positions for the starting node and the
new positions of the neighbors after a starting node was moved into the new
(bottom) position. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3·3 Our scheme of open curve MCMC move proposal. Black circles represent
the initial configuration. White circles represent the final configuration after
the curve is bent at the trial node. . . . . . . . . . . . . . . . . . . . . . . . 48
3·4 Ground truth image with noise added constructed on the shape from dataset
1 (panel A) and dataset 2 (panel B). Black solid line: true shape; White
dashed line: initial contour . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3·5 Data set 1. Result of segmentation without using prior shape model (left)
and the best result using the model (right). True boundary is shown by
straight line and circles show reconstructed boundary. . . . . . . . . . . . . 52
3·6 Same as figure 3·5 for data set 2. . . . . . . . . . . . . . . . . . . . . . . . . 52
3·7 Segmentation obtained using penalty function in eq. 3.13. . . . . . . . . . . 54
4·1 An example of constructing a shape distribution for a curve (left) based
on curvature κ(s) measured along the boundary (second graph). Third
and fourth graphs show the sketches of pdf(κ) and cumulative distribution
function H(κ) of curvature respectively. Note the invariance of H(κ) with
respect to the choice of the initial point of arc-length parameterization. . . 57
4·2 Illustration of the descent on the manifold procedure to find the curve
flow β(s)+, eq. 4.14. The surface represents the space of all realizable feature
function flows S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4·3 Feature function #1 values computed on the discretized curve. We show
interpoint distances d13..d15. . . . . . . . . . . . . . . . . . . . . . . . . . 66
4·4 Left: Graphic interpretation of the division of the space Ω into four sub-
spaces ΩS : Ω1, Ω2, Ω3, and Ω4. Corresponding intervals are [0, 0.125],
[0.125, 0.25], [0.25, 0.375], and [0.375,0.5]. Right: a curve with 3 pairs of
points, members of Ω1, Ω1, and Ω4 respectively. . . . . . . . . . . . . . . . . 68
4·5 Feature function #2 in discrete case: interpoint angles α−1,1,2..α−n,1,n are
shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4·6 Surface triangulation by the level set function zero crossings extraction. . . 75
5·1 Evolution of an initial contour under the sole action of the prior flow: initial
(dot-dashed), target (dashed), and resulting (solid) contours. (A) - prior
constructed on the inter-point distances (#1); (B) - prior constructed on
multi-scale curvatures (#2); (C) - Both feature classes #1 and #2 are included. 82
5·2 Target plane shape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5·3 Evolution of the contour under the action of prior flow: initial (dot-dashed),
and final (solid) contours. Target contour is shown in Figure 5·2. . . . . . . 83
5·4 Evolution of the contour under using multi-scale prior defined on a group of
level sets: initial (dot-dashed), and resulting (solid) contours. . . . . . . . . 87
5·5 Segmentation results. A: Our method; B: PCA; C: Curve length penalty
prior; D: Method in (Leventon et al., 2000b); White - final result; Black -
true shape boundary; Dashed line - initial curve. Symmetric area distance
(in pixels) between true boundary and final result is shown on the top of
each panel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5·6 Prior shapes used to construct the prior in our experiment. A: triangular
shapes (experiment 1); B: polygonal shapes (experiment 2). . . . . . . . . . 90
5·7 Segmentation results, polygonal prior. A: Our method; B: PCA; C: Curve
length penalty prior; White - final result; Black - true shape boundary;
Dashed line - initial curve. Symmetric area distance (in pixels) between
true boundary and final result is shown on the top of each panel. . . . . . . 92
5·8 Experiment 3: (A) - training shapes; (B) - noise free image . . . . . . . . . 94
5·9 Experiment 3 segmentation results. A: Our method; B: PCA; C: Curve
length penalty prior; White - final result; Black - true shape boundary;
Dashed line - initial curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5·10 Knee cartilage segmentation results. A: Initial (dashed line) and true (solid
line) contours; B: Our method (solid line); C: Leventon’s (solid line) D:
Curve length penalty prior (solid line). . . . . . . . . . . . . . . . . . . . . . 96
5·11 Occlusion experiment 1. (A) - training shapes; (B) - true object contour
(thick line) superimposed with training shapes (thin lines). This plot illus-
trates that the prominent feature location is different in the true object and
in all training shapes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5·12 Occlusion experiment 1. Noisy image is shown in all four panels. (A) - Initial
contour (dashed line); (B,C,D): Dashed line - occluded region; Black solid
line - true boundary; White solid line - segmentation result. (B) - occlusion
pattern 1, result using our prior. (C) - occlusion pattern 1, result using PCA
prior (C). (D) occlusion pattern 2, result using our prior. . . . . . . . . . . 101
5·13 Experiment 2: Segmentation with occlusion. (A) - prior shapes; (B) - result
using our prior and (C) - PCA prior. Dashed white rectangle - occluded
region; Dashed white circle - initial contour; Black solid line - true boundary;
White solid line - segmentation result. . . . . . . . . . . . . . . . . . . . . . 103
5·14 Training planes shapes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5·15 Experiment 3: segmentation of plane with occlusion. Plane silhouette #2
was used to form the image. Dashed white rectangle - occluded region;
Dashed white smooth contour - initial curve; Black solid line - true boundary;
White solid line - segmentation result. . . . . . . . . . . . . . . . . . . . . . 105
5·16 The average shape of 2 triangles obtained using different shape distance mea-
sures: solid lines - prior shapes; dashed line - corresponding average shape;
filled areas - the family of solutions. (A) - asymmetric distance based mea-
sure; (B) - area based measure. One of the possible solutions is shown by
dashed line; (C) - our distribution difference measure (dash-dotted line -
evolution result; dashed line - scaled result). . . . . . . . . . . . . . . . . . 108
5·17 Initial (dash-dotted contour) and average shapes (solid contour) for 2 groups
of shapes. Prior shapes in each group are shown on the top of each panel. . 110
5·18 Experiment 3: Example shapes from (Klassen et al., 2004). . . . . . . . . . 110
5·19 Experiment 3: Average shapes computed on shapes in Figure 5·18: (A)
Result in (Klassen et al., 2004); (B) Our result. . . . . . . . . . . . . . . . . 111
6·1 Interaction matrix graphical interpretation using directed diagram. Three
objects are sketched in the right panel with assigned object indices. Arrows
in the right panel correspond to non-zero entries in the matrix Z. . . . . . . 117
6·2 Feature function #3 used in this work illustrated for a curve C1 discretized
using 6 nodes. Feature values for curve C1 are defined as the shortest signed
distances from the curve C2 to nodes of the curve C1. . . . . . . . . . . . . 118
6·3 Synthetic 2 shape example: (A) Bi-level noise free image; (B) Segmenta-
tion with curve length prior; (C) - shape distribution prior including only
autonomous feature functions #1 and #2; (D) - shape distribution prior
including directed feature function #3 along with autonomous feature func-
tions. Solid black line shows the true objects boundaries; dashed white lines
- initial boundary position; solid lines - final boundary. . . . . . . . . . . . . 120
6·4 Brain MRI segmentation: (a) Multiple structures and interactions used
for feature function #3; (b) Segmentation with independent object curve
length prior. (c) Segmentation using multi-object PCA technique in (Tsai
et al., 2004) (d) Segmentation with new multi-object shape distribution
prior. Solid black line shows the true objects boundaries; solid white line -
final segmentation boundary. . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7·1 Image patch based feature values measured along the boundary. Point O
(patch coordinate system origin) is positioned at Γ(s) (current boundary
point). j-axis is aligned with local inward normal. Two instances are shown. 128
7·2 Example 1. Segmentation with shape/intensity distribution prior. True
shape - black solid line; Initial contour - dashed line; Final segmentation
contour - solid white line. . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7·3 Five points x(k) used to construct feature functions according to Eq. 7.2 in
Experiment 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7·4 Example 2. Segmentation with shape/intensity distribution prior. True
shape - black solid line; Initial contour - dashed line; Final segmentation
contour - solid line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7·5 Example 3. (A) - Segmentation with shape distribution prior and maximum
mutual information data term (Kim et al., 2002a); (B) - Segmentation with
shape/intensity distribution prior and shape distribution prior. True shape
- solid black line; Initial contour - dashed line; Final segmentation contour -
solid white line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7·6 Expert segmentation of the left lenticular nucleus showing variation in inten-
sity within the structure and lack of a consistent gradient along the boundary. . 135
7·7 Example 4. (a) Segmentation with shape/intensity distribution prior. (b)
Segmentation with only shape prior and intensity model in (Kim et al.,
2002a). True shape - black solid line; Initial contour - dashed line; Final
segmentation contour - solid line. . . . . . . . . . . . . . . . . . . . . . . . . 136
7·8 LADAR image of a tank. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7·9 Semi-synthetic tank image segmentation with intensity and shape prior: (A)
- intensity and shape prior; (B) - shape prior and threshold intensity term
in eq. 7.6. True shape - solid black line; Initial contour - dashed line; Final
segmentation contour - solid white line. . . . . . . . . . . . . . . . . . . . . 139
8·1 CC shape sketch is shown along with a skeleton found using trial end points.
Maximum radius circle centered at xk is shown. . . . . . . . . . . . . . . . . 145
8·2 Extracted fixed topology skeleton. Circles represent the sampled discrete
points on the skeleton. Outside border is the segmented Corpus Callosum
shape. Dark regions near the border comprise the skeleton badness measure
in eq. 8.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8·3 Skeletons obtained from male subjects using the fixed topology method. . . 148
8·4 Skeletons obtained from male subjects using the nested local symmetry sets
method. Principal and secondary skeleton branches are shown. . . . . . . . 149
8·5 Features extracted from the medial axis . . . . . . . . . . . . . . . . . . . . 150
8·6 Male and female Corpus-Callosum differences and importance of individual
features using p-value based feature ranking . . . . . . . . . . . . . . . . . . 153
8·7 Normal and Schizophrenia Corpus-Callosum differences and importance of
individual features using p-value based feature ranking . . . . . . . . . . . . 154
8·8 Feature importance visualization using linear classifier weight as the feature
ranking score. Top: Male/female case; Bottom: normal/schizophrenia case. 155
8·9 Feature importance visualization using feature selection frequency as the
feature ranking score. Normal/schizophrenia case. . . . . . . . . . . . . . . 156
8·10 Male/female classification testing errors using t-test feature selection method.
Testing error is shown coded by color as a function of the number of Ada-
boost iteration (horizontal axis) and the number of features chosen (vertical
axis). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8·11 Classification testing error for gender and schizophrenia versus normal is
shown for different combinations of feature selection technique, classifica-
tion method, and number of features retained. “T-test; linear” - T-test
method of feature selection, MMSE classifier; “T-test; Ada Boosting” - T-
test method of feature selection, Ada Boosting classifier; “Weights; linear”
- linear weights feature selection method, MMSE classifier; “Weights; Ada
Boosting” - linear weights feature selection method, Ada Boosting classifier. 161
8·12 Feature selection normalized probability. Male/female classification with
p-value feature selection. Log of the normalized probability that a given
feature is chosen in the set of N features. Horizontal axis: N - number of
selected features; Vertical axis - feature index (1 through 37). . . . . . . . . 163
A·1 Inter-point distance augmentation due to curve deformation . . . . . . . . . 171
A·2 Illustration of feature value computation for feature function #2 . . . . . . 177
A·3 Four cases of the relative positions of three curve points. The support angle
cannot be determined unambiguously. . . . . . . . . . . . . . . . . . . . . . 177
A·4 Sequential computation of the angles for a particular “base” point s1, start-
ing from r = 1 (assuming inside of the curve is up-wards). . . . . . . . . . . 178
A·5 Local perturbation of the curve at point ~Γ(s1 +s2). Perturbation εβ(s1 +s2)
is infinitesimally small compared to |~Γ(s1, s1 + s2)| . . . . . . . . . . . . . . . . . 180
A·6 Illustration of 2 cases when the sign of the angle increment dα(1) is different
for the same curve perturbation εβ(s). . . . . . . . . . . . . . . . . . . . . . 180
List of Abbreviations
PDF . . . . . . . . . . . . . . Probability Distribution Function
CDF . . . . . . . . . . . . . . Cumulative Distribution Function
MRI . . . . . . . . . . . . . . Magnetic Resonance Imaging
CT . . . . . . . . . . . . . . Computerized Tomography
MAP . . . . . . . . . . . . . . Maximum a posteriori
PCA . . . . . . . . . . . . . . Principal Component Analysis
IID . . . . . . . . . . . . . . Independent Identically Distributed
SNR . . . . . . . . . . . . . . Signal-to-Noise Ratio
Chapter 1
Introduction
1.1 Motivation
Humans’ exceptional ability to interpret a visual scene is largely defined by an object-
oriented processing aptitude. Humans are able to extract knowledge about the shapes
of objects and to efficiently distinguish common features based on a very small number of
samples. Learned knowledge of shape similarity generalizes efficiently to unseen shape
examples despite large variabilities and partial observations. On the one hand, this ability
allows us to efficiently identify objects in a scene in the presence of severe noise, obscura-
tion, and clutter. Indeed, humans are so good at reasoning about and finding shapes that
we even tend to find them where none really exist, as shown by some psycho-visual exper-
iments. On the other hand, we are able to efficiently and robustly discriminate between
shapes of different categories. Naturally, the potential of using shape related concepts has
been seen in various domains, including image processing, computer vision, and applied
mathematics. Effort has been made both to study and mimic human abilities, and to
create new algorithms that otherwise perform shape oriented tasks. Gains from robust
shape oriented processing are expected in machine vision, medical imagery interpretation
and countless other applications.
In this dissertation we contribute to shape based image processing in the following
directions:
1) Modeling: This direction is focused on formulating the knowledge of commonalities
within groups of shapes (shape models), and formulating the model of appearance
of shapes in images (appearance models). These shape and appearance models are
designed to be used in image interpretation through the shape extraction task (Di-
rection 2). We focus on such shape and appearance modeling techniques that are
applicable in a curve evolution framework. The goal of our effort is to construct
efficient and robust models.
2) Shape extraction: This direction aims at extracting the boundary of objects of in-
terest in images. A curve evolution framework, using both shape and appearance
models, attempts to move the boundary curve to identify/segment the object. Our
focus is to adapt the models developed in Direction 1 to work in a curve evolution
framework. By combining effective shape and appearance models we aim at improving
segmentation results in challenging situations. These situations include noisy
images, weak edges, and high shape variability.
3) Shape Inference: Given the segmentation boundaries extracted from images, one may
be interested in further analysis of these shapes. A common problem is to detect mor-
phological differences in shapes and learn the relationships between these differences
and disease progression. Motivated by the Human Brain Project (Koslow and Huerta,
1997), we develop statistical methods for testing for morphological population differ-
ences and for subsequently identifying and localizing these morphological differences.
This work exploits a skeleton-based representation of the extracted brain shape.
The potential of shape based image processing has led to a considerable body of prior
work on various shape-related aspects. Numerous works explore human perception theories
and their direct application to shape models, theories of shapes and shape spaces, shape
extraction frameworks and algorithms. Applications range from medical image diagnostic
tools and processing aids to machine vision, communication and other areas. We review
the prior work related to our directions in Chapter 2.
In the first two directions of this dissertation we use an inherently object-based frame-
work in describing the boundary as a closed contour. A curve evolution framework is
then used to evolve the contour. In a typical curve evolution scheme, the curve is evolved
by the combination of data forces (dependent on the image data) and prior forces that
regularize or constrain the solution curve. Such a framework is very popular due to its
many advantages. Focusing on one boundary allows for computational efficiency of the
evolution process. Often, computations can be performed exclusively in the neighborhood
of the boundary. Curve evolution permits flexible initialization and allows easy topology
handling when combined with a level set framework. Finally, different forces may be used,
making the approach flexible.
In keeping with our focus on curve evolution based methods, we are interested in shape
modeling approaches that can be incorporated into curve evolution. However, despite the
large body of prior work on shapes, at the time this work started, combinations of curve
evolution and shape priors had been largely limited to generic priors that incorporate a
penalty on curve length, bending energy, or similar aggregate measures. Such priors can
provide a regularizing (smoothing) effect on the solution but indiscriminately smooth out
salient features of shapes. Moreover, they are often too generic, treating all
shapes in the same way. At present, generic priors still dominate in use within curve
evolution based methods. Our approaches to construct a prior were conceived as alterna-
tive techniques, aiming at richer, data dependent priors applicable in the curve evolution
framework. A few other approaches to construct and use a prior were proposed. Notably,
deformable template approaches explicitly define the space of deformations (for example
using PCA analysis) with respect to the “average” shape (Staib and Duncan, 1996). These
methods can work well when observed configurations are well represented in the training
data, but they suffer from poor generalizability to unseen shapes. Our approaches to con-
struct the prior are designed to address the limitations of the generic prior approaches
and approaches based on explicit templates. Our approaches have distinctively different
properties from existing approaches. First, our shape prior encodes the existence of prominent
shape features while being invariant to the location of these features within the shape. This
property allows the model to generalize to large variations while preserving certain prominent
shape features. Second, due to the invariance properties of the shape descriptors used, the
registration (alignment) of shapes is not necessary when using our framework. We aim
at challenging segmentation problems with little training data available and large shape
deformations.
Existing appearance models fall into three major categories. Boundary-based ap-
proaches assume that object boundaries coincide with image edges, attempting to locate
boundaries in areas of high image gradient. Region-based approaches assume uniformity
of certain statistics of the image intensity inside and outside the boundary. For example,
these approaches attempt to maximize the uniformity of region statistics or their separa-
tion. Template based approaches, such as the Active Appearance Model (AAM), assume
a template image is linked to the deformable boundary. The model tries to find the match
between the warped template and the image being segmented, finding the desired object
boundary as a deformed boundary template. These methods attempt to match the inten-
sities local to the boundary.
Our method to model appearance is built upon our shape modeling approach and ex-
tends its properties. It allows the solution of challenging segmentation problems, which
create difficulties for current boundary-based and region-based approaches. Our appear-
ance modeling approach attempts to capture the general appearance properties near the
object boundary and thus uses a richer description of the intensity properties than is typ-
ically used in boundary and region based approaches. Yet our method is constructed on
distributions of intensities, and thus it attempts to abstract and generalize these proper-
ties, in contrast to template-based methods. In this way it has the potential for greater
robustness and flexibility with respect to large variations of boundary appearance along
the boundary itself and across training boundaries and images than template-based ap-
proaches, which emphasize direct intensity matching between template and data.
In the shape inference part of this work, we focus on the problem of identification of
morphological differences in corpus callosum shapes of the brain and the task of automatic
classification of shapes based on observing prior shapes with known category memberships.
Historically, such analysis has been based solely on area or volume measures, but the
information contained in these measures is not sufficient to draw conclusions about a
shape’s distinctiveness, nor to localize and quantify specific differences. We explore rich
skeleton-based shape descriptors that allow us to use statistical approaches to identify the
shape population differences and build optimal classifiers based on these descriptors.
1.2 Major contributions
In this section we summarize the major contributions of this dissertation. The first major
contribution of this dissertation is the incorporation of the probability distribution shape
model proposed in (Zhu, 1999) into the curve evolution framework. This shape modeling
approach was motivated by the research on human shape perception and attempted to
capture such shape perception in a probabilistic context. We proposed an approach to
efficiently construct such a model from training data and incorporate the resulting model
into a curve-evolution based energy minimization framework. To our knowledge, this is the
first time models from the perceptual modeling community have been used as priors for
object boundary estimation in images.
In the second major contribution of this dissertation we develop an approach to shape
modeling based on the concept of shape distributions. Shape distributions have been pre-
viously used for shape classification and show evidence of being able to encode shape under
large deformations while preserving visual similarity (Osada et al., 2002). Our work con-
tributes the use of shape distributions as a prior in object boundary estimation problems
and incorporating this prior in a curve evolution framework to solve challenging segmen-
tation problems. We further extend our shape distribution concept to 3D and to joint
multi-object shape modeling. Our new shape modeling methodology allows us to improve
the solutions to segmentation problems that present difficulties for current segmentation
methods. Particular difficulties include low signal-to-noise ratio, small training data sets,
large shape variability, object occlusion, and weak/diffuse boundaries.
In the third major contribution we extend the shape distribution concepts to model the
appearance of objects in images through intensity defined distributional descriptors. We ar-
rive at the framework of modeling both shape and appearance under the principle we name
the joint shape and appearance distribution model. Our new appearance model, assisted by
the shape prior, allows us to approach segmentation problems that pose severe difficulties
for current approaches. In particular, we are able to segment images where intended object
boundaries do not coincide with edges and where regions can not be separated based on
commonly used region statistics.
In the fourth major contribution of this dissertation we propose tools to study mor-
phometric differences in manually segmented outlines of the corpus callosum shapes. We
exploit skeleton based parametric shape descriptions. We present novel statistically-based
feature ranking metrics, and use these metrics to reduce the dimension of the original fea-
ture space and to construct classifiers on the reduced feature space. Our feature ranking
metrics also allow for intuitive inter-class shape difference visualization.
This dissertation is organized as follows:
Chapter 2 reviews the background related to the topics of this dissertation and the
methods used.
Chapter 3 introduces the maximum entropy shape modeling approach and implements
it in the curve evolution context for image segmentation.
Chapter 4 presents the shape distribution-based modeling approach. This chapter
describes the model construction and its use in curve evolution based shape inference.
Chapter 5 presents the applications, demonstrating the properties of the shape distribution-
based shape model.
Chapter 6 extends our shape distribution based shape modeling approach to modeling
relationships between multiple objects.
Chapter 7 introduces a framework of unified shape and appearance modeling using
the shape distribution concept.
Chapter 8 presents our results on morphometric differences analysis of the shapes of
the Corpus Callosum brain structure.
Chapter 9 summarizes our results and discusses the possibilities of future work that
extends the contributions of this dissertation.
Chapter 2
Background
In this chapter we review the technical background of the methods used in this dissertation
and prior work that is related to the major topics of this dissertation. First, we introduce
the basic concepts of object-based image processing. Next, we consider curve evolution
approaches in greater detail since curve evolution will be a major focus of this work.
We discuss different curve parameterizations, with particular attention paid to level set
methods. We review prior work on shape distances and models and on appearance models.
Finally, we review some elements of classification theory, relevant to the morphological
shape analysis direction of this thesis.
2.1 Object based image processing
In this dissertation we consider object-based tasks and concentrate on the use of shape and
intensity prior information in these tasks. The concrete applications of interest are image
segmentation, shape clustering, shape averaging, image reconstruction, and shape classifi-
cation. First, we will consider the boundary extraction thrust as the driving application.
In this context, the prior information is generally divided into shape and appearance priors.
The shape prior describes the geometry of the object boundaries and/or relationships be-
tween boundaries of multiple objects. The appearance prior formulates prior knowledge on
image intensities with respect to object boundaries. The shape prior imposes a structure
on the solution boundary or merely acts as a regularizer. The appearance prior links the
solution to the data, specifying the way the data (image) influences the solution.
A first choice made in a given application is that of a low level object (shape) descrip-
tion. Two types of approaches can be distinguished. In the first type of approach, the
shape is described parametrically as a geometrical primitive shape (such as an ellipse) or a
combination thereof. An advanced version of such an approach is a skeleton-based param-
eterization utilized in part of this work. In skeleton based parameterizations, the shape is
described as a sequence of connected ribbons. Each ribbon is assigned a width parameter
that implies the shape boundary points.
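To make the implied boundary concrete, here is a minimal sketch (our own illustration; the straight medial axis, the sinusoidal width profile, and all names are hypothetical, not taken from this dissertation) of how a ribbon's width parameter determines boundary points:

```python
import numpy as np

# A skeleton/ribbon description implies the boundary: each medial point,
# offset along the axis normal by the local width (radius), gives a boundary
# point on either side. Hypothetical straight axis and width profile.
t = np.linspace(0.0, 1.0, 50)
medial = np.stack([t, np.zeros_like(t)], axis=1)  # medial axis along x
r = 0.2 + 0.1 * np.sin(np.pi * t)                 # width parameter r(t)
normal = np.array([0.0, 1.0])                     # normal of the straight axis

upper = medial + r[:, None] * normal              # implied boundary, one side
lower = medial - r[:, None] * normal              # implied boundary, other side

# Every implied boundary point lies exactly r(t) from its medial point.
print(np.allclose(np.linalg.norm(upper - medial, axis=1), r))
```

For a curved skeleton the normal varies along the axis, but the principle is the same: the boundary is a derived quantity of the (axis, width) description.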
A second type of approach to describe shapes can be named boundary-based or active
contour approaches, where the shape is explicitly defined by its boundary: a curve in 2D
or surface in 3D. Boundary based shape descriptions are the primary tool used in this
dissertation. In turn, boundary contour descriptions can be realized in two distinct ways.
Parametric boundary descriptions define a boundary as a sequence of nodes, landmarks
or spline segments. The geometric active contour method describes a boundary implicitly
using a higher-dimensional embedding function called a level-set function. In this dissertation we utilize both
approaches to describe a contour. Throughout this dissertation, it is assumed that the
underlying contour is closed and non-self-intersecting. Although for some special cases this
assumption can be relaxed, real life objects can be described by one or several non-self-
intersecting closed contours. Curve evolution approaches (Caselles et al., 1997) to evolve
such parameterized shapes are the basic tool used here. Curve evolution methods are
intrinsically geometric since the boundary describes the object of interest directly. Focusing
on one boundary allows for computational efficiency of the evolution process (computations
often need only be performed in the neighborhood of or on the boundary). Curve evolution
permits easy topology handling and flexible initialization. Finally, different forces may be
used to move the curve, making the approach highly flexible.
2.2 Curve evolution framework
In this section we give a detailed review of the curve evolution framework. We consider
different underlying formulations and practical implementation approaches.
In a curve evolution framework, the boundary is continuously evolved using the curve
force specified at each point on the boundary. Typically, the force includes the shape and
the intensity components. The shape component constrains or regularizes the boundary,
incorporating the prior knowledge on the shape of the boundary. The intensity component
depends on underlying data and the model of the relationship between the curve and the
data, which we call the appearance prior.
We distinguish three possibilities to formulate the curve evolution approach and to
compute the curve forces. First, a heuristic approach can be used to specify all or some
of the forces (Shen et al., 2005). The resulting approach may be effective but lacks the
integrity of a formulation derived from general principles. Second, the energy minimization based ap-
proach defines the energy functional, which relates curve, image, and prior information.
The solution curve is then defined as the minimizer of this energy functional and is com-
puted through the evolution process. The curve force is typically computed as the gradient
flow: the direction of curve deformation that corresponds to the fastest decrease of the
energy. Third, a curve, an image and prior information can be encoded in an explicit
probabilistic framework, for instance using the MAP formulation. While being the most
theoretically justified approach, a probabilistic framework is problematic to implement in
practice, notably because defining a “true” probability distribution function and computing
its normalization constant are difficult on the infinite-dimensional space of shapes. In
fact, probabilistic approaches used in practice often do not include strict formulations for
the probability distributions used. On the other hand, any probabilistic approach can be
expressed as an energy minimization approach, but not conversely. Hence, the energy min-
imization approach presents an adequately general and convenient framework. Below we
give more detailed overviews of the energy based and probabilistic formulations of curve
evolution.
2.2.1 Energy minimization based curve evolution
In the energy minimization based curve evolution approach, one defines an energy func-
tional that depends on one or multiple curves. This energy functional is typically designed
to be minimized at the correct solution (i.e. desired object boundaries). The energy func-
tional can be constructed by analyzing the structure of the problem, desirable solutions,
etc.
In the energy based formulation, an energy E(Γ) depending on the hypothesized object
boundary Γ is defined. The solution boundary is the minimizer of this energy:
Γ∗ = argmin_Γ E(Γ)    (2.1)
The energy typically consists of two types of terms: intensity term(s) and shape term(s).
E(Γ) = Eint(Γ) + αEshape(Γ) (2.2)
Sometimes these are referred to as “external” and “internal” terms. The intensity term
Eint ensures the fidelity of the solution to the image data. Eint reflects the sensor model of
the expected appearance of the data corresponding to a given scene (appearance model);
for example, for image segmentation problems, this term can be the negative log-likelihood
of the image intensities. The shape term Eshape reflects the prior information about shape.
The parameter α is a regularization coefficient that weighs the strength of the prior. For
certain problems, the data term is absent in the formulation. An example of such a situation
is the curve morphing task, in which a curve must be evolved to match the prior curve(s).
Although the prior curve(s) can be interpreted as data, in order to avoid confusion, we will
call the energy term that depends on the prior curves a shape term, Eshape.
Given the energy functional E(Γ), we find the minimizing curve flow −∇E(s). It can
be interpreted as a force acting on the curve in the normal direction, F(s) = −∇E(s). The
curve is evolved according to the following differential equation:
dΓ/dt = −∇E(s) ~N(s)    (2.3)
where ~N(s) is the normal direction to the curve Γ at s. In a discrete implementation,
Γt+1(s) = Γt(s) − k∇E(s) ~N(s)    (2.4)
where t indicates the evolution time of the minimization and k is a speed coefficient chosen
small enough to guarantee numerical stability.
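As a concrete illustration of the discrete update in eq. 2.4, the sketch below (our own, not the dissertation's implementation; the function name and step sizes are our choices) evolves a uniformly sampled circle under the flow obtained when E is the curve length, whose gradient flow is the curvature flow. The centered second difference of the samples approximates the curvature vector κ ~N up to a positive, parameterization-dependent scale:

```python
import numpy as np

def evolve_curve(pts, steps=200, k=1e-3):
    """One possible discretization of eq. 2.4 for the curve-length energy."""
    for _ in range(steps):
        prev = np.roll(pts, 1, axis=0)
        nxt = np.roll(pts, -1, axis=0)
        # Centered second difference ~ curvature vector (times a positive
        # scale); stepping along it is gradient descent on curve length.
        pts = pts + k * (prev + nxt - 2.0 * pts)
    return pts

theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # unit circle
shrunk = evolve_curve(circle)
# Under curvature flow a circle stays circular and shrinks.
print(np.linalg.norm(shrunk, axis=1).mean())
```

The small step size k plays the role of the speed coefficient in eq. 2.4, kept small for numerical stability.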
The gradient curve flow minimizing eq. 2.2 can be found using variational approaches
(Charpiat et al., 2003) or shape gradient approaches (Jehan-Besson, 2003). In this work
we utilize a variational approach. The possibility of applying shape gradient tools is a
topic for future research.
The following steps are involved in the variational approach to computing the gradient
curve flow (Charpiat et al., 2003) which minimizes the energy E(Γ):
1. Compute the Gateaux semi-derivative of the energy functional with respect to per-
turbation β. Let s be the arc-length of the curve Γ, then β(s) defines the normal
displacement of the curve at s. The Gateaux semi-derivative is defined as a directional
derivative
G(E, β) = lim_{ε→0} [E(Γ + εβ) − E(Γ)] / ε    (2.5)
The space of perturbations β(s) defines a linear and continuous subspace of the Hilbert
space L2(Γ). The Hilbert inner product is given by <β1, β2> = ∫_Γ β1(s) β2(s) ds.
2. If the Gateaux semi-derivative exists and is a continuous linear functional of β, we
can apply the Riesz representation theorem (Rudin, 1966), and the Gateaux semi-
derivative can be represented as
G(E, β) = <∇E, β>    (2.6)
where ∇E is the gradient (flow) minimizing the functional E. Thus, one needs to cast
the Gateaux semi-derivative into the form of eq. 2.6.
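As a classical worked example of these two steps (standard in the curve evolution literature, not specific to this dissertation; signs depend on the orientation convention chosen for ~N), take the curve length as the energy:

```latex
E(\Gamma) = \oint_\Gamma ds, \qquad
G(E,\beta)
  = \lim_{\varepsilon\to 0}\frac{E(\Gamma+\varepsilon\beta)-E(\Gamma)}{\varepsilon}
  = -\oint_\Gamma \kappa(s)\,\beta(s)\,ds
  = \langle -\kappa,\, \beta \rangle ,
```

so ∇E = −κ, and the resulting descent flow dΓ/dt = κ ~N is the well-known curve-shortening (curvature) flow, i.e. exactly the generic smoothing prior discussed in Chapter 1.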
2.2.2 Probabilistic formulation for curve evolution
Sometimes, the problem of finding an optimal object boundary in an image can be cast in
the probabilistic form. Given the image I, we want to find a maximum a posteriori (MAP)
contour Γ∗ given by
Γ∗ = argmax_Γ p(Γ|I)    (2.7)
Using Bayes’ rule,
p(Γ|I) ∝ p(I|Γ) p(Γ)    (2.8)
where p(I|Γ) is the image likelihood and p(Γ) is a shape prior. The image likelihood is
a pdf of an image given the object contour, which is constructed using a model of image
appearance. The shape prior is the pdf on the space of possible shapes.
In fact, the energy-based and probabilistic formulations are closely related. Taking the
negative log of eq. 2.8 we obtain (up to an additive constant)
−log p(Γ|I) = −log p(I|Γ) − log p(Γ)    (2.9)
The maximization in eq. 2.7 is equivalent to minimization of −log p(Γ|I), and is thus equiv-
alent to an energy minimization problem with E(Γ) = −log p(I|Γ) − log p(Γ). In this
formulation the image likelihood can be interpreted as p(I|Γ) ∝ e^{−Eint(Γ)} and the prior
shape pdf can be interpreted as p(Γ) ∝ e^{−Eshape(Γ)}.
The advantage of the probabilistic approach is that, given proper normalization of the
probability distributions, the regularization coefficient α in the energy formulation is de-
termined. The difficulty of this approach is that the proper normalization is often hard to
find (for instance, one needs to integrate over the space of all curves). Hence, in this work
we use the energy formulation.
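The equivalence between the MAP formulation and energy minimization can be checked numerically on a toy, finite "shape space" (a sketch entirely of our own making, with hypothetical unnormalized likelihood and prior values):

```python
import numpy as np

# Toy check of eqs. 2.7-2.9: on a finite set of candidate curves, the MAP
# solution coincides with the minimizer of the negative-log energy.
rng = np.random.default_rng(0)
likelihood = rng.random(50) + 1e-3   # p(I | Gamma_i), up to a constant factor
prior = rng.random(50) + 1e-3        # p(Gamma_i), up to a constant factor

posterior = likelihood * prior                 # p(Gamma_i | I), unnormalized
energy = -np.log(likelihood) - np.log(prior)   # E = E_int + E_shape, alpha = 1

print(int(np.argmax(posterior)) == int(np.argmin(energy)))
```

The missing normalization constants shift the energy but do not move its minimizer, which is why the unnormalized formulation suffices for boundary extraction.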
2.3 Shape parameterizations
Given the boundary extraction problem formulation, such as energy minimization based
or probabilistic, the next question is how to parameterize and deform the object. In this
section we review different options to parameterize the shape and in particular, level set
approach. Parametric non-curve object representations define the shape through a set of
parameters which are not boundary points coordinates. Such approaches are not directly
combined with the curve evolution framework. Explicit and implicit curve parameteriza-
tions are naturally used in the curve evolution framework.
2.3.1 Parametric non-curve object representations
Starting with a description of the lowest dimensionality, let us briefly review shape param-
eterizations as geometrical primitives. Such approaches are incompatible with the curve
evolution framework but can be used in energy based and probabilistic boundary extrac-
tion formulations. One of the simplest approaches to parameterize an object is to assume
an elliptical shape (Poonawala, 2003). Only six parameters are sufficient in this case. Of
course, such a parameterization is only useful for a narrow range of problems, such as to
describe biological cells. Rectangular shapes were used to describe man made objects, see
(Minguez and Montano, 2005). Combinations of primitive shapes can be used to describe
more complex objects (Zhu and Yuille, 1996). Another significant strategy is to parame-
terize the object using medial primitives, as in the MREP approach of (Pizer et al., 1996;
Pizer et al., 2003). In the MREP approach, the object is described by a graph of medial
atoms and implied boundary points. Closely related skeleton or medial axis approaches
model the object by a medial axis (skeleton) and implied boundary (Dimitrov et al., 2000;
Galand et al., 1999; Shah, 2005; Tari and Shah, 2000). The unpruned skeleton is a result
of the Blum transform of the object boundary (Blum, 1967). The MREP representation
can be considered as a coarsely sampled version of the skeleton representation. Skeleton
based approaches are able to capture information about shape in a meaningful and visually
intuitive way, which warrants their use in some applications. In this dissertation we use a
skeletal shape representation for the analysis of brain morphology in Chapter 8. The shape
descriptors presented above are ground level descriptors in the sense that they are not de-
rived from or defined on the object boundaries. Rather, these descriptions usually
imply the boundary.
2.3.2 Explicit curve-based parameterization
We now consider curve based descriptors. A variety of explicit curve representations
have been proposed. Curves can be represented by landmarks (Bookstein, 1991), sam-
pled boundary points (Cootes et al., 1993), basis function coefficients (Kunttu et al., 2004;
Staib and Duncan, 1992) or splines (Sukmarg and Rao, 2000) among others. The main
advantage of the explicit parameterization is low computational complexity, while the dis-
advantages are the need for periodic re-parameterization (re-sampling in case of uniform
sampling of the boundary) and the need to cope with topology changes during evolution
(self intersection, split and merge, etc), see (McInerney and Terzopoulos, 1995). Direct
curve parameterization by arc-length is the basis for the curve evolution framework used
here. In this dissertation we use arc-length parameterization and its numerical implemen-
tation through uniform curve sampling to compute functions defined along the curve.
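A minimal sketch of such uniform arc-length resampling (our own illustration; the function name and the test curve are hypothetical, not from this dissertation):

```python
import numpy as np

def resample_uniform(pts, n):
    """Resample a closed polygonal curve at n points equally spaced in arc length."""
    closed = np.vstack([pts, pts[:1]])                      # close the contour
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])             # cumulative arc length
    targets = np.linspace(0.0, s[-1], n, endpoint=False)    # uniform arc lengths
    out = np.empty((n, 2))
    for j, d in enumerate(targets):
        i = np.searchsorted(s, d, side="right") - 1
        a = (d - s[i]) / seg[i]                             # position within segment
        out[j] = (1.0 - a) * closed[i] + a * closed[i + 1]
    return out

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
pts = resample_uniform(square, 8)
gaps = np.linalg.norm(np.diff(np.vstack([pts, pts[:1]]), axis=0), axis=1)
print(np.allclose(gaps, gaps[0]))   # equal spacing along the boundary
```

Resampling of this kind is what keeps functions defined along the curve well conditioned as the curve deforms.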
Higher level descriptors can be defined on the shape boundary representations. One
family of such descriptors, which we name distribution-based descriptors, plays a crucial role in
this dissertation. The descriptors of this kind are reviewed in detail in Section 2.5.
2.3.3 Implicit curve parameterization by level sets
The implicit curve parameterization known as the level set framework (Sethian, 1999) has
gained significant popularity due to its many advantages. Level-set representations allow
easy incorporation of curve forces defined directly on level set function, implicit handling of
topology, straightforward implementation, and easy extension to higher dimensions. Under
the level set framework, the curve (2D) or surface (3D) is defined as a zero level set of the
function Φ(x) (see figure 2·1 for a 2D curve illustration). Although different possibilities
exist, this function is typically taken to be the signed distance function of the curve (or
surface), defined to be negative inside and positive outside of the object:
Φ(x) = {  d(x, Γ)   if x is outside Γ
       { −d(x, Γ)   if x is inside Γ     (2.10)
where d(x, Γ) is Euclidean distance between the point x and the curve Γ.
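For a circle of radius r centered at the origin, the signed distance function of eq. 2.10 has the closed form Φ(x) = ‖x‖ − r. The short sketch below (our own illustration; the function name and grid are hypothetical) evaluates it on a grid and checks the sign convention, negative inside and positive outside:

```python
import numpy as np

def circle_sdf(r, n=101, extent=2.0):
    """Signed distance function of a circle of radius r, per eq. 2.10."""
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs)
    return np.sqrt(X**2 + Y**2) - r   # negative inside, positive outside

phi = circle_sdf(r=1.0)
print(phi[50, 50])   # value at the origin (inside): -1.0
print(phi[0, 0])     # value at a far corner (outside): positive
```

For general curves the distance d(x, Γ) has no closed form and Φ is computed numerically, e.g. by fast marching.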
Evolution of the curve is performed by evolving the embedding level-set function using
the update flow Φt(x). In the discrete implementation, the level-set function is iteratively
updated according to
Φ′(x) = Φ(x) + δ Φt(x)    (2.11)
where Φ′(x) is the updated level-set function and δ is the update step taken to
be sufficiently small to guarantee numerical stability. There exist two ways to compute the
update flow Φt(x). First, certain definitions of curve flows can be expressed as equivalent
flows Φt(x) explicitly defined on the level-set function. This situation arises when the curve
properties (such as curvature) are computed on the level-set function itself. This is the
most natural way to evolve the level-set function, but because the level-set function loses
its distance-function property in the course of evolution, periodic re-initialization is needed
to ensure numerical stability, which is computationally costly.
Second, if the curve evolution flow Γt does not have an equivalent level-set function flow,
the evolution of the level-set function Φ is governed by the following differential equation:
Φt = −Γt|∇Φ| (2.12)
More accurately, eq. 2.12 only specifies the evolution of the level-set Φ = 0 corresponding
to the interface. Several options exist to define the evolution of the level-set function away
from the interface (Sethian, 1999):
• Force extension approach. Under this approach, the level set function update at any
point in the space is chosen to preserve the property of Φ being a signed distance
function of the contour defined by its zero level set. This is the most accurate but
also most computationally extensive approach. It’s most important advantage is that
there is no need for periodic level set function re-initialization.
• Interpolation based approach. Values of the level set function are computed by
interpolating the updates on the interface. Again, one needs to perform periodic
re-initialization of the level-set function.
• Modified differential equation approach. The level-set function can be evolved under
a modified differential equation that tends to preserve the distance function property
approximately. Under this approach the need for re-initialization is reduced but not
eliminated completely.
A combination of flows required for a particular application may need a hybrid approach
to evolve the level set function. Approximations can be designed that remove the need for
periodic re-initialization and PDE-based evolution at the cost of reducing the accuracy of
the evolution (Shi and Karl, 2005).
In the base framework, the level set function in the whole domain (image or volume) is evolved, which leads to significant computational overhead. Narrow band approaches reduce
the computational complexity of the level set evolution by constraining the level set update
to a narrow band around the interface. It is important to note that narrow band approaches
do not reduce (and can increase) the need for periodic re-initialization.
Re-initialization of the level set function consists of three steps: zero interface extrac-
tion, computation of the values of the level set function at grid points next to the interface,
and update of the level set function values away from the interface using a fast-marching
(Sethian, 1999) or fast sweeping approach. Typically, the re-initialization has to be performed
every few iterations depending on the magnitude of the update steps.
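The discrete update and re-initialization loop described above can be sketched as follows. This is a minimal NumPy/SciPy sketch under simplifying assumptions, not the scheme used in this dissertation; the speed function `speed` and all parameter values are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def reinitialize(phi):
    """Rebuild phi as an (approximate) signed distance function of its zero level set."""
    inside = phi < 0
    dist_out = distance_transform_edt(~inside)  # distance to the region, outside
    dist_in = distance_transform_edt(inside)    # distance to the background, inside
    return dist_out - dist_in                   # negative inside, positive outside

def evolve(phi, speed, n_iter=100, delta=0.1, reinit_every=10):
    """Iterate the update of eq. 2.11 with Phi_t = -F |grad Phi| (eq. 2.12)."""
    for it in range(n_iter):
        gy, gx = np.gradient(phi)
        grad_mag = np.sqrt(gx ** 2 + gy ** 2)
        phi = phi - delta * speed(phi) * grad_mag
        if (it + 1) % reinit_every == 0:        # periodic re-initialization
            phi = reinitialize(phi)
    return phi
```

With a positive speed the zero level set expands; re-initializing from the current region masks restores the distance-function property at the cost of the extra distance transforms discussed above.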
2.4 Shape modeling approaches
Using prior shape information in boundary extraction problems is the major direction of this dissertation. In the following sections we review existing approaches, discuss their advantages and limitations, and identify the deficiencies of current approaches. Shape models are used to construct the shape term Eshape in the boundary extraction energy formulation.
Figure 2·1: Level-set curve embedding. The curve Γ is given by the zero level set, Γ : Φ = 0; Φ is called the level-set function.
2.4.1 Constructing distance measures on shapes
Let us first consider the particular situation in which the solution is expected to be close (in terms of its shape) to a single known instance of a shape. In such a case one can define a penalty for deviating from this given shape and use this penalty in the energy formulation as Eshape. This penalty can be considered a measure of distance between the solution and the true shape, but it is not a model in a strict sense. On the other hand, a shape distance can also be used for shape clustering and shape classification. Not surprisingly, shape distances play an important role in object-based image processing. They allow evaluation of the goodness of a solution and the imposition of dynamic constraints. They can also be used to constrain the relative positions of multiple objects. We first review some of the widely used choices of shape distances, in particular those used in this dissertation.
The distance between two shapes can be constructed on parametric shape representa-
tions, such as the angle function representation (Klassen et al., 2004; Tagare, 1999). One
possibility is to compare parametric representations directly using a definition of the dis-
tance on the space of parametric representations. Shapes under comparison are assumed
pre-registered. In (Klassen et al., 2004), shape registration is intrinsic in the distance def-
inition. Elastic matching has been used in (Basri et al., 1995) with particular attention
paid to the similarity of shapes with deformable parts. A distance measure computed on
skeleton representations has been used in (Sundar et al., 2003). Other methods construct
shape distances on features extracted from the shapes; see, for instance, (Berretti et al., 2000; Belongie et al., 2002; Li and Simske, 2002), among many others.
In curve evolution methods, a shape distance is often defined on the contours themselves
or on the embedded distance functions. An example of a generic curve distance measure is
the Chamfer distance (Borgefors, 1984; Thayananthan et al., 2003) that can be defined as
d(Γ1, Γ2) = ∫_{x∈Γ1} min_{y∈Γ2} ||x − y|| ds    (2.13)
where the integration is carried out along Γ1 accumulating the Euclidean distance between
the current point on Γ1 and the curve Γ2. Another often used shape difference measure
based on the total area between shapes is the Hamming distance (Skiena, 1997)
d(Γ1, Γ2) = ∫_{A: sign(D(Γ1)) ≠ sign(D(Γ2))} dS    (2.14)
where D(Γ1) and D(Γ2) are signed distance transforms for shapes Γ1 and Γ2 respectively.
Another measure is the one-sided Hausdorff distance (Rote, 1991; Charpiat et al., 2003), defined as the largest distance from a point on Γ1 to the nearest point of Γ2:

d(Γ1, Γ2) = max_{x∈Γ1} min_{y∈Γ2} ||x − y||    (2.15)
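For curves discretized as point sets (and regions rasterized as binary masks), the three distances above can be sketched as follows; this is an illustrative NumPy sketch, not the implementation used later in the dissertation, and the Hausdorff computation is the one-sided max–min form.

```python
import numpy as np

def chamfer(c1, c2):
    """Eq. 2.13: accumulate, along curve 1, the distance to the closest point of curve 2."""
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    return d.min(axis=1).sum()       # the sum approximates the integral along c1

def hausdorff(c1, c2):
    """One-sided Hausdorff distance (eq. 2.15): the largest distance from a point
    of curve 1 to the nearest point of curve 2."""
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    return d.min(axis=1).max()

def hamming(mask1, mask2):
    """Eq. 2.14 on rasterized regions: area where the region indicators disagree,
    i.e. where the signs of the two signed distance transforms differ."""
    return int(np.logical_xor(mask1, mask2).sum())
```

Note that the Hamming distance only needs the region masks, since the sign of the signed distance transform is exactly the region indicator.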
2.4.2 Constructing shape variability model
If no prior shapes are observed, or if more than one prior shape is given, one needs a different approach to define or constrain the space of shape variation, which is expressed as the term
Eshape in the energy based curve evolution framework. Here we review existing approaches
by dividing them into the following categories.
1. Methods using a generic prior
In generic regularization methods, a prior of regularizing type is assumed, such as
“a curve must be short”. Practically, certain properties of the shape such as the
perimeter or the area are penalized in order to regularize the estimated boundary
curve. This group of methods amounts to generic regularization or geometric “low
pass” filtering to limit the effects of noise in the image. Such methods do not construct
a shape model in an explicit way (Mumford and Shah, 1985; Caselles et al., 1997;
Tsai et al., 2001b; Kim et al., 2002b; Siddiqi et al., 1997). Such “generic” penalties
are stationary along the curve, in that every point on the boundary experiences the
same effect. Such priors can remove object protrusions and smooth salient object
detail when the boundary location is not well supported by the observed data, since
they seek objects with short boundaries or small area. The important advantage of
methods using a generic prior is that they usually are easily implementable in a curve
evolution context.
2. Extensions of methods using a generic prior
Methods similar in nature but improving upon generic priors are numerous and develop in several directions. One group of methods constructs non-linear priors on curvature or other image/shape descriptors. This group includes anisotropic diffusion based approaches; examples can be found in (Tasdizen and Whitaker, 2004) and references therein. A common goal of these approaches is to preserve corners or edges of objects while enforcing smoothness elsewhere. In a curve evolution context,
geometric flows which drive an evolving curve toward a polygon were developed in
(Unal et al., 2002). These flows potentially could be used as a prior force in the
curve evolution framework. Unfortunately, the geometric flows in (Unal et al., 2002)
favor polygonal shapes with predefined (chosen by the operator) edge orientation di-
rections. Such a prior is highly dependent on extrinsic properties (such as object
orientation), and does not appear adaptable to other, non-polygonal, shape classes.
In (Shah, 2002), a functional extending the curve length penalty was formulated and applied to curve smoothing problems. In this functional, the curve length penalty
was applied only in the regions with a low likelihood of being a corner. The likelihood
of a corner, called the corner strength function, was constructed jointly while evolving
the curve using modifications of the Mumford-Shah functional methodology (Shah,
1996). This technique yields a piecewise smooth curve, potentially improving the
segmentation that is achieved by using a stationary smoothing prior, such as a curve
length penalty. However, this technique, like the generic curve length penalty, does
not include more detailed information derived from training shapes. Moreover, the auxiliary unknowns (the corner strength function field) make the solution non-unique and sensitive to the regularization parameters.
It is possible to improve the generic prior so that it includes more information about a
class of shapes but is still expressed as a local penalty, stationary with respect to the
shape boundary. One such alternative data-driven prior shape model was proposed in
(Leventon et al., 2000b) as a part of a level set based segmentation algorithm. The
overall distribution of curvature and intensity with respect to a segmenting curve
was found from training data. This spatially stationary model was then used in a
probabilistic MAP formulation to segment an image.
The shape prior in (Leventon et al., 2000b) is computed as follows. It is assumed that the values of the curvature of the level-set function Φ embedding the segmenting curve are independent samples from the prior distribution of level-set curvature computed on the training shapes, leading to the following formulation for the prior distribution of the level-set function Φ:
p(Φ) = ∏_{y∈R} Σ_{j=1}^{n} ∫_{x∈R} exp( −||K(Φ)(y) − K∗_j(x)||² / 2σ² ) dx    (2.16)
where K(Φ)(y) is the curvature of the level-set function at y, R is the image plane, K∗_j(x) is the curvature of the level-set function computed on prior image j at x, and n is the number of prior images. The corresponding energy formulation for Eshape(Γ) is given by
Eshape(Φ(Γ)) = −Σ_{y∈R} log Σ_{j=1}^{n} ∫_{x∈R} exp( −||K(Φ)(y) − K∗_j(x)||² / 2σ² ) dx    (2.17)
where Eshape(Γ) is defined explicitly on the level-set function Φ computed from the curve Γ, with K(Φ)(y), K∗_j(x), and n as defined above.
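A discrete evaluation of this energy can be sketched as follows. This is an illustrative NumPy sketch: the integral over x and the product over j are replaced by a single sum over all prior grid samples, the curvature of the level-set function is estimated by finite differences, and the parameter values are hypothetical.

```python
import numpy as np

def curvature(phi):
    """Curvature of the level-set function, div(grad(phi)/|grad(phi)|), by finite differences."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-12
    dyy, _ = np.gradient(gy / mag)   # d/dy of the y-component of the unit normal
    _, dxx = np.gradient(gx / mag)   # d/dx of the x-component of the unit normal
    return dxx + dyy

def shape_energy(phi, prior_phis, sigma=0.5):
    """Discrete version of eq. 2.17: each grid point y contributes -log of the summed
    Gaussian affinities between K(phi)(y) and all prior curvature samples K*_j(x)."""
    k = curvature(phi).ravel()
    # concatenating priors realizes the sum over j and the integral over x together
    kp = np.concatenate([curvature(p).ravel() for p in prior_phis])
    w = np.exp(-(k[:, None] - kp[None, :]) ** 2 / (2.0 * sigma ** 2))
    return float(-np.log(w.sum(axis=1) + 1e-300).sum())
```

The small constants guard against division by zero in flat regions and against taking the logarithm of an underflowed sum.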
Although giving better results than generic curve length penalty priors, this approach
still tends to suppress salient structures. The reason is that the stationary prior
coupled with the MAP criterion attempts to drive the curvature at every point on
the curve to the same value corresponding to the mode of the distribution. We use
this technique in Chapter 5 for comparison with our results.
A variation of Leventon’s model described previously was proposed in (Litvin and
Karl, 2003), where the PDE attempts to deform the curve in such a way that the cur-
vature histogram computed over the current level set function matches the curvature
histogram computed over the training level sets corresponding to the training shapes.
This global histogram matching approach showed interesting properties and yielded results superior to those given by the method in (Leventon et al., 2000b). Unfortunately, the approach in (Litvin and Karl, 2003) does not have a corresponding energy or probabilistic formulation. Both the approach in (Leventon et al., 2000b) and the approach in (Litvin and Karl, 2003) encode only the level-set curvature as a feature and are not adaptable to other shape features.
3. Deformable templates
Numerous approaches have been proposed to construct prior models based on allowable deformations of a template shape. One group of approaches is based on representing and modeling shape as a set of landmarks (see (Dryden and Mardia,
1998) and references therein). In the Point Distribution Model (PDM), proposed in
(Cootes et al., 1995), n labeled points on a boundary are selected to describe each
shape in the training set. The space of allowable shapes is then defined as a box in 2n-dimensional space determined by the spread of the points in this space, where each point corresponds to one training shape.
A number of approaches use principal component analysis based on parameterized
boundary coordinates or level set functions to obtain a set of shape expansion func-
tions (Rousson and Paragios, 2002; Tsai et al., 2001a; Leventon et al., 2000a; Wang
and Staib, 2000) that describe the subspace of allowable shapes. Sinusoidal basis
functions were used in (Staib and Duncan, 1996). A solution is then sought in this
restricted shape space. In another approach the subspace of allowable shapes is com-
posed of the set of deformations of a predefined shape template (Sclaroff and Liu,
2001). The restricted shape space is then used to constrain the solution, or to com-
pute the likelihood of a particular boundary configuration. Still, other approaches
construct more complex parametric shape representations, such as the MREP ap-
proach in (Pizer et al., 1996), or deformable atlas based approach in (Christensen,
1994). (Fenster and Kender, 2001) builds a prior probability density as a multidi-
mensional Gaussian on a space of parameters calculated from image features along
the boundary.
Unfortunately, these methods can be overly sensitive to the global appearance of
particular shapes in the training data. These methods are effective when the space
of possible curves is well covered by the modeled template deformations as obtained
through training data, but may not generalize well to shapes unseen in the training
set. Another difficulty is the need to pre-register shapes, which can be problematic in the case of large deformations within the training set of shapes.
Let us focus on a particular implementation of a PCA-based shape prior defined on level sets. First, the training shapes are aligned using the technique in (Rousson and Paragios, 2002). Second, PCA is carried out on the signed distance transforms corresponding to the registered training shapes. The PCA space of allowable shape deformations is constructed using the first five eigenshapes. In order to use the prior in a curve evolution framework, one imposes a penalty on the mismatch between the segmenting curve and its projection onto the PCA space. We penalize the area between the current shape and the identified PCA-space projection using the Hamming distance penalty. Therefore, the prior shape energy is given by
Eshape(Γ) = ∫_{A: sign(D(Γ)) ≠ sign(D(Γ+PCA))} dS    (2.18)
where Γ+PCA is the projection of Γ onto the PCA space. Finding the projection is formulated as the minimization of the Hamming distance between the given curve and a curve in the PCA space. This minimization is carried out using gradient descent with respect to the PCA projection coefficients.
Formally, the projection Γ+PCA is given by

Γ+PCA = argmin_{ΓPCA∈PCA} ∫_{A: sign(D(Γ)) ≠ sign(D(ΓPCA))} dS    (2.19)
In this dissertation we use this technique for comparison with our results.
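The construction can be sketched as follows. This is a simplified NumPy/SciPy sketch: shapes are assumed to be pre-registered binary masks, and the projection onto the PCA space is computed by least squares on the level-set values rather than by the gradient-descent Hamming-distance minimization of eq. 2.19.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """Signed distance transform: negative inside the region, positive outside."""
    return distance_transform_edt(~mask) - distance_transform_edt(mask)

def fit_pca(train_masks, n_modes=5):
    """PCA on the signed distance transforms of (pre-registered) training shapes."""
    X = np.stack([signed_distance(m).ravel() for m in train_masks])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]            # mean shape and the first eigenshapes

def project(mask, mean, modes):
    """Project a shape onto the PCA space (least squares in the level-set values,
    a simplification of the Hamming-distance minimization of eq. 2.19)."""
    d = signed_distance(mask).ravel() - mean
    coeffs = modes @ d                   # modes are orthonormal rows
    return (mean + coeffs @ modes).reshape(mask.shape) < 0   # projected region

def hamming_penalty(mask, mean, modes):
    """Eq. 2.18: area between the shape and its PCA-space projection."""
    return int(np.logical_xor(mask, project(mask, mean, modes)).sum())
```

For shapes well covered by the training set the penalty is small; shapes far from the learned subspace incur a large area mismatch, which is exactly the behavior exploited (and criticized) above.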
4. Articulated models
Another approach to the inclusion of prior shape information is based on explicit
modeling and extraction of component parts (Pizer et al., 1996; Zhu and Yuille,
1996). Such models are also known as articulated models. These models have been
shown to represent well visual similarity within certain classes of shapes, such as
human silhouette shapes or human palm shapes. Unfortunately, articulated models give only an ad hoc solution to certain types of problems. Different models have to be constructed for different classes of shapes, and comparison between different objects is problematic. Such models are also not adapted to curve evolution based approaches.
Recently, the symbolic signatures method has been proposed in (Ruiz-Correa et al.,
2006). The method defines shape parts, parameterizes them by numerical signatures, and encodes the relative positions of parts as 2D images. An SVD based classifier is trained on prior shape examples to perform detection and classification tasks. This methodology can be considered a part-based method. The skeleton of parts is not predetermined, and a measure of distance between pairs of shapes with different skeleton topologies can be constructed. However, the approach has two major problems. First, user intervention is needed in the process of model construction. Second, it is still unclear whether the model can be used in curve evolution.
5. PDF construction
Some methods attempt to construct “true” probability distributions on the space
of shapes. In one such technique, motivated by the theories of human perception,
Zhu (Zhu, 1999) developed the maximum entropy model of shape. The model is probabilistic and flexible, and thus a seemingly good candidate for a shape prior.
Zhu showed that samples drawn from the constructed probability distribution show
impressive perceptual similarity to the prior shapes while being highly variable in
their shape.
Let us review the model construction following (Zhu, 1999). The model is required to have maximum variability in unconstrained directions while capturing important data statistics for a set of “significant” shape features φ^(α)(s) measured along the shape boundary, where s is the arc length along the boundary and α is a feature index. The shape feature statistics are defined as follows:
µ^(α)(z) = ∫ δ(z − φ^(α)(s)) ds    (2.20)
where z is the feature value.
We denote the pdf of the curve Γ for the given class of shapes by p(Γ), where Γ stands for an instance of a boundary. Following (Zhu, 1999), to obtain the pdf of maximum entropy consistent with the observed statistics, p(Γ) should satisfy the following set of equations:
p∗(Γ) = argmax_{p(Γ)} ( −∫ p(Γ) log p(Γ) dΓ )

E_{p∗(Γ)}[µ^(α)(z)] = µ^(α)_obs(z)   ∀z, ∀α

∫ p∗(Γ) dΓ = 1    (2.21)
where E_{p(Γ)}[·] is the expectation with respect to the probability distribution function p(Γ), integration is carried out over the space of all curves, and µ^(α)_obs(z) are the averaged statistics for the training set of M shapes:
µ^(α)_obs(z) = (1/M) Σ_{i=1}^{M} µ^(α)_i(z)    (2.22)
where µ^(α)_i(z) is the statistic of feature α computed on the i-th prior shape.
By solving the system of equations 2.21 using Lagrange multipliers, the following
solution form is obtained:
p(Γ) = (1/Z) exp( −Σ_{α=1}^{k} ∫ λ_α(z) µ_α(Γ, z) dz )    (2.23)
where λ_α(z) is the Lagrange multiplier function for feature α, µ_α(Γ, z) is the statistic corresponding to the shape Γ, and Z is a normalization constant. In practice, the curve is discretized by sampling the boundary with a fixed number of nodes. The range of feature values is also discretized. In the discrete case, considering only one feature (α = 1), eq. 2.23 becomes
p(Γ) = (1/Z) exp( −⟨Λ, µ_1(Γ)⟩ )    (2.24)
where Λ is the set of Lagrange multipliers and µ_1(Γ) is the discretized statistic (histogram) for the curve Γ. In order to construct a pdf of the form of eq. 2.24, one must find the set of Lagrange multipliers Λ. Using the fact that the log-likelihood is concave with respect to Λ, it is possible to find Λ iteratively:
dΛ/dt = E_{p(Γ;Λ)}[µ(Γ)] − µ^(α)_obs    (2.25)
where E_{p(Γ;Λ)}[·] is the expectation with respect to the probability distribution function p(Γ; Λ). The limit point of the iteration scheme in equation 2.25 satisfies the observation constraint (the second equation in the set 2.21).
The key difficulty is the computation of the expected statistics from the current
distribution Ep(Γ;Λ)[µ(Γ)]. Analytical calculation of this quantity is very difficult. As
a result, numerical Markov chain Monte Carlo (MCMC) methods are used in order to obtain samples from p(Γ; Λ). The quantity E_{p(Γ;Λ)}[µ(Γ)] is then obtained as the
average of histograms over samples from the distribution.
(Zhu, 1999) uses the Metropolis-Hastings algorithm to simulate a random walk in
the space of possible configurations of a closed discrete contour. Curve nodes are
positioned on grid points. One move in the space consists of moving one node of a
curve to one of the 8 neighboring positions. The distance between nodes is constrained
within a certain bound. The resulting MCMC simulation is very slow: the number of moves necessary for obtaining a sample from the distribution is of the order of 10⁹, making this MCMC simulation too slow for use in practical applications.
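The random walk described above can be sketched as follows. This is a toy NumPy sketch: the feature is simply the edge length between consecutive nodes rather than Zhu's feature set, the energy is ⟨Λ, µ(Γ)⟩ as in eq. 2.24, and all parameter values are illustrative.

```python
import numpy as np

def segment_lengths(nodes):
    """Feature measured along the closed contour: edge lengths between consecutive nodes."""
    return np.linalg.norm(np.roll(nodes, -1, axis=0) - nodes, axis=1)

def energy(nodes, lam, bins):
    """E(Gamma) = <Lambda, mu(Gamma)> for the discretized statistic, as in eq. 2.24."""
    hist, _ = np.histogram(segment_lengths(nodes), bins=bins)
    return float(lam @ hist)

def mh_step(nodes, lam, bins, rng, max_len=4.0):
    """One Metropolis-Hastings move: shift one node to one of its 8 grid neighbours,
    rejecting proposals that violate the node-spacing bound."""
    i = rng.integers(len(nodes))
    shift = rng.integers(-1, 2, size=2)
    if not shift.any():
        return nodes
    prop = nodes.copy()
    prop[i] = prop[i] + shift
    if segment_lengths(prop).max() > max_len:   # keep node spacing bounded
        return nodes
    de = energy(prop, lam, bins) - energy(nodes, lam, bins)
    if de <= 0 or rng.random() < np.exp(-de):   # Metropolis acceptance rule
        return prop
    return nodes
```

Even in this toy form, each sample requires very many single-node moves; this is the cost that motivates the faster construction investigated in Chapter 3.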
In Chapter 3 of this dissertation we investigate the possibility of reducing the computational cost of model construction and of using this probability distribution in curve evolution based boundary extraction.
2.5 Shape distributions
We now review the shape distribution concept, one of the key ideas used in this dissertation. Distribution-based shape descriptors arise from the idea that a distribution computed over a parameter extracted from the shape effectively encodes the presence of
parts or visual features of the shape, while being invariant to the positions of these parts or features.

Figure 2·2: An example of constructing a shape distribution for a curve (left) based on the curvature κ(s) measured along the boundary (second graph). The third and fourth graphs show sketches of pdf(κ) and the cumulative distribution function H(κ) of the curvature samples, respectively. Note the invariance of H(κ) with respect to the choice of the initial point of the arc-length parameterization.

The resulting shape descriptor may be invariant to large shape deformations that preserve certain prominent features of the shape. Shape distributions are derived representations,
in the sense that these representations are defined on the underlying contour representa-
tions. Moreover, in the general case, shape distributions are many-to-one representations, since different contours can have the same corresponding shape distribution representation.
Let us review the concept of shape distributions in greater detail. In (Osada et al., 2002), a shape distribution is defined as a cumulative distribution function computed on random samples of a feature function defined on the shape. An illustrative example of the shape distribution idea is shown in Figure 2·2, using boundary curvature as the feature. Building a shape distribution is done in two steps:

1. Computing samples of the feature function (the boundary curvature κ(s) in this case) on the parameter space on which the feature function is defined. For boundary curvature, this space is the arc length of the curve.

2. Computing the cumulative distribution function (CDF) H(κ) of the feature function samples.
Distributions are computed on training data and can also be defined analytically. The shape distribution for a set of shapes is computed as the average of the cumulative distribution functions corresponding to the individual shapes in the group.
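The two steps can be sketched as follows for a closed polygonal curve, with curvature estimated from turning angles; this is an illustrative NumPy sketch, not the deterministic construction introduced in Chapter 4.

```python
import numpy as np

def curvature_samples(nodes):
    """Step 1: sample the feature function -- curvature along a closed polygon,
    estimated as the turning angle at each node divided by the local arc length."""
    prev, nxt = np.roll(nodes, 1, axis=0), np.roll(nodes, -1, axis=0)
    v1, v2 = nodes - prev, nxt - nodes
    cross = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]
    dot = (v1 * v2).sum(axis=1)
    ds = 0.5 * (np.linalg.norm(v1, axis=1) + np.linalg.norm(v2, axis=1))
    return np.arctan2(cross, dot) / ds

def shape_distribution(samples, grid):
    """Step 2: the empirical CDF H(kappa) of the feature samples, on a fixed grid."""
    return np.searchsorted(np.sort(samples), grid, side='right') / len(samples)

def group_distribution(shapes, grid):
    """Shape distribution for a set of shapes: the average of the per-shape CDFs."""
    return np.mean([shape_distribution(curvature_samples(s), grid) for s in shapes], axis=0)
```

Because the CDF is computed from the multiset of curvature values, it is independent of where along the boundary the arc-length parameterization starts, matching the invariance noted in the caption of Figure 2·2.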
The concept of shape distributions was successfully applied to shape classification tasks. For example, a method using shape distributions in handwritten digit recognition experiments yielded the best performance among published techniques (Osada et al., 2002).
Other results using shape distributions in classification experiments include (Thayanan-
than et al., 2003; Mezghani et al., 2004; Ip et al., 2002; Osada et al., 2001). Protein
similarity has been studied using shape distributions in (Canzar and Remy, 2005). These
results indicate that shape distributions are robust, invariant, and flexible shape representations with good discriminative properties. In this work we propose to use shape distributions to construct a shape prior for use in statistical inference tasks (see Chapter 4). Note that shape distributions computed from sampled feature functions are not deterministic shape descriptors; instead, we introduce deterministically computed shape distributions, presented in Chapter 4.
Building other types of distribution-based shape representations for recognition pur-
poses has been investigated in several papers. Distributions of radially computed parame-
terizations, known as shape context, were applied to shape classification tasks in (Belongie
et al., 2002; Zhang and Malik, 2003). A related idea, named force histogram descriptors, has been used in (Matsakis et al., 2004). (Grigorescu and Petkov, 2003) uses distance sets computed on contour points for shape recognition. To the best of our knowledge, these other types of distribution-based shape representations have not been applied as priors for shape inference.
2.6 Shape model prior work summary and motivation
Despite the large body of work on shapes, only a few types of shape modeling approaches are
implemented using curve evolution, namely generic priors and certain deformable templates
based approaches. Generic approaches do not use specific information on prior shapes,
while deformable template approaches construct the shape deformation space around the
“average” shape, thereby restricting global deformations and impeding generalizability to
unseen shapes. One would like to construct a model that supports larger deformations preserving shape similarity as perceived by humans. Ideally, the model should be constructed
on small training sets and should easily generalize to unseen samples. At the same time, we would like to incorporate the model in a curve evolution framework for solving boundary extraction problems. Handling registration is also a concern, especially in the case of large deformations. Most widely used shape modeling approaches pre-register (align) shapes prior to constructing the deformation model. The registration becomes increasingly difficult as the shapes become more different from each other. We would like to handle registration in a transparent and efficient way regardless of the magnitude of the shape deformations.
Some methods are motivated by similar goals. In (Klassen et al., 2004), an attempt
was made to construct a shape distance measure based on matching angle function shape
descriptors. Invariance properties were encoded into the model by implicit shape correspon-
dence estimation. A certain degree of generalizability was achieved by allowing additional
unpenalized degrees of freedom through boundary stretching. Although this method at-
tempts to improve generalizability, its effectiveness is limited due to being constrained to
a particular along-the-curve parameterization. Articulated models (above) may capture
large deformations of shape parts but are application specific and are not curve evolution
based.
Our belief is that a successful strategy can be implemented through constructing the
distribution of certain shape features defined from the shape. The distribution can accu-
mulate the information about the presence of certain feature values but will not contain the
information about the precise “location” of the feature. Distribution-based shape repre-
sentations have been constructed and have been successfully applied to shape classification
tasks, see Section 2.5. In Chapter 4 we will further explore this idea and construct the
shape prior based on distribution-based shape representations. The maximum entropy model in (Zhu, 1999) captures perceptual shape similarity under large deformations in a probabilistic framework. We will explore using this model for image segmentation in Chapter 3.
2.7 Appearance models
In a curve evolution context, an appearance model characterizes the prior knowledge on the
image with respect to the object of interest and provides the link between the image data
and the resulting object boundary. In the energy based curve evolution framework, the appearance prior is encoded in the intensity term Eint. Below we consider several types of appearance models.
Boundary based approaches assume that the true boundary is positioned along high
image gradients. A notable example of this strategy is developed in (Caselles et al., 1997).
The gradient field attraction force via orthogonal curves used in (Tagare, 1997) and the edge-flow forces in (Sumengen et al., 2002) are two examples among many. The drawbacks of pure
gradient based methods include narrow capture area, sensitivity to initial conditions, need
for balloon forces, etc. The major limitation in using these methods is the need for strong
edges to be present everywhere along the object boundary. In many applications, edges
can be diffuse, weak, or even nonexistent due to anatomical features, imaging technique,
etc.
In region-based approaches, certain assumptions are made concerning the image inten-
sities inside and outside of the evolving curve. A benchmark approach in (Yezzi et al., 1999)
assumes that at the true position, the boundary maximizes the difference between certain
statistics computed on the inside and outside of the segmenting curve. Another bench-
mark approach in (Chan and Vese, 2001), named active contour without edges, assumes
constant average intensities inside and outside of the region of interest. The data term
in the segmentation functional is defined as the image likelihood given by this piece-wise
constant image model assuming independent identically distributed (IID) Gaussian noise.
In this dissertation we use the model in (Chan and Vese, 2001) to test the effect of different
shape priors in segmentation problems. Another example of a region-based model is the
information theoretic approach in (Kim et al., 2002a), casting the segmentation problem as
maximization of the mutual information between region labels and image intensities. The
evolving curve tends to segment the scene into regions with homogeneous intensities. In
this dissertation we also use the method in (Kim et al., 2002a). Region intensity histogram
priors are constructed in (Herbulot et al., 2004; Puzicha et al., 1999). In (Leventon et al.,
2000b) the prior is directly constructed on training images.
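For a binary segmentation mask, the piecewise-constant data term of (Chan and Vese, 2001) under the IID Gaussian noise assumption reduces (up to constants) to the sum of squared deviations of the intensities from the two region means; a minimal sketch:

```python
import numpy as np

def chan_vese_data_term(image, mask):
    """Piecewise-constant data term: squared deviation of the intensities from
    the mean intensity inside and outside the segmenting curve."""
    inside, outside = image[mask], image[~mask]
    e_in = ((inside - inside.mean()) ** 2).sum() if inside.size else 0.0
    e_out = ((outside - outside.mean()) ** 2).sum() if outside.size else 0.0
    return float(e_in + e_out)
```

The energy is minimized when each region is as close to constant-intensity as possible, which is exactly the uniformity assumption questioned later in this section.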
A classical hybrid approach is the piece-wise smoothness model used in (Shah, 1996),
where the image is assumed smooth except on the evolving boundary. A boundary itself
is linked to the image gradient magnitude. Using the energy functional defined in (Shah,
1996), a boundary and image can be simultaneously reconstructed.
A common assumption for all region based statistical methods is the uniformity of a
certain statistic inside the region of interest. However, in many problems, this assumption
may be violated. In this dissertation we focus on problems that pose difficulties for current gradient and region based methods and develop a new appearance model strategy tailored to such problems.
Appearance modeling approaches of a different type, which we name template-based approaches, are the counterparts of the shape template-based models. The Active Appearance Model (AAM) proposed in (Cootes et al., 2001) is a classical example. In the AAM, a template image is linked to a deformable boundary template. The model tries to find a match between the warped template and the image being segmented, finding the desired object boundary as a deformed boundary template. The method uses a PCA boundary deformation model and emphasizes an exact match between the segmented boundary/image
pair and the boundary/image template. Clearly, this method is strong when there is a
sufficient degree of consistency between the training images/boundaries and the segmented
boundary/image and also when there is enough training data to describe possible template
deformations and possible image variations.
In this dissertation we develop an alternative appearance modeling approach. Our
approach is different from AAM-type approaches in the type of boundary/image variabilities accounted for. The Active Appearance Model enforces fidelity between the boundary
and the deformable template image model as a whole. Template warpings are the only
possible deformations. The positions of characteristic image/boundary features are linked
to the template. On the other hand, our method abstracts and generalizes boundary ap-
pearance properties, effectively encoding the appearance properties “content” for a given
boundary/image pair. In our method, characteristic image/boundary features have flexible
locations. Hence, our method has potential for greater robustness/flexibility with respect
to large variations of boundary appearance along the boundary itself and across training
boundaries/images. Our method can also be potentially more robust with respect to small
training sets.
2.7.1 Role of distribution in appearance modeling and histogram equalization
We pointed out that distributional descriptors may be beneficial for describing shape.
Not surprisingly, measuring and comparing intensity distributions has also been used in
the context of appearance model construction, for instance (Freedman et al., 2004; Kim
et al., 2002a; Herbulot et al., 2004; Puzicha et al., 1999). The classical and most widely
used method to model appearance is to specify/compute the histogram of gray level values
over the region (Herbulot et al., 2004) and use it as the prior histogram of intensities in
the segmented region. Constructing appearance models will be considered in Chapter 7 as
an extension of our distribution-based shape modeling approach.
We now review another important application of intensity distributions, namely, image
enhancement through global histogram modification using Partial Differential Equations
(PDE) (Sapiro and Caselles, 1997; Sapiro and Caselles, 1995). We will also use these tools
later in a different context. Besides being a research topic, image histogram equalization
and modification has become a standard part of off-the-shelf image editing software, such
as Adobe Photoshop. We will consider the histogram modification task following (Sapiro
and Caselles, 1997). Given the image I(x, y) : R × R → R, we would like to evolve its gray level
values in a unique and spatially uniform way so that the underlying gray level histogram
hI(λ) evolves to match a target histogram h∗(λ). Let the target Cumulative Distribution
Function (CDF) be defined as H∗(λ) = ∫_0^λ h∗(s) ds, and the current image CDF be defined
similarly, HI(λ) = ∫_0^λ hI(s) ds. It can be shown that the gray level intensity flow

∂I(x, y, t)/∂t = H∗[I(x, y, t)] − HI[I(x, y, t)]   (2.26)

always exists, is unique, and has the desired solution as a limit point

lim_{t→∞} hI(t) = h∗   (2.27)
The importance of this method lies in the possibility of evolving the function (image
I in this case) so that the distribution computed on it matches the target distribution.
The simplicity and nice properties of the flow are due to the lack of any constraints relating the image values at different locations.
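The limit point of this flow is the classical histogram-specification map (H∗)⁻¹ ∘ HI, which can be sketched directly in a few lines. In this sketch the function name, the beta-distributed test image, and the uniform target are our own illustrative choices:

```python
import numpy as np

def match_histogram(img, target_inv_cdf):
    """Map each pixel through (H*)^{-1} o H_I: the limit point of the
    histogram-modification flow, whose output histogram matches the target."""
    flat = img.ravel()
    # Empirical CDF H_I evaluated at every pixel: rank / number of pixels.
    ranks = np.argsort(np.argsort(flat)) + 1
    HI = ranks / flat.size
    return target_inv_cdf(HI).reshape(img.shape)

rng = np.random.default_rng(0)
img = rng.beta(2, 5, size=(64, 64))   # gray levels in [0, 1], skewed histogram
# Uniform target on [0, 1]: (H*)^{-1} is the identity, i.e. equalization.
eq = match_histogram(img, target_inv_cdf=lambda u: u)
```

Evolving eq. 2.26 with small time steps approaches the same state asymptotically; the one-shot map simply jumps to a configuration whose histogram matches the target.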
We will extend the general idea of matching two distributions using an evolution PDE
to a more general class of features that are computed with respect to a curve and/or image
intensities. Much of the difficulty in matching these distributions is related to the fact
that the different variables are subject to numerous constraints, so that their independent
evolution is not possible.
2.8 Distribution difference measures
We have reviewed two topics of interest that will be further explored throughout this
thesis, namely, distribution-based shape descriptors in Section 2.5 and distribution-based
intensity descriptors in Section 2.7.1. These descriptors have been previously used in
comparison schemes through the definition of the distribution difference measure. Hence,
such distribution difference measures play an important role in this work.
The natural questions are:
1. What form or parameterization of the distribution should we use (PDF, CDF, or a parametric form)?
2. What measure of difference between distributions should be used?
We first consider the choice of distribution parameterization. The highest level choice
is between parametric and non-parametric methods in describing distributions. We opt for
non-parametric methods for two reasons. First, we desire a rich descriptor of distributions.
In fact, we would like a descriptor with theoretically unlimited length in order to be able
to encode shapes, which are infinite dimensional objects. Second, we aim at a universal
methodology, applicable to any feature computed on any shape. Therefore, no predefined
parametric distribution shape can be a universally good model.
The second choice is between particular distribution descriptors and associated distri-
bution difference measures. In many applications, in particular in information theory and
statistics, the distances are defined on PDFs. Suppose that we have two feature PDFs p1 and p2, each with a narrow peak, and that the two peaks are close to one another but do not overlap (see Figure 2·3, panel (A), top). Obviously, a direct measure of distribution difference constructed on these PDFs will not decrease even when the peaks in p1 and p2 are very close. “Direct” here means comparing values of two distributions at the
same parameter value. In the context of discretized histograms, these measures are known
as bin-to-bin measures (Antani et al., 2002). For instance, consider a Minkowski-form
distance (L1 norm), given by
dM(p1, p2) = ∫ |p1 − p2| dλ   (2.28)
We illustrate the resulting measure dM (p1, p2) as the shaded area in Figure 2·3, panel (A),
bottom. The measure in eq. 2.28 is small only when the peaks in p1 and p2 match exactly. A small difference in peak position sharply increases the measure up to its maximum value of 2 (for normalized distributions). A further increase of the distance between the peaks of p1 and p2 (see Figure 2·3, panel (B)) does not increase the distance measure. Other commonly used information-theoretic measures, such as the Kullback-Leibler (K-L) divergence (Kullback, 1968) and the Jeffrey divergence (Cover and Thomas, 1991), have qualitatively similar behavior
with respect to cases A and B in Figure 2·3. We argue that in case (A), p1 and p2 are close
to each other because each of them describes the dominant value for the feature and these
dominant values are close to one another. Clearly, a bin-to-bin measure on PDFs does
not capture our intention to get a small measure when dominant object feature values are
close. In fact, similar considerations have been made in designing the histogram difference
measures for similarity based image retrieval (see (Rubner et al., 1999; Ling and Okada,
2006) and references therein).
Figure 2·3: L1 difference measure computed on PDFs and CDFs. The difference value is given by the shaded area. Panels (A) and (C): modes of p1 and p2 are very close; the L1 difference on CDFs produces a small value. Panels (B) and (D): modes of p1 and p2 are further from each other. The L1 difference on CDFs gives a larger value, while the L1 difference on PDFs does not change.
Our illustrative example shows that we need to compare the distributions across dif-
ferent values of the parameter in order to quantify the similarity of the distributions. In a
discrete histogram comparison context, such measures are known as cross-bin dissimilarity
measures (Chupeau and Forest, 2001). To this end, several options have been studied. A
quadratic form distance between distributions has been used for image retrieval in (Niblack
et al., 1993). This distance does not establish the correspondence between masses in distri-
butions and is not theoretically justified. In image retrieval experiments it has been shown to overestimate the mutual similarity of flat distributions. The so-called Earth Mover's Distance (EMD) of (Rubner et al., 1999) is defined as the minimum work needed to transform
PDF p1 into p2, where a unit of work is given by moving a unit of mass over a unit distance.
This distance is seemingly a good candidate for a distribution difference measure. However,
the difference measure can only be found as a solution to an optimization problem. We
prefer an analytical, and simple, expression for the distribution difference measure.
A measure of the difference between distributions defined as the length of the geodesic path on a manifold of distributions has been proposed in (Mio et al., 2005). This measure is also seemingly a logical choice. However, as in the case of the EMD measure, the value of the distribution difference is the solution to a numerical optimization problem.
To this end, consider a measure defined on the CDFs P1 and P2 corresponding to p1 and p2 respectively. Consider again the example of Figure 2·3 (A), where p1 and p2 were considered “close”. In this case, while the pointwise measure on the PDFs is large, the measure on the CDFs is small (see Figure 2·3 (C,D)). Moreover, as the difference between the dominant feature values increases, the measure defined on the difference in the CDFs will increase monotonically and smoothly, which is the behavior we desire. We therefore conclude that a reasonable “direct” measure to quantify the difference between feature distributions can be constructed on the corresponding CDFs. In fact, the “match” measure of the difference between CDFs
has been proposed in (Shen and Wong, 1983) as the L1 norm of the CDF difference
dM(P1, P2) = ∫ |P1 − P2| dλ   (2.29)
The measure in eq. 2.29 represents a particular case of the EMD distance in (Rubner et al., 1999). The measure in eq. 2.29 is not differentiable, which creates difficulty for a variational approach. However, differentiable approximations can be used.
Later in this thesis we use the L2 measure of CDF difference
dM(P1, P2) = ∫ (P1 − P2)² dλ   (2.30)
which has qualitatively similar properties to the measure in eq. 2.29 while suppressing large differences (outliers). The difference measure in eq. 2.30 is differentiable and leads to a straightforward derivation of curve evolution equations. Moreover, the difference measure
in eq. 2.30 has been used for shape distribution matching with great success (Ip et al., 2002;
Osada et al., 2002; Osada et al., 2001; Mezghani et al., 2004), outperforming alternative
shape descriptors.
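The contrast between the bin-to-bin measure of eq. 2.28 and the CDF measure of eq. 2.30 is easy to verify numerically. In the following sketch the peak positions and widths are arbitrary choices for illustration:

```python
import numpy as np

# Bin-to-bin L1 measure on PDFs (eq. 2.28) vs cross-bin L2 measure on CDFs
# (eq. 2.30), evaluated on narrow, non-overlapping peaks.
lam = np.linspace(0.0, 1.0, 1000)
dlam = lam[1] - lam[0]

def peak(c, width=0.01):
    p = np.exp(-0.5 * ((lam - c) / width) ** 2)
    return p / (p.sum() * dlam)            # normalized PDF on [0, 1]

def pdf_l1(p1, p2):
    return np.sum(np.abs(p1 - p2)) * dlam  # saturates at 2 once peaks separate

def cdf_l2(p1, p2):
    P1, P2 = np.cumsum(p1) * dlam, np.cumsum(p2) * dlam
    return np.sum((P1 - P2) ** 2) * dlam   # keeps growing with mode distance

p0, near, far = peak(0.30), peak(0.35), peak(0.70)
```

Here `pdf_l1(p0, near)` and `pdf_l1(p0, far)` both saturate near the maximum value of 2, while `cdf_l2` is substantially larger for the distant pair than for the close one.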
2.9 Classification
So far, we reviewed the background related to the first and second major directions of this
work: prior shape and appearance models construction and shape extraction methodology
utilizing the prior information. We now focus on the third major direction: inference on
shapes, given that the shapes have been extracted from images. We focus on morphological
analysis of shape differences based on corpus callosum brain shapes, motivated by the
Human Brain Project. In particular, we are interested in two tasks:
1. Localization and quantification of shape differences between groups of subjects.
2. Automatic discrimination between groups of subjects based on shapes.
Since the notion of morphological differences implies the need for localization, we select
a parametric shape descriptor methodology using the medial axis representation, natural for
the elongated shape of the corpus callosum. Sampled medial representations yield feature sets
describing shapes. We review the techniques utilized to extract medial axis representa-
tions and feature sets in Chapter 8. The task of localization and quantification of shape
differences is carried out by statistical analysis of the medial axis representation.
We are also interested in constructing optimal classifiers on these sets to tackle the
discrimination task (task 2 above). We are interested in the two-class discrimination problem, where the classifier is built from training data with known class memberships. The classifier is tested on testing data not used to train the classifier. Classifier performance is given by the estimated probability of correct classification.
We now review some elements of classification theory used in this dissertation. We first
consider the MMSE (Minimum Mean-Square Error) linear discriminant function classifier.
Suppose observations come from two categories w1 and w2 with n1 and n2 observations in
each category respectively. Each observation is given by a vector xi of length N . We first
compose the augmented feature matrix Y as follows
Y = [ 1n1  Xn1 ; −1n2  −Xn2 ]   (2.31)

where 1n is a column of n ones and Xni is the matrix composed of observations in the
category wi stacked as rows. A vector of margins b is defined as

b = 1n1+n2   (2.32)
The margin value for a particular observation is the penalty for misclassifying that observation.
Since in our work all observations have equal importance, all margins are set to be equal.
We seek the decision boundary (linear MMSE classifier) given by a vector a of length
N + 1. The vector a specifies the hyper-plane that separates the N -dimensional space of
observations so that prior observations from each class stay on the same side of the hyper-
plane. The distance between the observation and the hyper-plane should be equal to the
margin value for the corresponding observation in the least square sense. The criterion
function to be minimized is
E = ||Ya − b|| (2.33)
The solution can be found using the pseudo-inverse

a = (Y^t Y)^{−1} Y^t b   (2.34)
A new observation is decided to belong to class w1 or w2 according to the half-space it falls
into. Formally, a discriminant function is given by
g(x) = a0 + Σ_{i=1..N} ai xi   (2.35)
The observation is decided to belong to class w1 if g(x) > 0 and to class w2 otherwise.
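The construction in eqs. 2.31–2.35 amounts to a few lines of linear algebra. In this sketch the two well-separated Gaussian point clouds are a toy illustration, not data from this work:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, N = 40, 40, 2
X1 = rng.normal([0.0, 0.0], 0.5, size=(n1, N))   # class w1
X2 = rng.normal([2.0, 2.0], 0.5, size=(n2, N))   # class w2

# Augmented feature matrix Y (eq. 2.31) and unit margins b (eq. 2.32).
Y = np.vstack([np.hstack([np.ones((n1, 1)), X1]),
               -np.hstack([np.ones((n2, 1)), X2])])
b = np.ones(n1 + n2)

# Pseudo-inverse solution a = (Y^t Y)^{-1} Y^t b (eq. 2.34).
a = np.linalg.solve(Y.T @ Y, Y.T @ b)

def g(x):
    """Discriminant function of eq. 2.35: decide w1 when g(x) > 0."""
    return a[0] + x @ a[1:]
```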
Let us now introduce another decision boundary criterion function. Suppose we want
to find the decision boundary that minimizes the number of misclassified data samples.
Such misclassification criterion function can be written as
E = − Σ_{i=1}^{n1+n2} sign(bi Yi a)   (2.36)
The criterion function in eq. 2.36 cannot be minimized using gradient descent approaches; therefore, in practice, differentiable approximations are used, such as the Perceptron criterion function. We use a direct differentiable approximation to the misclassification criterion that approaches the criterion function in eq. 2.36 asymptotically. Let the error function be
defined as
E = − Σ_{i=1}^{n1+n2} atan( bi Yi a / γ )   (2.37)
where γ is a small number (we use γ = 0.01). In the limit γ → 0, the criterion in eq. 2.37
is equivalent to the one in eq. 2.36. A decision boundary minimizing eq. 2.37 can be found
by gradient descent.
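A plain gradient descent on eq. 2.37 can be sketched as follows. The one-dimensional toy data, and the larger γ used here to keep unscaled gradient steps stable, are our own illustrative choices:

```python
import numpy as np

def atan_descent(Y, b, gamma=0.5, lr=0.01, steps=2000, seed=0):
    """Minimize E = -sum_i atan(b_i (Y_i a) / gamma) (eq. 2.37) by gradient
    descent; as gamma -> 0 the criterion approaches the misclassification count."""
    rng = np.random.default_rng(seed)
    a = 0.01 * rng.normal(size=Y.shape[1])
    for _ in range(steps):
        m = b * (Y @ a)                        # per-sample margins
        w = b / (gamma * (1.0 + (m / gamma) ** 2))
        a += lr * (Y.T @ w)                    # descent step: dE/da = -Y^t w
    return a

rng = np.random.default_rng(1)
X1 = rng.normal(-1.0, 0.3, size=(30, 1))       # class w1
X2 = rng.normal(+1.0, 0.3, size=(30, 1))       # class w2
Y = np.vstack([np.hstack([np.ones((30, 1)), X1]),
               -np.hstack([np.ones((30, 1)), X2])])
b = np.ones(60)
a = atan_descent(Y, b)
```

After the descent, a correct classification corresponds to a positive margin bi Yi a.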
As the second classifier technique, we use the learning-based technique AdaBoost (Freund and Schapire, 1999), which constructs the discriminant function as a non-linear combination of an iteratively constructed sequence of “weak” classifiers. Each subsequent “weak” classifier minimizes a weighted misclassification error criterion, where the weight given to a particular sample depends on whether the sample was previously misclassified. After a sufficient number of steps, the overall decision boundary is guaranteed to classify correctly all training samples. The only requirement for the “weak” classifier is that its weighted error rate be lower than 0.5.
For the weak classifier we use the weighted modification of the differentiable misclassi-
fication criterion in eq. 2.37. We define the criterion function
E = − Σ_{i=1}^{n1+n2} Di atan( bi Yi a / γ )   (2.38)
where Di are the weights applied to individual data points in the AdaBoost algorithm and the summation is carried out over all data points. Minimizing the criterion function in eq. 2.38 yields a weak classifier with weighted training error lower than 0.5, and hence it is suitable for use in the AdaBoost algorithm. AdaBoost can construct a complex decision boundary, which is best suited to separable cases with a large amount of training data. Since we have limited and non-separable training data, we use only a few AdaBoost steps to prevent over-fitting.
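The boosting loop itself is short. For a self-contained sketch we use axis-aligned threshold stumps as the weak classifiers, in place of the weighted linear criterion of eq. 2.38; the toy labeled data are our own illustration:

```python
import numpy as np

def fit_stump(X, y, D):
    """Best single-feature threshold classifier under sample weights D."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.sign(X[:, j] - thr + 1e-12)
                err = D[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, s)
    return best

def ada_boost(X, y, rounds=5):
    D = np.ones(len(y)) / len(y)               # uniform initial weights
    model = []
    for _ in range(rounds):
        err, j, thr, s = fit_stump(X, y, D)
        err = max(err, 1e-10)                  # guard against zero error
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.sign(X[:, j] - thr + 1e-12)
        D *= np.exp(-alpha * y * pred)         # up-weight misclassified samples
        D /= D.sum()
        model.append((alpha, j, thr, s))
    return model

def predict(model, X):
    F = sum(a * s * np.sign(X[:, j] - thr + 1e-12) for a, j, thr, s in model)
    return np.sign(F)

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
model = ada_boost(X, y, rounds=3)
```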
Chapter 3
Maximum entropy shape model as a curve
evolution prior
In the previous chapter of this dissertation we reviewed approaches for introducing prior shape knowledge into segmentation problems. In this chapter, we consider the
maximum entropy model proposed in (Zhu, 1999). In shape sampling experiments, this
model has been shown to yield samples from the distribution that appear to share visual
characteristics of shapes used to construct the model. We propose an approximation that
significantly accelerates model construction. Next, we propose to use this model in a curve evolution context using numerical curve flow computation and apply our method to segment
images with very low SNR. The work reported in this chapter previously appeared in (Litvin
and Karl, 2002).
3.1 Introduction
An attractive approach to boundary extraction problems is to adopt an explicitly Bayesian
framework, and to express the prior boundary shape information in the form of a probability
distribution function (pdf) on the space of curves or deformations. Such Bayesian perspec-
tives based on explicit shape priors and pdfs lead naturally to “optimal” segmentation and
classification approaches. However, most such approaches do not consider the distribution
measure defined on the space of shapes, but rather define the probability distributions
on parameterized deformations (see deformable template approaches in Chapter 2). Direct comparison of parameterizations and deformation space modeling makes most of these approaches valid only for relatively small deformations with respect to an “average” shape.
(Zhu, 1999) has proposed “natural” and data-driven maximum entropy models of object
shape (see PDF construction approach in Chapter 2). While computationally expensive and
burdensome to find, these models have demonstrated the ability to capture intrinsic and
complicated characteristics of object structure. The model in (Zhu, 1999) was motivated by
the desire to capture perceptual shape similarity. A model having such properties would
seem to possess good characteristics for such practical problems as segmentation. Here
we investigate the possibility of using such a shape model for image segmentation. To our
knowledge, this is the first attempt at using such models to solve practical problems. In the
next section we briefly review the concept of constructing pdfs of shape using the maximum
entropy principle (Zhu, 1999), and present our technique to reduce the computational cost.
In the third section we apply the approach to the problem of segmentation and show
preliminary results.
The basic idea of the method in (Zhu, 1999) is to construct an approximation to the
true pdf based on the space of shapes such that this pdf will capture some features or
statistics from a training set but will have maximum variability in unconstrained directions. By increasing the number of representative features used, we can make our approximated pdf closer to the true pdf for a given class of shapes.
3.2 Constructing a pdf on the space of shapes
Here we summarize the maximum-entropy-based shape model we use, derived from (Zhu,
1999) (see Chapter 2).
Zhu defines a shape model obtained from a set of training data. The model is required
to have maximum variability in unconstrained directions while capturing some important
data statistics for a set of “significant” shape features φ(α)(s), which are measured along
the shape boundary, where s is the arc length along the boundary and α is a feature index.
The shape feature statistics are defined in the form of observed histograms:
µ^(α)(z) = ∫ δ(z − φ^(α)(s)) ds   (3.1)
where z is the feature value. The observed statistics µ^(α)_obs(z) that the model has to satisfy
are given by the average over the statistics computed on M training shapes

µ^(α)_obs(z) = (1/M) Σ_{i=1}^{M} µ^(α)_i(z)   (3.2)

where µ^(α)_i(z) is the statistic of feature α computed on the ith training shape.
Let us define the boundary orientation function θ(s) to be the tangential direction at
the boundary point s. The only feature we use in this work is curvature, defined as the
derivative of the boundary orientation function with respect to the arc length:
φ^(1)(s) = dθ(s)/ds   (3.3)
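For a discrete curve with uniformly spaced nodes, the statistic of eq. 3.1 for this curvature feature reduces to a histogram of per-node values of dθ/ds. A sketch, in which the bin range and bin count are arbitrary choices:

```python
import numpy as np

def curvature_histogram(nodes, bins=30, z_range=(-1.0, 1.0)):
    """Discretized statistic mu(z) (eq. 3.1) of the curvature feature
    (eq. 3.3) for a closed polygonal curve."""
    d = np.diff(np.vstack([nodes, nodes[:1]]), axis=0)        # linelets
    theta = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))           # orientation theta(s)
    # Close the orientation function; assumes a counter-clockwise simple
    # curve, whose total turning is 2*pi.
    dtheta = np.diff(np.append(theta, theta[0] + 2 * np.pi))
    kappa = dtheta / np.linalg.norm(d, axis=1)                # dtheta/ds per node
    hist, _ = np.histogram(kappa, bins=bins, range=z_range, density=True)
    return hist

# A circle of radius R has constant curvature 1/R, so all the histogram
# mass falls in a single bin.
t = np.linspace(0, 2 * np.pi, 80, endpoint=False)
circle = np.column_stack([5.0 * np.cos(t), 5.0 * np.sin(t)])
h = curvature_histogram(circle)
```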
We denote the pdf of the curve Γ for the given class of shapes by p(Γ), where Γ stands
for the instance of a boundary. In order to satisfy the maximum entropy condition and
have the correct observed statistics, the pdf takes the form:
p(Γ) = (1/Z) exp( − Σ_{α=1}^{k} ∫ λα(z) µα(Γ, z) dz )   (3.4)
where λα(z) is the Lagrange multiplier function, µα(Γ, z) is the statistic corresponding to
the shape Γ, and Z is the normalization constant. Discretizing the range of feature values
and considering only one feature, eq. 3.4 becomes
p(Γ) = (1/Z) exp(− < Λ, µ(Γ) >)   (3.5)
where Λ is the set of Lagrange multipliers and µ(Γ) is the discretized statistic (histogram)
for the curve Γ. The set Λ can be obtained as the stationary point of the differential
equation
dΛ/dt = Ep(Γ;Λ)[µ(Γ)] − µobs   (3.6)
where Ep(Γ;Λ)[] is the expectation with respect to the probability distribution function
p(Γ; Λ) computed on the space of all possible curves.
Figure 3·1: MCMC move proposal in (Zhu, 1999). Point i in configuration A is moved into one of the 8 positions under the constraints on the length of linelets connecting node i with its neighbors.
The key difficulty is the computation of the expected statistics from the current distri-
bution Ep(Γ;Λ)[µ(Γ)]. The quantity Ep(Γ;Λ)[µ(Γ)] is obtained as the average of histograms
over samples from the distribution where samples are drawn using the Metropolis-Hastings
algorithm to simulate a random walk in the space of possible configurations of a closed
discrete contour. Curve nodes are positioned on grid points. One move in the space con-
sists of moving one node of a curve to one of the 8 neighboring positions. The distance
between nodes is constrained within a certain bound, see Figure 3·1. The resulting MCMC
simulation is very slow and the number of moves necessary for obtaining a sample from the distribution is of the order of 10^9, making this MCMC simulation too slow for practical applications, where model construction times measured in hours or days are not acceptable.
As a result we have developed a more efficient method of MCMC implementation. In
addition we have developed an even more efficient approximation that we discuss after
introducing our exact MCMC implementation. Unlike in (Zhu, 1999), in our algorithm the
curve nodes no longer lie on a grid but take any position on the plane, though we restrict
the linelets connecting nodes to now have a fixed length. A proposed move of our MCMC
simulation now changes the coordinates of all nodes in such a way that the distances
between all adjacent nodes remain the same and fixed. This new implementation has
two important advantages over the original MCMC implementation in (Zhu, 1999). First,
by fixing the linelet length we eliminate the need to continuously approximate uniform
sampling along the curve. Second, and most important, is the considerable acceleration of
the MCMC simulation attained using our method. In the original MCMC method a move
of one node of the chain into a new configuration should be accompanied by successful
moves of neighboring nodes in a certain direction. Therefore, the probability of a valid
change of configuration is very low or alternatively a change in configuration requires a
large number of MCMC moves. In our new method all nodes move during each step;
therefore, the curve moves in the configuration space more quickly.
We now discuss our MCMC method. As in the original method, the first node is chosen
at random. Two new trial positions of the chosen node are proposed at a distance +δ and
−δ in the direction of the local normal to the curve, δ being a small number. One of these
two proposed positions is then chosen at random. Next, two neighboring nodes are moved
in the direction of the mean of the angle between the old position of the starting node, the
neighboring node, and the new position of the starting node. Subsequent pairs of neighbors
are moved according to the same rule until the last node is reached. The new position of
the last node is then uniquely identified. An illustration of the proposed scheme is given
in Figure 3·2. It is easy to see that the proposed move is reversible, which is a requirement
for a valid MCMC simulation step. The new curve remains uniformly sampled. We denote
the initial position of the entire curve by A and the new proposed position by B. Let us
define the probability that configuration A is changed into B as K(A → B). Since we have
only two possible equally likely configuration changes, K(A → B) = 0.5. The probability of the reverse change, K(B → A), is also equal to 0.5 since there are always two candidate
positions for the node. The proposed move is accepted with probability:
Pa(A → B) = min( K(B → A) p(ΓB) / [K(A → B) p(ΓA)], 1 ) = min( p(ΓB)/p(ΓA), 1 )   (3.7)
Figure 3·2: Our new scheme for the proposed MCMC move (3 nodes, including the starting node, are shown). Black circles represent the initial node configuration. White circles represent the 2 candidate positions for the starting node and the new positions of the neighbors after the starting node was moved into the new (bottom) position.
where p(ΓA) and p(ΓB) are the values of the probability function p(Γ) evaluated on con-
figurations A and B respectively.
Overall, this move strategy is repeated until the chain converges, which means the
convergence of the expected statistics Ep(Γ)[µ(Γ)]. In practice, we observe the estimated
feature histograms as the MCMC simulation proceeds and the number of moves increases
until no further significant changes are observed. We found that after 20,000 moves no visible changes were observed in the statistics estimated from drawn samples. The number of MCMC steps decreased by a factor of ≈ 10^5 with respect to the original MCMC simulation scheme in (Zhu, 1999). However, each MCMC step is computationally N times more expensive, where N is the number of curve nodes. The overall computation time improvement factor effectively becomes ≈ 10^3.
We can gain even greater time savings through some approximation. This approxi-
mation is based on the intuitive idea that as the number of nodes in the curve increases,
distant nodes become more and more de-correlated and the curve locally behaves like an
open curve. In the case of an open curve the MCMC simulation proceeds as follows. Sup-
pose the initial configuration is denoted as A. First, a trial node is chosen at random. Then
Figure 3·3: Our scheme for the open curve MCMC move proposal. Black circles represent the initial configuration. White circles represent the final configuration after the curve is bent at the trial node.
we choose a random angle value in the range [−0.1, 0.1]. This angle value is used to bend the curve at the chosen node, yielding the configuration B. The configuration change is reversible and the probabilities K(A → B) and K(B → A) are equal. Therefore, the change is
accepted with probability Pa = min[p(B)/p(A), 1]. An illustration of the proposed scheme
is given in Figure 3·3. The chain converges after approximately 20,000 MCMC steps but
at a computational cost about two orders of magnitude smaller compared to a closed curve
MCMC simulation. The overall computational expense improvement factor (with respect
to the original MCMC method in (Zhu, 1999)) is about 10^5. Let us denote the expected statistics computed on the samples drawn from the distribution as µsym = Ep(Γ)[µ(Γ)]. We have run the MCMC simulation with an open curve and with a closed curve with different numbers of nodes and compared the simulated sets of statistics µsym. We find that as the number of nodes increases, the two methods give increasingly similar values for the estimated statistics µsym. The exact effect of our MCMC approximation on the constructed model
and the segmentation results remains to be investigated. At this point, we assume that the
open curve MCMC approximation in model construction preserves the qualitative ability of the model to encode the perceptual shape characteristics. Therefore, we use the open curve MCMC simulation with N = 80 nodes, instead of the closed curve MCMC simulation, to build our shape priors. The time needed to compute the model parameters is measured in minutes using our MATLAB implementation.
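The open-curve move and the acceptance rule of eq. 3.7 can be sketched as follows. The quadratic function `lam` below is a toy stand-in for the learned Lagrange multiplier function λc, and the jagged test curve is our own illustration:

```python
import numpy as np

def bend(nodes, i, phi):
    """Bend an open polyline at node i: rotate the tail rigidly about
    node i by angle phi, so all linelet lengths are preserved."""
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[c, -s], [s, c]])
    out = nodes.copy()
    out[i + 1:] = (out[i + 1:] - out[i]) @ R.T + out[i]
    return out

def energy(nodes, lam=lambda k: 50.0 * k ** 2):
    """-log p(Gamma) up to a constant (cf. eq. 3.11), with a toy quadratic
    Lagrange multiplier function penalizing curvature."""
    d = np.diff(nodes, axis=0)
    theta = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    kappa = np.diff(theta) / np.linalg.norm(d[:-1], axis=1)
    return np.mean(lam(kappa))

def mcmc(nodes, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    E = energy(nodes)
    for _ in range(steps):
        i = rng.integers(1, len(nodes) - 1)          # interior trial node
        prop = bend(nodes, i, rng.uniform(-0.1, 0.1))
        Ep = energy(prop)
        # K(A->B) = K(B->A), so Pa = min(p(B)/p(A), 1) = min(exp(E - Ep), 1).
        if rng.random() < np.exp(min(E - Ep, 0.0)):
            nodes, E = prop, Ep
    return nodes

start = np.array([[i, 2.0 * (i % 2)] for i in range(12)], dtype=float)  # zigzag
out = mcmc(start)
```

The bend is reversible and length-preserving, so the Metropolis ratio reduces to p(ΓB)/p(ΓA), as in eq. 3.7.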
3.3 Application to curve-evolution based segmentation
In this section we apply the shape prior identified using the methods we have described
in Section 3.2 to segment a shape based on observation of a noisy image. We pose the
segmentation problem in a probabilistic MAP formulation, maximizing the posterior density p(Γ|I) for the shape given the data. Using Bayes rule and converting to the equivalent energy formulation by taking the negative log,
Γ∗ = argmax_Γ p(Γ|I) = argmax_Γ p(I|Γ) p(Γ) = argmin_Γ [ −(1/2)(u − v)² − α log(p(Γ)) ]   (3.8)
We express the image likelihood −log(p(I|Γ)) as the intensity component of the energy, Eint = −(1/2)(u − v)², where u and v are the averaged intensities inside and outside of the curve Γ respectively. Assuming a bi-level image, this energy attempts to maximize the difference between u and v, thereby moving the curve towards the object boundary. The shape prior energy is Eshape = −log(p(Γ)). The regularization parameter α governs the importance of the prior term.
We use the curve evolution framework to minimize the energy in eq. 3.8. Therefore, we find the curve flow dΓ/dt corresponding to the steepest decrease of the energy (the gradient flow). The gradient flow corresponding to the first term is given by
(dΓ/dt)_int = (u − v)(2I − u − v) ~N   (3.9)
where ~N is the local normal and I is the average image intensity in the neighborhood of radius
R of a node. This averaging prevents the curve from wrapping on itself due to local noise
fluctuations. The smoothing radius R is set to half of the distance between neighboring
curve nodes.
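The intensity flow at the nodes of a discrete curve can be sketched as follows; here u and v are assumed to have been estimated elsewhere, and the synthetic disk image is our own test case:

```python
import numpy as np

def intensity_flow(nodes, img, u, v, R=3):
    """Per-node flow of eq. 3.9, (u - v)(2*I - u - v) N, where I is the mean
    image intensity in a radius-R neighborhood of each node (x, y)."""
    tang = np.roll(nodes, -1, axis=0) - np.roll(nodes, 1, axis=0)
    Nrm = np.column_stack([tang[:, 1], -tang[:, 0]])
    Nrm /= np.linalg.norm(Nrm, axis=1, keepdims=True)  # unit normals
    yy, xx = np.mgrid[:img.shape[0], :img.shape[1]]
    Ibar = np.empty(len(nodes))
    for k, (x, y) in enumerate(nodes):
        mask = (xx - x) ** 2 + (yy - y) ** 2 <= R ** 2
        Ibar[k] = img[mask].mean()                     # local average intensity
    return ((u - v) * (2 * Ibar - u - v))[:, None] * Nrm

# Bright disk on a dark background: a curve outside the object should move
# inward, a curve inside it should move outward.
yy, xx = np.mgrid[:101, :101]
img = ((xx - 50) ** 2 + (yy - 50) ** 2 <= 15 ** 2).astype(float)
t = np.linspace(0, 2 * np.pi, 24, endpoint=False)
outer = np.column_stack([50 + 25 * np.cos(t), 50 + 25 * np.sin(t)])
inner = np.column_stack([50 + 8 * np.cos(t), 50 + 8 * np.sin(t)])
f_out = intensity_flow(outer, img, u=1.0, v=0.0)
f_in = intensity_flow(inner, img, u=1.0, v=0.0)
```

For a counter-clockwise curve the normals point outward, so the sign of the flow along the radial direction indicates shrinking or expansion.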
The gradient flow corresponding to the second term in eq. 3.8 is computed for the
discretized curve by constructing a differentiable approximation of p(Γ) in eq. 3.5 based
directly on feature values instead of histograms. First, we define the continuous valued
Lagrange multiplier function λc as a piecewise linear approximation of the set of Lagrange
multipliers Λ corresponding to the model p(Γ). Now the inner product in eq. 3.5 can be
approximated as a sum over Lagrange multiplier function λc values evaluated on the set
of discrete feature values. Since the feature values are curvatures θn measured on the
curve nodes, the final summation is over the curve nodes. The resulting differentiable
approximation is given by
p(Γ) = exp(− < Λ, µ(Γ) >) ≈ exp( −(1/L) Σ_{n=1}^{L} λc(θn) )   (3.10)
where L is the number of curve nodes. Finally, the shape prior energy is given by

Eshape = (1/L) Σ_{n=1}^{L} λc(θn)   (3.11)
which is the sum of Lagrange multiplier function values corresponding to the curvatures
measured on the curve nodes. The minimizing flow (dΓ/dt)_shape corresponding to the shape prior term in eq. 3.11 is now found by computing numerical partial derivatives of eq. 3.11 with respect to perturbations of the curve nodes.
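A minimal sketch of this numerical gradient computation, perturbing each node along its local normal; the quadratic `lam` standing in for the learned λc and the perturbed circle are illustrative choices:

```python
import numpy as np

def shape_energy(nodes, lam=lambda k: 5.0 * k ** 2):
    """E_shape of eq. 3.11 for a closed discrete curve, with a toy quadratic
    Lagrange multiplier function lam in place of the learned lambda_c."""
    d = np.diff(np.vstack([nodes, nodes[:1]]), axis=0)
    theta = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    kappa = np.diff(np.append(theta, theta[0] + 2 * np.pi)) / np.linalg.norm(d, axis=1)
    return np.mean(lam(kappa))

def shape_flow(nodes, eps=1e-4):
    """Numerical gradient flow: central-difference the energy with respect
    to a perturbation of each node along its local normal."""
    tang = np.roll(nodes, -1, axis=0) - np.roll(nodes, 1, axis=0)
    Nrm = np.column_stack([tang[:, 1], -tang[:, 0]])
    Nrm /= np.linalg.norm(Nrm, axis=1, keepdims=True)
    speed = np.zeros(len(nodes))
    for k in range(len(nodes)):
        plus, minus = nodes.copy(), nodes.copy()
        plus[k] += eps * Nrm[k]
        minus[k] -= eps * Nrm[k]
        speed[k] = -(shape_energy(plus) - shape_energy(minus)) / (2 * eps)
    return speed[:, None] * Nrm

rng = np.random.default_rng(3)
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
nodes = np.column_stack([5 * np.cos(t), 5 * np.sin(t)]) + rng.normal(0, 0.1, (40, 2))
flow = shape_flow(nodes)
h = 0.01 / max(1.0, np.abs(flow).max())      # small step along the flow
```

Stepping the nodes a small distance along this flow decreases the prior energy, as a gradient flow should.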
The overall curve flow minimizing the energy in eq. 3.8 is given by the sum of the two
components:
dΓ/dt = (dΓ/dt)_int + (dΓ/dt)_shape   (3.12)
The difficulty in implementing the resulting curve flow is the highly non-convex nature of
the probability distribution function p(Γ). This property can be explained by the presence
of potential barriers in the prior energy given by eq. 3.11. In practice, the set of Lagrange multipliers Λ can contain very high values. For instance, this happens when some values of the observed histograms µobs are small. Such high values of the Lagrange multiplier function effectively forbid the corresponding values of curvature (introducing a large term in the sum in eq. 3.11). In the process of curve evolution, configuration changes through intermediate states that produce such “forbidden” curvature values cannot be realized. As a result, the evolution may stop, prevented by a local increase of the energy due to just one high value
of the Lagrange multiplier function. We use two approaches to overcome this difficulty.
First, we use a sufficiently coarse histogram and Lagrange multiplier discretization. We discretize the histogram using 30 bins. A coarse histogram leads to a smooth Lagrange multiplier function. Second, we manually threshold the Lagrange multipliers Λ to eliminate high values. Of course, this modification represents an uncontrolled change of the prior, but we found it helpful in reducing the local minima problem.
Another difficulty is the sensitivity of the flow with respect to curve perturbations.
The prior energy is a function of the curvature values, which are known to be sensitive to noise. The computation of the flow involves computing derivatives of the energy, which
magnifies the noise sensitivity. In our implementation, this noise sensitivity is partially
eased by the smoothing effect of periodic curve resampling performed after every few steps
of evolving the curve.
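Such periodic resampling can be sketched as uniform arc-length re-interpolation of the node positions; the linear interpolation used here is one simple choice:

```python
import numpy as np

def resample_closed(nodes, n):
    """Resample a closed polygonal curve to n nodes uniformly spaced in arc
    length; performed periodically, this mildly smooths node-level noise."""
    pts = np.vstack([nodes, nodes[:1]])                    # close the curve
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arc length
    target = np.linspace(0.0, s[-1], n, endpoint=False)
    return np.column_stack([np.interp(target, s, pts[:, 0]),
                            np.interp(target, s, pts[:, 1])])

square = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
new = resample_closed(square, 8)     # corners plus edge midpoints
```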
Figure 3·4: Ground truth image with noise added, constructed on the shape from dataset 1 (panel A) and dataset 2 (panel B). Black solid line: true shape; white dashed line: initial contour.
We perform two experiments on two different classes of shapes. For each experiment
we generate the shape prior using a set of training shapes obtained by slight perturbations
(non-isotropic scaling) of the basic shape characterizing the class. One shape from the
(Panel titles: left, R=11, σn=3, α=0; right, R=11, σn=3, α=0.67. Axes span 0–250.)
Figure 3·5: Data set 1. Result of segmentation without using the prior shape model (left) and the best result using the model (right). The true boundary is shown by a solid line; circles show the reconstructed boundary.
(Panel titles: left, R=11, σn=3, α=0; right, R=11, σn=3, α=1.5. Axes span 0–250.)
Figure 3·6: Same as figure 3·5 for data set 2.
training set is then used to generate noisy observation data. The regions inside and outside
of the shape have values 1 and 0 respectively and independent Gaussian noise with a
standard deviation of σn = 3 is added to the resultant clean image to create the noisy
data. The resulting segmentation problem is difficult to solve well without a prior model
as we will show next.
The first training data set contains triangular shapes with sharp corners and the second
set contains shapes with smoother corners and nearly circular boundaries. In Figure 3·4,
panels (A) and (B) we show the noisy image constructed on a shape from dataset 1 and 2
respectively. In Figures 3·5 and 3·6 we show the results of segmenting a given shape from
each of the two data sets. On the left panel we show the results obtained with no prior (α = 0).
On the right panel we show the corresponding result when a shape prior was used (α ≠ 0).
The best result was chosen from the results obtained for various α. As we can see, using no
shape prior results in a noisy boundary. Conversely, using prior shape information smoothes
the boundary while preserving the high curvature areas.
An alternative, and currently popular, approach to regularize segmentations is to use
a generic boundary length penalty term:
Eprior = ∫_Γ ds (3.13)
In order to show the advantage of using a trained prior curve model over this more generic
approach, we show in Figure 3·7 the results of using the generic penalty in the segmentation
of one of our shapes, for optimal values of the prior weight α. We see that using a simple
length penalty can give a smooth boundary only at the expense of severe corner smoothing.
3.4 Discussion
In this chapter we showed the results of applying a data derived pdf on the space of shapes
to the problem of segmenting noisy images. We used the pdf on shapes developed in (Zhu,
1999), which captures the perceptual similarity of shapes. The model in (Zhu, 1999) has
(Panel titles: left, R=11, σn=3, α=0.25; right, R=11, σn=3, α=1.0. Axes span 0–250.)
Figure 3·7: Segmentation obtained using penalty function in eq. 3.13.
been previously used exclusively in shape sampling experiments. The computational cost
of constructing the model is prohibitive in the original formulation.
We developed an improvement of the MCMC method for simulation of shapes sampled
from the given distribution. We then applied this method to several segmentation examples.
In our example experiments we did observe significantly better segmentation results when a
prior shape model was used when compared to results obtained with no prior term or those
using a generic length penalty term. Using the proposed technique we were able to recover
a boundary which retained focused corners while smoothing out excessive variability and
fluctuation on the boundary.
In the process of constructing the model, we approximated the closed curve MCMC
chain by an open curve MCMC chain simulation. This approximation must be studied in
more detail in order to better understand its effect on the segmentation solution. Another
possibility is to learn the functional mapping between open and closed MCMC simulation
results (for instance using neural networks) in order to obtain better closed curve MCMC
chain approximation.
We only used one definition of feature function, namely the curvature. Therefore, the
descriptive power of our model is limited. Using other definitions of feature functions
proposed in (Zhu, 1999) significantly increases the computational cost of constructing the
model. For instance, the “co-linearity” feature constructor in (Zhu, 1999) requires an addi-
tional layer of MCMC sampling that increases the cost by orders of magnitude (we did not
implement this feature function). Even if constructing the model efficiently were possible
using such advanced feature functions, using the resulting distributions in curve evolution may
pose additional difficulties (theoretical and computational) due to the non-deterministic nature
of these feature functions. We conclude that practical use of the models developed in (Zhu,
1999) still presents difficulties. Progress can potentially be made using the deterministic
definitions of feature functions that we develop in the next chapter in a different
context.
Chapter 4
Shape-distribution-based prior shape model
This chapter is devoted to the core topic of this dissertation, the construction and use of
shape-distribution-based shape models. In the following sections we discuss the motivations
behind using shape distributions, and introduce the framework of constructing the shape
prior based on shape distributions. The work reported in this chapter was published in
(Litvin and Karl, 2004a; Litvin and Karl, 2004b; Litvin and Karl, 2004c).
4.1 Motivation
In Section 2.4 we presented a review of prior shape modeling approaches. We believe that
a significant gap exists in the range of available techniques. On one end of the spectrum,
“generic” shape priors penalize some property of the shape, such as curvature or shape
area. These methods are too simple to be effective, since they merely smooth the curve.
In challenging applications more guidance is needed to overcome noise, occlusion, etc. In
particular, the solution should be influenced by the prior specific to the expected class of
objects. On the other hand, template based methods are usually specific to a particular
shape parameterization and do not generalize well. The goal of this part of the dissertation
is to develop a shape modeling approach that would possess better generalization properties
and be robust in the face of a small number of training examples. We aim at a model that
can be used in boundary inference using a curve evolution framework. We also need to
construct the model keeping in mind the need to register (align) shapes. Registering highly
variable shapes is a difficult task on its own. We desire a deterministic and computationally
efficient model.
Motivated by limitations in existing shape modeling approaches and by the represen-
Figure 4·1: An example of constructing a shape distribution for a curve (left) based on curvature κ(s) measured along the boundary (second graph). The third and fourth graphs show sketches of pdf(κ) and the cumulative distribution function H(κ) of curvature, respectively. Note the invariance of H(κ) with respect to the choice of the initial point of the arc-length parameterization.
tational richness of the distribution-based shape descriptors (see Section 2.6 for prior work
review), we propose to use shape distribution to construct a shape prior for use in bound-
ary inference tasks. In (Osada et al., 2002), shape distributions are defined as sets of
cumulative distribution functions (CDFs) of feature values (one distribution per collection
of feature values of the same kind) sampled along the shape boundary or across the shape
area. As shown by recent shape classification experiments (see Section 2.5), such shape
distributions can capture the intuitive similarity of shapes in a flexible way while being
robust to small sample sizes and invariant to object transformations.
An illustrative example of the shape distribution idea is shown in Figure 4·1, using
boundary curvature as the feature. Building a shape distribution is done in two steps:
1. Computing the curvature function κ(s) along the shape boundary
2. Computing the cumulative distribution function H(κ) from the samples of the curvature
function
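The two steps above can be sketched on a discretized curve as follows; periodic central differences for the curvature and a simple empirical CDF are illustrative choices.

```python
import numpy as np

def curvature_closed(pts):
    """Step 1: discrete curvature kappa(s) of a closed curve (N x 2 array),
    using periodic central differences; parameterization-invariant formula."""
    d1 = (np.roll(pts, -1, axis=0) - np.roll(pts, 1, axis=0)) / 2.0
    d2 = np.roll(pts, -1, axis=0) - 2.0 * pts + np.roll(pts, 1, axis=0)
    num = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
    den = (d1[:, 0] ** 2 + d1[:, 1] ** 2) ** 1.5
    return num / np.maximum(den, 1e-12)

def empirical_cdf(values, grid):
    """Step 2: H(lambda), the fraction of samples below each grid value."""
    return np.searchsorted(np.sort(values), grid, side='right') / len(values)
```

On a counterclockwise circle of radius r, the curvature values cluster around 1/r, so H(κ) is essentially a step at 1/r.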
Our prior is based on an energy which penalizes the difference between a set of feature
distributions of a given curve and those of a prior reference set. Such prior shape distri-
butions can capture the existence of certain visual features of a shape regardless of the
location of these features.
4.2 Our Formulation
First, let us summarize the important properties of shape distribution based shape repre-
sentations useful for constructing the shape prior (see Section 2.5):
• Invariance. Representations can be constructed to be invariant to rigid motion,
rotation, scaling, and mirror imaging by the proper choice of feature functions. In
fact, all definitions of shape distributions used in this work satisfy this important
property. The invariance property also eliminates the need to register shapes.
• Robustness. Small perturbations of shapes lead to small changes of the computed
distributions.
• Metric. The measure of shape distance defined using shape distributions possesses
the properties of a norm. Symmetry of the measure is one important consequence.
• Generality. The shape distribution methodology allows for the design of different
feature functions that are easily tailored to a particular problem.
• Generalizability. It has been shown in shape classification experiments that shape
distribution based shape representations generalize well to shapes unseen in the train-
ing set, even when the sample size is small. Our experiments also confirm this property.
A key difference of our formulation of shape distributions, as opposed
to the original formulation in (Osada et al., 2002), is our deterministic framework. In
(Osada et al., 2002), distributions were computed on random samples from feature func-
tions, while we construct the distributions over the entire feature functions. Therefore,
the original shape distribution-based descriptors incorporate statistical uncertainty, while
our shape distribution represents a deterministic descriptor. Without this deterministic
approach, we would not be able to incorporate shape distributions as prior in a curve evo-
lution framework. All definitions made in this section are in the continuous domain unless
specified otherwise. Our computations are of course performed in the discrete domain by
discretizing curves, distributions, and curve evolution equations.
Multiple feature functions can be defined to characterize a shape or groups of shapes.
So far, we reviewed the feature function defined as boundary curvature. Separate fea-
ture functions capturing different characteristics of shapes can be combined in a single
framework, creating a more versatile prior.
4.2.1 A prior energy based on shape distributions
We now introduce our formulation of the shape prior in the continuous domain. Let Φ(ω)
be a continuously defined feature (e.g. curvature along the length of a curve) on the domain
Ω, where ω is an element of the space Ω. Let λ be a variable spanning the range of values Λ
of the feature. Let H(λ) be the CDF of Φ computed on the entire domain Ω:
H(λ) = ∫_Ω h{Φ(ω) < λ} dω / ∫_Ω dω (4.1)
where h(condition) is the indicator function, which is 1 when the “condition” is satisfied
and 0 otherwise. Unlike in the original shape distribution formulation in (Osada et al.,
2002), our shape distributions are defined as deterministic descriptors.
We define the shape prior energy Eshape(Γ) for the boundary curve Γ in eq. 2.2 based
on shape distributions as:
Eshape(Γ) = Σ_{i=1}^M wi ∫_Λ [H*i(λ) − Hi(Γ, λ)]² dλ (4.2)
where M is the number of different distributions (i.e. feature functions) being used to
represent the object, Hi(Γ, λ) is the distribution function of the ith feature function for
the curve Γ, and the non-negative scalar weights wi balance the relative contribution of
different feature distributions. Prior knowledge of object behavior is captured in the set of
target distributions H*i(λ). We choose the L2 measure defined on CDFs due to its attractive
properties (see Section 2.8). This measure represents a cross-bin distribution difference
measure that grows monotonically for increasingly dissimilar distributions. In addition, this
measure is differentiable with respect to curve deformations, which is a property necessary
to derive the energy minimizing curve flows. Note that our shape energy definition is a
deterministic measure due to our deterministic definition of the shape distribution based
shape descriptors.
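With the CDFs sampled on a common grid of λ values, eq. 4.2 reduces to a weighted sum of discretized L2 differences; a minimal sketch, where trapezoidal integration is an illustrative choice:

```python
import numpy as np

def shape_prior_energy(H_list, H_target_list, weights, lam_grid):
    """Eshape = sum_i w_i * integral (H*_i - H_i)^2 dlambda (eq. 4.2),
    with each CDF H_i and target H*_i sampled on the common grid lam_grid."""
    E = 0.0
    for H, H_star, w in zip(H_list, H_target_list, weights):
        f = (np.asarray(H_star) - np.asarray(H)) ** 2
        E += w * np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(lam_grid))  # trapezoid rule
    return E
```

The energy is zero exactly when every CDF matches its target on the grid, consistent with the cross-bin, monotone behavior described above.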
We propose three strategies to define the target distributions H∗i in eq. 4.2:
1. The target distributions can correspond to a single prior shape. The resulting prior
penalizes the distance between the current curve Γ and the prior curve in terms of
their shape distributions.
2. The target distributions can be computed as averages over a group of N training
shapes:

H*k(λ) = (1/N) Σ_{i=1}^N H^i_k(λ) (4.3)

where H^i_k(λ) is the distribution of type k corresponding to the prior shape i.
3. The target distributions can be specified by prior knowledge (e.g. the analytic form
for a primitive, such as a square)
It is important to stress that, due to the invariance properties of shape distribution
representations, the prior on a group of shapes can be constructed without the need to
register the shapes.
In cases where the target distributions correspond to a single shape, eq. 4.2 can be
expressed as the distance between the two curves. For two curves Γ1 and Γ2, the measure
of dissimilarity is then
d(Γ1, Γ2) = Σ_{i=1}^M wi ∫_Λ [Hi(Γ1, λ) − Hi(Γ2, λ)]² dλ (4.4)
This definition of the distance is differentiable and is a metric. This measure is exactly the
one used in (Osada et al., 2002) in the shape classification framework when measuring the
distance between two shapes.
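As a concrete illustration of eq. 4.4, the sketch below compares two discrete curves through the CDFs of their normalized inter-point distances (a single feature, M = 1, w = 1; this feature is introduced formally in Section 4.4). The all-pairs feature set and the λ grid are illustrative assumptions; by construction the distance vanishes for a rescaled copy of the same curve.

```python
import numpy as np

def interpoint_distance_features(pts):
    """All pairwise node distances of a discrete curve, normalized by their
    mean for scale invariance (single interval S = [0, 0.5])."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    f = d[np.triu_indices(len(pts), k=1)]
    return f / f.mean()

def shape_distance(pts1, pts2, lam_grid):
    """d(Gamma1, Gamma2): L2 distance between the feature CDFs (eq. 4.4),
    with both CDFs sampled on lam_grid and integrated by the trapezoid rule."""
    def cdf(v):
        return np.searchsorted(np.sort(v), lam_grid, side='right') / len(v)
    f = (cdf(interpoint_distance_features(pts1))
         - cdf(interpoint_distance_features(pts2))) ** 2
    return np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(lam_grid))
```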
4.3 Minimizing flow computation
Recall that the second major direction of this dissertation is using the constructed shape
models to extract boundaries from images. We utilize the energy based curve evolution
framework, where the curve is found as the minimizer of the energy functional E(Γ):

Γ* = argmin_Γ E(Γ) (4.5)
The energy typically consists of two types of terms: intensity term(s) and shape term(s):
E(Γ) = Eint(Γ) + αEshape(Γ) (4.6)
In Section 4.2 we defined the shape prior energy term Eshape(Γ). In order to use the energy
in eq. 4.2 as a prior in a curve evolution context we must be able to compute the gradient
curve flow (or just curve flow) (dΓ/dt)shape that deforms the curve in the direction of the
steepest decrease of the energy Eshape(Γ). In the curve evolution framework, the overall
curve flow is constructed as

dΓ/dt = (dΓ/dt)int + α (dΓ/dt)shape (4.7)

where (dΓ/dt)int is the minimizing curve flow corresponding to the intensity energy term Eint(Γ).
For simplicity, we consider a shape prior energy term defined on just a single feature
function. Since eq. 4.2 is additive in the different feature functions, the minimizing flows for
single individual feature functions can be added with the corresponding weights to obtain
the overall minimizing flow. The energy for a single feature function is given by:
E(Γ) = ∫ [H*(λ) − H(Γ, λ)]² dλ (4.8)
where H∗(λ) is the target CDF and H(Γ, λ) is the CDF computed on the evolving curve.
Because the energy depends on the whole curve in a non-additive way, the minimizing flow
at any location on the curve also depends on the whole curve, and not just on local curve
properties, making this computation much more challenging than the flow computation for
the popular shape prior energy definitions. The minimizing flow and its computation will,
of course, depend on the specifics of the feature function chosen.
We propose two techniques to compute the curve flow. The first method is a variational
approach and produces exact analytical solutions for the feature functions used in
this work. The second method is a numerical gradient descent based solution, in which we
apply unconstrained gradient descent to find a curve deformation that reduces the energy.
This method can be used to quickly test new feature functions.
4.3.1 Exact solution using variational framework
Here we introduce an efficient approach to analytically compute the minimizing flows for
certain feature functions by using a variational framework. The resulting flows guarantee
reduction of the energy functionals in eq. 4.8.
Let us briefly outline the strategy to find the curve flow minimizing an energy functional
(see (Charpiat et al., 2003) for background and Appendix A for detailed derivations):
1. Define a small perturbation of the curve in the direction of the normal as a function
of arc-length as β(s).
2. Find the Gateaux semi-derivative of the energy in eq. 4.8 with respect to the per-
turbation β. Using the definition of the Gateaux semi-derivative, the linearity of
integration, and the chain rule we obtain
G(E, β) = 2 ∫ [H*(λ) − H(Γ, λ)] G[H(Γ, λ), β] dλ (4.9)
3. If the Gateaux semi-derivative of a functional f exists and is linear in β, then according
to the Riesz representation theorem it can be represented as

G(f, β) = ⟨∇f, β⟩ (4.10)

where ∇f is the gradient flow minimizing the functional f. We must compute
G(H(Γ, λ), β), then find the corresponding ∇H(Γ, λ) using the representation in eq. 4.10.
4. The Gateaux semi-derivative of the functional E can now be represented as

G(E, β) = 2 ∫ [H*(λ) − H(Γ, λ)] ⟨∇H, β⟩ dλ = ⟨ 2 ∫ [H*(λ) − H(Γ, λ)] ∇H dλ, β ⟩ (4.11)

The overall flow minimizing eq. 4.8 is then given by

∇E = 2 ∫ [H*(λ) − H(Γ, λ)] ∇H(Γ, λ) dλ (4.12)
4.3.2 Numerical solution
In (Litvin and Karl, 2004b), we propose a numerical scheme to estimate the flow. Let the
single feature function at time t be Φt(Γ) (continuously defined in the space Ω). In our
first step, we compute the gradient descent flow in the space Ω minimizing eq. 4.2:

dΦ(λ) = ∂Φ/∂t = −∇Φ E = [H*(Φ(t)) − Ht(Φ(t))] H′λ(Φ(t)) (4.13)

where H′λ(Φ(t)) is the derivative of the distribution with respect to λ evaluated at Φ(t).
This flow is defined on feature function values and specifies how to change the feature
function values to reduce E. The stationary point of this flow corresponds to the case
when H∗(Φ(t)) = Ht(Φ(t)), i.e. the distribution function for the given curve matches the
target prior distribution function. The discrete approximation of eq. 4.13 is straightforward
and can be efficiently computed.
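A minimal discrete sketch of the flow in eq. 4.13 follows; the histogram-based estimate of the density H′λ and the bin count are illustrative assumptions.

```python
import numpy as np

def feature_space_flow(phi, H_star, lam_grid, n_bins=20):
    """Discrete sketch of eq. 4.13: unconstrained flow on feature values,
    dPhi = [H*(Phi) - H(Phi)] * H'_lambda(Phi), with the current CDF H taken
    as the empirical CDF of phi and H' estimated by a density histogram."""
    phi = np.asarray(phi, dtype=float)
    srt = np.sort(phi)
    H_cur = np.searchsorted(srt, phi, side='left') / len(srt)   # H at each phi_i
    H_tgt = np.interp(phi, lam_grid, H_star)                    # H* at each phi_i
    hist, edges = np.histogram(phi, bins=n_bins, density=True)
    idx = np.clip(np.digitize(phi, edges) - 1, 0, len(hist) - 1)
    return (H_tgt - H_cur) * hist[idx]                          # H'_lambda factor
```

The flow is stationary when the empirical CDF of the feature values matches the target, as noted in the text.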
The flow in eq. 4.13 is defined on the curve features separately, without consideration of
consistency between those features. For example, for the boundary curvature feature function,
an arbitrary evolution of the curvature values along the curve will no longer correspond to a
connected curve. In fact, a curvature function computed on a curve must satisfy a condition
discussed in (Klassen et al., 2004). Let this flow on the feature function computed according
to eq. 4.13 be denoted by dΦ*. The evolution of the feature function is constrained by the
fact that the feature function must continue to correspond to a valid contour Γ, which
induces an infinite-dimensional manifold of valid feature functions. In general, the flow
dΦ∗ is not restricted to this manifold. In order to find the actual curve flow, we adopt
a projection solution, finding the curve deformation that results in a feature space flow
closest to the given unconstrained flow dΦ∗.
An approach similar in spirit is used in (Klassen et al., 2004) to evolve shapes. In
Klassen’s approach, a prescribed direction of shape parameterization evolution is given on
the space tangent to the manifold of valid shapes. This prescribed direction takes the
object outside of the manifold of shapes. The solution is then sought as the shape on the
manifold that results from evolution in the direction closest to the prescribed direction
in the tangent space.
Let us consider a given contour at time T. Let the space of all feature function flows be
Φ, and let β(s) be a small displacement of the contour in the direction normal to the curve
at s. Let S be a hypersurface in Φ given by the feature function flows corresponding to all
possible curve deformations. We want to orthogonally project the unconstrained feature
flow dΦ* onto the hypersurface S; see Figure 4·2 for an illustration of this technique. Let us
denote the resulting projection by dΦ+ and the corresponding curve deformation by β(s)+.
Note that the point dΦ = 0 ∈ S. We perform gradient descent on the manifold S, starting
from β(s) = 0, to find the projection dΦ+. We seek the curve deformation solution as follows:
β(s)+ = argmin_{β(s)} ||dΦ(β(s)) − dΦ*|| (4.14)
where ||x|| is the L2 norm.
We use the following strategy to find the solution β(s)+. Starting from zero deformation
β(s)t=0 = 0, we consecutively apply small random perturbations dβ(s)t to the curve. Each
new random perturbation adds to the perturbation at the previous time step:

β(s)t = β(s)t−1 + dβ(s)t (4.15)

We record the resulting cumulative perturbation as an intermediate result if the cost
||dΦ(β(s)) − dΦ*|| given the new perturbation is lower than the previous value, that is,
Figure 4·2: Illustration of the descent-on-the-manifold procedure to find the curve flow β(s)+, eq. 4.14. The surface represents the space of all realizable feature function flows S.
if
||dΦ(β(s)t) − dΦ∗|| < ||dΦ(β(s)t−1) − dΦ∗|| (4.16)
The procedure is stopped when the average decrease of the energy over a fixed number of
steps is lower than a specified threshold value.
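The accept-if-improved search of eqs. 4.15–4.16 can be sketched generically as follows; the callable `dphi_of_beta` stands in for the curve-dependent computation of the induced feature flow, and the step size and iteration budget are illustrative assumptions.

```python
import numpy as np

def project_flow(dphi_of_beta, dphi_star, n_pts, step=0.05,
                 n_steps=2000, rng=None):
    """Project the unconstrained feature flow dPhi* onto realizable flows
    (eqs. 4.14-4.16): accumulate small random normal perturbations beta(s),
    keeping each one only if it lowers ||dPhi(beta) - dPhi*||.

    dphi_of_beta maps a deformation beta (length n_pts) to the induced
    feature-function flow; here it is an abstract stand-in for the
    curve-dependent computation described in the text.
    """
    rng = np.random.default_rng(rng)
    beta = np.zeros(n_pts)
    best = np.linalg.norm(dphi_of_beta(beta) - dphi_star)
    for _ in range(n_steps):
        trial = beta + step * rng.standard_normal(n_pts)   # eq. 4.15
        cost = np.linalg.norm(dphi_of_beta(trial) - dphi_star)
        if cost < best:                                    # eq. 4.16
            beta, best = trial, cost
    return beta, best
```

A fixed iteration budget replaces the average-decrease stopping rule described above, purely for brevity.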
The procedure is symbolically illustrated in Figure 4·2. The surface represents the
realizable feature function flow manifold S. The trajectory shows the value of the feature
function flow during the minimization procedure.
As we pointed out previously, feature function values are computed by uniformly dis-
cretizing the curve and periodically re-sampling the curve. The computational burden of
computing the curve evolution under the numerically computed flow is substantial due to
the two nested levels of iterations (curve evolution and curve flow computation).
Even though the flow dΦ+ found using our procedure is gradient related to the flow dΦ*
(i.e., ⟨dΦ+, dΦ*⟩ is positive), we have no guarantee that dΦ+ is not asymptotically orthogonal
to dΦ*. In fact, the complex geometry of S can trap the solution of the minimization problem
in eq. 4.14 in a local minimum, which can potentially result in ||dΦ+|| ≪ ||dΦ*|| and/or
⟨dΦ+, dΦ*⟩ ≈ 0. However, in our experiments the flow obtained using this numerical
scheme did minimize the cost in eq. 4.2. Since the approach does not produce the true
gradient curve flow but only a gradient related flow, the combination of the resulting flow dβ+
with (dΓ/dt)int in eq. 4.7 does not necessarily minimize the original energy functional in eq. 4.6.
In our experiments, the flow computed using this approach produced segmentation
results reasonably close to those given by the true gradient flow for Eshape. Given
the above theoretical issues with this numerical scheme and its high computational cost, the
exact analytical solution for the curve flow is highly preferable and is used in the experiments
reported in this dissertation.
A different formulation of the distribution difference measure based on geodesics in the
space of PDFs is introduced in Section 4.7.2. In Section 4.7.2 we also pave the way to
compute the corresponding minimizing flow.
4.4 Feature function choice
Figure 4·3: Feature function #1 values computed on the discretized curve. We show interpoint distances d13..d15.
We now consider the choice of the feature functions. In the example in Section 4.1,
we considered the curvature measured along the boundary as the feature function. In
Appendix A.2, we derive the minimizing curve flow corresponding to this choice of feature
function and show that the resulting flow is likely to be numerically unstable and sensitive
to noise. Our experiments also confirmed this. Instead, we design feature function
definitions having better properties by using non-adjacent points on the curve to compute
feature function values.
We use two specific feature functions in our experiments in this chapter, which we define
below:
• Feature function #1. Inter-point distances.
We first define the space Ω as a subspace of R2, where each dimension encodes the
position on the curve parameterized by arc-length. Without loss of generality we
normalize the arclength to the interval [0, 1]. Two positions s1 and s2 on the curve
Γ define the point ω = (s1, s2). Now the subspace ΩS of the space Ω is defined as

ΩS : min(|s1 − s2|, 1 − |s1 − s2|)/L ∈ S,  S = [a, b],  a, b ∈ [0, 0.5] (4.17)
where L is the total length of the boundary; S is the interval specified by a and b.
Basically, ΩS specifies the set of pairs of curve points within a given distance along the curve.
The upper bound 0.5 of the interval corresponds to two points dividing the boundary
into two parts of equal length. Let us assume a point ω ∈ ΩS . The feature function
corresponding to this point is given by the normalized Euclidean distance between
the corresponding boundary points:
Φ(s1, s2) = d(Γ(s1), Γ(s2)) / [ (1/|ΩS|) ∫_{ν∈ΩS} d(ν) dν ] (4.18)

where d(ν = (s1, s2)) = d(Γ(s1), Γ(s2)) is the Euclidean distance between the boundary
points specified by s1 and s2. By defining multiple intervals S we obtain multiple
distributions that we use simultaneously in our prior in eq. 4.2. In the experiments
presented in this chapter we used either a single interval [0, 0.5] or four different,
non-overlapping intervals of equal length 0.125. These correspond to points
of increasing separation along the boundary curve. The corresponding division of the
space Ω into four sub-spaces is illustrated in Figure 4·4.
In the discrete formulation (used to compute our distributions), the feature value
set F consists of the normalized distances between nodes of the discrete curve, see
Figure 4·4: Left: graphic interpretation of the division of the space Ω into four subspaces ΩS: Ω1, Ω2, Ω3, and Ω4. The corresponding intervals are [0, 0.125], [0.125, 0.25], [0.25, 0.375], and [0.375, 0.5]. Right: a curve with 3 pairs of points, members of Ω1, Ω1, and Ω4, respectively.
Figure 4·3
F = { dij / mean({dij | ∀(i, j) ∈ S}) | (i, j) ∈ S } (4.19)
The set S defines the subset of internodal distances along the curve used in the
feature. For (a, b) ∈ [0, 0.5], S(a,b) defines this subset of node pairs
{(i, j) | min(|i − j|, N − |i − j|) · ds/L ∈ [a, b]}, where N is the number of nodes, a and b
are the lower and upper bounds of the interval respectively, ds is the distance between
neighboring boundary nodes, and L is the total boundary length. Note that the feature function
so defined is invariant to shape translation, rotation and scale.
For feature function #1, the analytically computed energy minimizing flow is given
by (see Appendix A):
∇E(Γ)(s) = 2 ∫_{t∈S} [ ~n(s) · ~Γ(s, t) / |~Γ(s, t)| ] [ H*(|~Γ(s, t)|) − H(Γ, |~Γ(s, t)|) ] dt (4.20)
where Γ is the parameterized curve as a function of arc length, (X(s), Y(s)) with
s ∈ [0, 1]; ~Γ(s, t) is the vector with coordinates (X(t) − X(s), Y(t) − Y(s)); and ~n(s) is
the outward normal at (X(s), Y(s)). The flow at each s is an integral over the curve,
indicating the non-local dependence of the flow. The expression under the integral
can be interpreted as a force acting on a particular pair of locations on the curve,
projected on the normal direction at s.
• Feature function #2. Multi-scale curvature.
As in feature function #1, we define the space Ω as a subspace of R2 where each
dimension encodes a point on an arc-length parameterized curve Γ. The subspace
ΩS of space Ω is defined similarly according to equation 4.17 and is parameterized
by an interval S = [a, b]. We now assume that ω = (s0, s+). Let us define l(ω) =
min(|s+ − s0|, 1 − |s0 − s+|) as the shortest distance along the curve between the points
s+ and s0. We then define the point s−, obtained by moving from s0 along the curve by
l(ω) in the opposite direction. Hence, ω = (s0, s−) ∈ ΩS. We now define the feature
function value Φ(ω) as the angle of support at s0 between the points s+, s−, and s0:

Φ(s0, s+) = ∠(s− s0 s+) (4.21)
In a discrete formulation, the feature value set F consists of the collection of angles
between nodes of the discrete curve, see Figure 4·5
F = { ∠(i−j, i, i+j) | ∀(i, j) ∈ S } (4.22)
where ∠(ijk) is the angle between nodes i, j, and k. Again, the set S defines the
subset of internodal angles used in the feature and again we used S in a multi-scale
way, as described in the definition of feature function #1. Invariance with respect
to translation, rotation, scale and mirror imaging holds for this feature function by
construction.
For feature function #2, the analytically computed energy minimizing flow is given
by (see Appendix A)
∇E(Γ)(s) = −∫_{t∈S} { [cos(β) cos(~n(s), ~Γ+) + cos(γ) cos(~n(s), ~Γ−)] / sin(α)  if α ≠ π;  sin(~n(s), ~Γ−)  otherwise }
× [ a · r(s,t) / (bc) − f(s−t) √(1 − (~n(s − t) · ~Γ− / |~Γ−|)²) / |~Γ−| − f(s+t) √(1 − (~n(s + t) · ~Γ+ / |~Γ+|)²) / |~Γ+| ]
× [ H*(α(s, t)) − H(Γ, α(s, t)) ] dt (4.23)
where r(s,t) and f(s±t) take values −1 or 1 and indicate the sign of change of the
angle α(s, t) = ∠(~Γ−, ~Γ+) with respect to an along-the-normal perturbation of the points
Γ(s) and Γ(s + t) respectively; ~Γ+ = ~Γ(s, s + t); ~Γ− = ~Γ(s, s − t); a = |~Γ+ − ~Γ−|;
b = |~Γ−|; c = |~Γ+|; β = ∠(−~Γ+, ~Γ− − ~Γ+); γ = ∠(−~Γ−, ~Γ+ − ~Γ−). See Appendix A
for details on the notation and the feature function computation.
Figure 4·5: Feature function #2 in the discrete case: interpoint angles α−1,1,2 .. α−n,1,n are shown.
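A sketch of computing the discrete feature set of eq. 4.22 on a polygonal curve follows; the offset list `j_range` plays the role of the set S, and the modular indexing assumes a closed curve.

```python
import numpy as np

def angle_features(pts, j_range):
    """Feature function #2 on a discrete closed curve: for each node i and each
    offset j in j_range, the angle at node i subtended by nodes i-j and i+j
    (indices modulo N). Invariant to translation, rotation, scaling, and
    mirror imaging by construction."""
    n = len(pts)
    feats = []
    for i in range(n):
        for j in j_range:
            a = pts[(i - j) % n] - pts[i]      # vector toward s-
            b = pts[(i + j) % n] - pts[i]      # vector toward s+
            cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            feats.append(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return np.asarray(feats)
```

Using several offsets j gives the multi-scale behavior described for the intervals S.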
4.5 Intensity histogram equalization connection
Recall the histogram modification task (Section 2.7.1). Given an image I(x, y) defined on
R × R, we would like to evolve its gray-level values in a unique and spatially uniform way
to match the gray-value PDF hI(λ) to an arbitrary target PDF h*(λ). Let the target CDF
be defined as H*(λ) = ∫_0^λ h*(s) ds, and let the current image CDF be defined similarly.
It can be shown that the flow
the flow
∂I(x, y, t)/∂t = H*[I(x, y, t)] − HI[I(x, y, t)] (4.24)
always exists, is unique and has the desirable solution as the limit point.
Suppose now we want to match the distribution H(Φ(Γ), λ) of the feature function Φ(Γ)
and the target distribution H∗(λ) using the energy criterion in eq. 4.2. Our minimizing
feature function flow in eq. 4.13 is similar to the histogram modification flow eq. 4.24. The
difference is the multiplicative term H ′λ(Φt(t)) which is absent in the histogram modification
flow PDE. Since the term H ′λ(Φt(t)) is always positive, the histogram modification flow
in eq. 4.24 is gradient related to the flow in eq. 4.13 and is therefore a minimizing flow
of eq. 4.2. To our knowledge, an energy minimization interpretation of the histogram
modification flow in eq. 4.24 has not been presented. Therefore, we establish that eq. 4.2
can also be minimized by the histogram modification flow.
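One explicit Euler step of the flow in eq. 4.24 can be sketched as follows; the empirical image CDF and the λ grid for the target CDF are illustrative discretization choices.

```python
import numpy as np

def histogram_matching_flow_step(img, H_star, lam_grid, dt=0.5):
    """One explicit step of the histogram modification PDE (eq. 4.24):
    dI/dt = H*[I] - H_I[I], with H_I the empirical CDF of the current image
    (fraction of pixels strictly below each value)."""
    flat = np.sort(img.ravel())
    H_I = np.searchsorted(flat, img.ravel(), side='left') / flat.size
    H_I = H_I.reshape(img.shape)
    H_s = np.interp(img, lam_grid, H_star)   # target CDF evaluated at I(x, y)
    return img + dt * (H_s - H_I)
```

As a sanity check, an image whose empirical CDF already matches the target is a stationary point of the step.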
Our shape distribution based shape prior is connected to histogram equalization
in that our method extends the general idea of matching distributions using a PDE to a more
general class of features computed on the curve and/or the image intensities. We saw
that a considerably more complicated approach is needed to match these general distributions
due to the many constraints on the evolving variables. In our case, these constraints are given
by the connection of the feature function values to a given curve.
4.6 Extension to 3D
In recent years, 3D image processing techniques have gained interest, fueled by increasing
processing power, improvements in data acquisition, and application needs. For instance,
CT and MRI are able to produce volumetric data with comparable resolutions in all three
dimensions. One can achieve better results in interpreting such data by exploring the volume
as a whole rather than its separate 2D slices. Exploring the volume as a whole can be
achieved in different ways. One direction of research focuses on processing 2D slices while
72
exploiting the data consistency in the third dimension. In the shape modeling context, this
approach is known as “2D-to-3D”. It is especially effective and natural when a particular
dimension has a different nature from the other two or otherwise when data resolution
differences prevent using isotropic 3D approach.
A different, and more interesting, approach is to extend a given 2D approach in a
“natural” way. In fact, many shape modeling techniques allow for such an extension.
The most prominent example is the 3D version of level set based shape models. In the level
set approach, the boundary is represented implicitly as the zero level set of a
higher-dimensional function; therefore, the level-set boundary representation can be carried
out in any dimension. Direct, explicit boundary representation is also possible in 3D
and is known as a deformable mesh. On the other hand, approaches relying on a curve-
based parameterization, such as (Klassen et al., 2004), may not have a straightforward 3D
extension.
As for any shape modeling approach, it is desirable to have a natural 3D extension of
the method. In this section we present the foundation for such an extension for our shape
distribution based shape model. We first discuss the theoretical aspects using a continuous
formulation and then discuss implementation issues.
4.6.1 Formulation in 3D
We define a 3D shape as a smooth closed surface, denoted S. The definition of the
prior is similar to the 2D case. We first define a feature function Φ(ω) that can be defined
on the surface or on the volume. The CDF of Φ(ω) is given by

H(λ) = ∫_Ω 1{Φ(ω) < λ} dω / ∫_Ω dω   (4.25)
and the prior shape energy is given by

Eshape(S) = Σ_{i=1}^{M} w_i ∫_Λ [H∗_i(λ) − H_i(S, λ)]² dλ   (4.26)
where the H∗_i(λ) are the prior distributions computed on the training surface(s) and M is
the number of different feature functions.
The challenge arises in the definition of the feature functions, and their computation.
The extension to 3D of feature function #1 (inter-point distances) can be defined as follows.
Let Sp and Sq be two points on the surface S. We define the element ω as a pair of surface
elements, ω = (Sp, Sq), where Sp, Sq ∈ S. The space Ω is therefore the set of all possible
pairs (Sp, Sq). The value of the feature function is the normalized distance between Sp and Sq:

Φ(Sp, Sq) = d(Sp, Sq) / mean(d(Ss, Sw) | ∀Ss, Sw ∈ S)   (4.27)
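A minimal discrete sketch of this feature follows, assuming the surface is already represented by a hypothetical set of sample points; here the vertices of a unit cube stand in for surface samples.

```python
# Sketch of feature function #1 in 3D: the distribution of pairwise
# distances between surface samples, normalized by their mean (eq. 4.27).
# The point set below is a hypothetical stand-in for surface samples.
import math
from itertools import combinations

def pairwise_feature_values(points):
    """Return d(Sp, Sq) / mean(d) over all unordered sample pairs."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    mean_d = sum(dists) / len(dists)
    return [d / mean_d for d in dists]

def cdf(feature_values, lam):
    """Empirical CDF H(lambda) of the feature distribution (cf. eq. 4.25)."""
    return sum(v < lam for v in feature_values) / len(feature_values)

# Vertices of a unit cube as a stand-in for surface samples.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
feats = pairwise_feature_values(cube)
print(cdf(feats, 1.0))  # fraction of pairs closer than the mean distance
```

For the cube, the 12 unit-length edges fall below the mean pairwise distance while the face and space diagonals lie above it, so the printed value is 12/28.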
The extension to 3D of feature function #2 (multi-scale curvatures) cannot be performed in
a straightforward way. Extending the idea used in the 2D case, one would have to define three
surface elements S1, S2, S3 and measure the angle between the vectors ~V12 and ~V13. In the 2D
case, in order to reduce the dimensionality of the space Ω, we constrained the along-the-contour
distances between pairs of points by requiring them to be equal. However, the along-the-surface
distance between two elements on a surface cannot be defined in a unique way. One could
use an optimization-based distance definition, but the resulting burden of feature and flow
computation is deemed unreasonable.
4.6.2 Surface flow
In the 3D case, the surface flow expression for feature function #1 is similar to its 2D
counterpart. The minimizing flow is given by

∇E(S)(Sa) = 2 ∫_{Sb∈S} [~n(Sa) · ~Γ(Sa, Sb) / |~Γ(Sa, Sb)|] [H∗(|~Γ(Sa, Sb)|) − H(S, |~Γ(Sa, Sb)|)] dSb   (4.28)
where Sx is a point on the surface S with coordinates (X(Sx), Y(Sx), Z(Sx)), ~Γ(Sa, Sb) is
the vector with coordinates (X(Sb) − X(Sa), Y(Sb) − Y(Sa), Z(Sb) − Z(Sa)), and ~n(Sx) is
the outward normal to the surface at Sx. The flow at each location Sa is an integral over
the surface. The expression under the integral can be interpreted as a force acting on a
particular pair of locations on the surface, projected onto the normal direction at Sa.
4.6.3 Implementation issues
Discrete implementation of the feature distribution and flow computations requires sampling of
the curve (2D) or the surface (3D). In 2D we accomplished this by uniform sampling of
the curve, which is straightforward. In 3D, uniform sampling of a surface is not uniquely
defined; however, one may use non-uniform sampling in computing the feature functions
and associated flows. Here we propose to avoid surface resampling by using the level set
surface embedding, which provides a non-uniform sampling: the intersections of the level set
function's zero crossing with elementary grid voxels provide a triangulated approximation of
the original surface. Suppose we want to compute the integral ∫_S K(s) ds of a quantity K(s)
defined on the surface.
1. Using a fast level-crossing extraction algorithm, we obtain the zero grid crossings
{Cr_i | i = 1..N} and triangular patches {Pa_i | i = 1..M}. Let us denote the corners of the
patch Pa_i as Cr1(Pa_i), Cr2(Pa_i), and Cr3(Pa_i). The union of all patches approximates
the original surface:

∪_{i=1}^{M} Pa_i ≈ S   (4.29)

2. For each patch Pa_i, we compute the surface area A(Pa_i), the normal n(Pa_i), and the
center of mass C(Pa_i).

3. The surface integral can therefore be approximated as

I = ∫_S K(s) ds ≈ Σ_{i=1}^{M} K(C(Pa_i)) A(Pa_i)   (4.30)

where K(s) is the quantity defined on the surface.
An example of the surface triangulated using level set function zero crossings as surface
points is shown in Figure 4·6.
Figure 4·6: Surface triangulation by level set function zero crossing extraction.
4.7 Extensions and additional issues
4.7.1 Weighted distributions
In certain applications it may be advantageous to impose different weights on different
ranges of feature function values. For instance, it may be more important to preserve
sharp corners than straight portions of boundaries. We propose an extension of our shape
prior energy to achieve this effect. Let us introduce a differentiable weighting function
w(λ) and consider a typical term in the general shape distribution prior energy. We
write the modified energy as

E(Γ) = ∫ w(λ) [H∗(λ) − H(Γ, λ)]² dλ   (4.31)

The weight emphasizes fidelity to certain features more than others. The derivation of
the curve flow minimizing this energy is presented in Appendix A.4.
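A discrete sketch of this weighted energy follows, using a trapezoidal approximation of the integral; the CDFs and the weighting function below are hypothetical stand-ins, not quantities from the experiments.

```python
# Sketch of the weighted prior energy (eq. 4.31): a trapezoidal
# approximation of  integral w(lambda) * [Hstar - H]^2 dlambda.
# The CDFs and weight function here are hypothetical stand-ins.

def weighted_energy(lambdas, H_star, H_cur, w):
    total = 0.0
    for l0, l1 in zip(lambdas, lambdas[1:]):
        f0 = w(l0) * (H_star(l0) - H_cur(l0)) ** 2
        f1 = w(l1) * (H_star(l1) - H_cur(l1)) ** 2
        total += 0.5 * (f0 + f1) * (l1 - l0)   # trapezoid rule
    return total

grid = [i / 100 for i in range(101)]
# Uniform target CDF vs. a slightly stretched one; weight grows with lambda.
E = weighted_energy(grid,
                    H_star=lambda l: l,
                    H_cur=lambda l: min(1.0, l * 1.1),
                    w=lambda l: 1.0 + l)
print(E > 0)  # True: the mismatch carries positive energy
```

Identical CDFs give exactly zero energy regardless of the weight, which is the expected behavior of any distribution difference measure.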
4.7.2 Geodesic distance between distributions
In Section 2.8 we discuss and justify the choice of the distribution difference measure used
throughout this work. Here we suggest an alternative approach based on the
recent work of (Mio et al., 2005), where the authors present an approach
to computing the distance between CDFs based on a shortest geodesic computation. The
geodesic is the trajectory in the space of PDFs that transforms one PDF into another along
the shortest path. The results show that the geodesic path computed between two distributions
evolves the PDF in a “shape-preserving” fashion: as the distribution evolves, its shape is
approximately preserved. In terms of the examples considered in Section 2.8, the distance
between unimodal distributions with closely located modes will be small because the path of
moving a mode is short. The same property holds for the EMD measure, as discussed in
Section 2.8. Unfortunately, as for the EMD measure, an analytic form for the geodesic
distance measure is not available: its computation involves numerical optimization, and hence
an analytical curve flow computation is not feasible.
Yet, one property of the method in (Mio et al., 2005) can make it useful for our purpose.
The geodesic computation in (Mio et al., 2005) involves the computation of the initial
tangent direction in the space of PDFs. Suppose we have the initial and target PDFs p0(x)
and p∗(x), respectively, with corresponding CDFs H0(x) and H∗(x). The tangent direction
f is computed in the space of the log-likelihood of the distribution p(x). This means the
elementary increment dp of the PDF is given by

dp = p0 (e^f − 1)   (4.32)

It follows that the corresponding elementary increment dH0(x) of the CDF is given by

dH0(x) = ∫_{−∞}^{x} p0 (e^f − 1)   (4.33)
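A discrete sketch of these two increments follows; the sampled PDF p0 and tangent direction f below are hypothetical stand-ins on a small grid.

```python
# Discrete sketch of eqs. 4.32-4.33: given a PDF p0 sampled on a grid
# with spacing dx and a tangent direction f in log-likelihood space,
# the PDF increment is dp = p0 * (e^f - 1) and the CDF increment dH0(x)
# is its running integral. p0 and f below are hypothetical stand-ins.
import math

def increments(p0, f, dx):
    dp = [p * (math.exp(fi) - 1.0) for p, fi in zip(p0, f)]
    dH0, acc = [], 0.0
    for d in dp:
        acc += d * dx          # cumulative integral up to x (eq. 4.33)
        dH0.append(acc)
    return dp, dH0

# Uniform density on [0, 1] sampled at 5 points; a small antisymmetric tilt f.
p0 = [1.0] * 5
f = [-0.1, -0.05, 0.0, 0.05, 0.1]
dp, dH0 = increments(p0, f, dx=0.2)
# To first order the tilt only redistributes mass, so dp integrates to ~0.
print(abs(sum(d * 0.2 for d in dp)) < 0.01)
```

The final entry of dH0 is the total mass change, which should remain near zero for a valid tangent direction.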
dH0(x) gives the direction in which the CDF must be evolved to reach the target CDF along
the geodesic trajectory. We may now find the curve flow that moves the curve in such a way
that the CDF evolution direction is as close as possible to the prescribed direction dH0(x).
To this end, let us construct an energy function based on the L2 norm in the space of CDF
increments, and seek the gradient curve flow that minimizes this energy:
E(Γ) = ∫ [dH0 − dH(Γ)]² = ∫ [dH0 − (H(Γ) − H0)]² = ∫ [H0 + dH0 − H(Γ)]²   (4.34)
Equation 4.34 is exactly one term of eq. 4.2 in which the target distribution is H∗ = H0 + dH0.
Therefore, we can find the gradient curve flow minimizing eq. 4.34 by one of the two procedures
described in Section 4.3. As described previously, the curve evolved using such a gradient
curve flow may not achieve the target distribution H∗ in the general case. The reasons are
the non-linearity of the manifold of realizable CDFs, discretization errors, and possible local
minima of the energy. Hence, the flow will move the PDF away from the computed geodesic,
and the geodesic direction must be re-estimated after each curve evolution step. Moreover,
since the actual PDF evolution will not follow the geodesic exactly, we cannot guarantee that
the along-the-geodesic distance measure will be minimized by the computed curve flow.
Computationally, this method is more expensive than using our formulation in eq. 4.2,
since an additional optimization problem to find the geodesic direction f must be solved at
each iteration. Its advantage is the more intuitive formulation of the distribution difference
measure.
4.7.3 Shape distribution uniqueness issues
In this section we return to the discussion of the properties of shape distribution based
shape representations, namely invertibility and uniqueness. Many known parameterizations
of shape are invertible, meaning that a given shape representation, such as a skeleton
representation, corresponds to one and only one shape. On the other hand, a given shape
distribution representation does not in general correspond to a single shape, which
is a consequence of the fact that the distribution-based representation is constructed to be
invariant with respect to the location of features.
Let us consider feature function #1 (inter-point distances) defined in this chapter.
In (Lemke et al., 2002), theoretical work was done on reconstructing discrete
point sets from the set of measured inter-point distances between every pair of points.
In fact, this problem was studied previously in X-ray crystallography and in restriction
site mapping of DNA. (Lemke et al., 2002) defines sets of points that follow “turnpike”
and “beltway” configurations: the latter requires the points to lie on a circular loop, while
the former requires them to lie on a line. A distance set is then defined as the collection of all
inter-point distances between the points in the set. Distances are measured as Euclidean
distances for the “turnpike” configuration and as arc-length distances for the “beltway”
configuration. The problem of reconstructing point coordinates from a distance set boils down
to reconstructing 1D coordinates of points (since the points are constrained to lie on a curve).
Lower and upper bounds on the number of possible solutions were shown to be sub-exponential
with respect to the size of the point set. Unfortunately, this problem setup is quite different
from ours: in our problem, the points are restricted to be uniformly spaced along
the curve, while the curve has no predefined shape. However, we hypothesize that the
number of solutions (possible generating point sets for a given distance set) in our problem
can also be large.
We suggest that this solution non-uniqueness is the underlying property of the shape
distribution representation that is responsible for the generalization ability of our prior. We
further suggest that the configurations solving the point set reconstruction problem (the
problem of reconstructing sets of points from measured inter-point distances) share some
important visual shape properties. It is even more difficult to approach the point set
reconstruction problem from the distribution similarity point of view; to our best knowledge,
no research has been done in this direction. More theoretical work is needed to fully
understand the issues of shape distribution representation uniqueness.
4.7.4 Computational complexity
In this section we present some results on the computational complexity involved in
using our shape prior. We only consider the computations needed to construct the curve
flow corresponding to our prior during one curve evolution iteration and ignore the overhead
computations required to evolve the curve, reinitialize the level-set function, etc., since these
computations are independent of the prior used. We also do not consider the computations
required to construct (train) our model, since model construction only requires feature
distribution computation for each of the training shapes.

Let us consider a single feature function and a single curve evolution iteration; the overall
computational expense is linear in the number of feature functions used. The number
of FLOPs needed to compute the feature function and the corresponding curve flow is
O(N²), where N is the number of nodes discretizing the curve. Assuming that the actual
number of FLOPs needed is 20N² and N = 200, on a 1.8 GHz processor we estimate the
required computation time at 0.02 seconds. Our MATLAB implementation on a 1.8 GHz
processor, with significant overhead computations included, requires 0.6 s and 0.8 s for the
flows corresponding to the inter-point distance and multi-scale curvature feature functions,
respectively. Therefore, substantial optimization of our implementation is possible.
4.8 Summary
In this chapter we introduced a novel method of constructing and using a shape prior. Our
method relies on modeling the distributions of certain significant shape features and creating
a measure of similarity between these distributions. A key to the useful implementation of
the obtained measure lies in our ability to incorporate it into a curve evolution framework as
a prior energy term. To our best knowledge, this is the first time a shape prior based on shape
distributions has been formulated and coupled with a curve evolution framework. In the second
part of this chapter we presented the 3D extension of our approach and discussed the arising
implementation issues.
Chapter 5
Applications of shape distribution based shape
priors
In this chapter we consider applications of our shape modeling approach. First, we test
the behavior of our framework when the curve is guided solely by the shape prior force
based on shape distributions. We test the focusing ability of the prior to recover the single
curve used to construct the prior shape distributions. Next, we use our prior for image
segmentation in a curve evolution context. We perform several experiments; in particular,
we perform segmentation of images with occlusion. Finally, we explore another application
of our prior to find the intrinsic mean of several shapes. The work reported in this chapter
previously appeared in (Litvin and Karl, 2004a; Litvin and Karl, 2004b; Litvin and Karl,
2004c).
5.1 Shape focusing by shape term guided evolution
In this section we examine the evolution of a curve driven solely by our shape prior energy
term. We have removed the data term to examine and understand the effect of the prior
dependent flow term on the curve evolution. Roughly, this should produce the curve most
favored by the prior (modulo initialization effects and local minima). By using different
feature functions in our prior we can gain some insight into how this most favored curve
is affected by the nature of the feature functions used. These insights are useful in designing
feature functions for a particular application.
Now the curve is evolved solely under the action of the shape term; that is, the energy
to be minimized is

E(Γ) = Σ_{i=1}^{M} w_i ∫_Λ [H∗_i(λ) − H_i(Γ, λ)]² dλ   (5.1)
where H∗i (λ) is computed on the target shape. This energy minimizes the distance in
eq. 4.4 between the evolving and target shapes, thus matching shape distributions of the
evolving curve and the target curve.
Experiment 1
In our first experiment we choose a rather simple target shape and evolve the curve
using different choices of feature functions to see how different feature functions capture
different aspects of a shape. Our method is flexible in that different elements of a shape
can be encoded through different feature function choices. We consider 3 different choices
of features in constructing the model: (1) feature function #1 (inter-point distances) alone
(w2 = 0); (2) feature function #2 (multi-scale curvatures) alone (w1 = 0); (3) features
functions #1 and #2 combined with w1 = w2. The evolution is started from the curve
shown by the dash-dotted line in Figure 5·1. The target shape (on which the target
distributions are computed) is shown by the dashed line. For each of the 3 experiments,
the curve is evolved until it stops. The result is shown by the solid line.
All 3 experiments yield shapes quite similar to the target shape, but small differences
give insight into what properties of the shape are captured by particular feature functions.
Note that our distribution-based representation is scale/rotation/translation invariant, so
differences in scale and position of the result are not considered important. First, we
consider the flow based on feature function #1. This feature function encodes distances
between pairs of points on the curve. The resulting curve (panel A in Figure 5·1) has
a slightly bent, elongated shape. In fact, boundary curvature is not captured directly by
feature function #1; therefore, it is expected that differences in global bending deformation
are not effectively corrected by the flow based on this feature function. However, the
distance between opposing boundaries captured by feature function #1 is well preserved
in the resulting shape. We now consider the flow based on feature function #2. This
feature function encodes curvatures measured at different scales. The flow based on feature
function #2 (panel B in Figure 5·1) yields a shape highly similar to the target shape but
slightly non-symmetric and cone-shaped. This is explained by the fact that feature function
#2 is designed to capture boundary curvature rather than relative boundary position. In
fact, the resulting shape has correct curvatures (straight lines and circular regions) but
relative boundary positions do not match those of the target (the result is a cone rather
than a tube). Finally, both flows combined (panel C in Figure 5·1) yield nearly a perfect
shape. The flows for both feature functions combined complement each other and work to
correct for their separate deficiencies. By including more terms into the prior energy, more
information about shape can be captured.
(A) (B) (C)
Figure 5·1: Evolution of an initial contour under the sole action of the prior flow: initial (dot-dashed), target (dashed), and resulting (solid) contours. (A) prior constructed on the inter-point distances (#1); (B) prior constructed on multi-scale curvatures (#2); (C) both feature classes #1 and #2 included.
Experiment 2
In our previous experiment, the curve converges close to the target, indicating that the
energy functional is well behaved on the space of shapes: a prominent global minimum
allows the curve evolution to focus on the target shape. In our second experiment, we
show that in some cases shape complexity can make it difficult for our prior to focus
on the target. In fact, shape complexity increases the entropy of our shape distribution
descriptors (evidence given in (Page et al., 2003) and confirmed by our experiments),
Figure 5·2: Target plane shape.
Figure 5·3: Evolution of the contour under the action of the prior flow: initial (dot-dashed) and final (solid) contours. Target contour is shown in Figure 5·2.
making more curves have similar distributions. It is expected that the energy functional
on the space of shapes may have more significant local minima that can trap the evolving
curve. In this experiment we test the significance of such local minima by using a complex
target shape. We further suggest a modification of our prior that allows us to improve its
focusing properties on complex shapes.
Our new, more complex target shape is shown in Figure 5·2. We use feature functions
#1 (inter-point distances) and #2 (multi-scale curvatures) in eq. 5.1. For each feature
function type (#1 and #2) we use 4 intervals S, leading to a total of 8 terms in eq. 5.1.
The result given by evolving the curve using the target distribution computed on the
curve in Figure 5·2 is given in Figure 5·3. The initial curve is the average over 16 prior
aircraft shapes and is used throughout experiments with aircraft shapes. The prior aircraft
shapes were not pre-aligned; hence, their average shape, computed by averaging signed
distance transforms, is not symmetric. This essentially random initial contour simulates
unsupervised, random initialization in real-life situations. By not using a symmetric initial
curve we eliminate the possibility of obtaining better focusing due to a well-chosen
initialization. Our result (solid curve) shows that the evolution stops at a local minimum of
the energy functional, coming close to the global minimum (the target shape itself). Notice
that the resulting shape is not symmetric. In fact, the shape energy does not encourage
symmetry: the evolving curve starts from the non-symmetric initialization and remains
non-symmetric until it stops at a local minimum, which happens to be non-symmetric.
The structural elements of the resulting shape are well developed (see the wings and tail),
and the relative positions of the elements are maintained. The difference between the target
and the resulting shape lies in the fine details of the tail, the wings, and the fuselage. Therefore,
our prior still leads to a curve evolution that focuses onto the target but can be trapped
in local minima in the vicinity of the global minimum.
Clearly, we would still like to have a focused prior on more complex shapes. We now
propose a way to improve the focusing ability of our prior: capture the prior shape at multiple
scales to smooth out the energy. The motivation is that our prior focuses more effectively
on less complex shapes; therefore, the prior at a coarser scale should be allowed to resolve
larger shape details, while the prior at a finer scale focuses on fine details.
Let us now introduce a concrete implementation of this strategy. The questions are
how to obtain the coarse shape representations and how to use a prior built on coarse
representations in our curve evolution approach. Addressing the second question, we choose
to augment the shape prior energy E(Γ) by additional terms corresponding to different
scales, which penalize the difference between the representation of Γ at a given scale (in
terms of shape distributions) and the target shape representation at that scale. Now the
curve evolution flow should minimize the new, augmented energy.
To build the multi-scale shape representations, instead of sampling the curve we take
advantage of the level set framework. Suppose a signed distance transform is computed on
the curve. The level sets corresponding to increasing absolute non-zero values of the level set
function represent smoothed versions of the contour (which corresponds to the zero level set):
level sets at large positive distances from the contour approach circles, while the level sets
inside the contour collapse to a point. Therefore, we can approximate the shape representations
at multiple scales by its level sets. Furthermore, the level set function is a natural way to relate
the representations at multiple scales to one another. By fixing the level set representation to
be a signed distance transform, we automatically link our representations at multiple scales
to a valid contour. If we evolve these representations in a distance-function-preserving fashion,
we maintain a valid curve and its corresponding coarse representations at every step of the
evolution.
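A toy sketch of this idea follows: for a signed distance function D, the sublevel sets {D ≤ l} for increasing l give progressively coarser versions of the zero-level-set shape. The circle and grid size below are arbitrary illustrative choices.

```python
# Sketch of multi-scale shape representations from a signed distance
# function: the sublevel set {D <= l} for increasing l gives progressively
# coarser (dilated) versions of the zero-level-set contour.
# The circular shape and 64x64 grid are arbitrary stand-ins.
import math

n = 64

def D(i, j):
    """Signed distance to a circle of radius 15 centered in the grid."""
    return math.hypot(i - n / 2, j - n / 2) - 15.0

def level_area(l):
    """Number of grid cells inside the level set {D <= l}."""
    return sum(D(i, j) <= l for i in range(n) for j in range(n))

areas = [level_area(l) for l in (-5, 0, 5)]
print(areas[0] < areas[1] < areas[2])  # True: coarser scales enclose more
```

On a real distance transform one would extract the contour {D = l} itself and compute its shape distributions, but the monotone growth of the enclosed area already shows how the level value acts as a scale parameter.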
Now assume that the signed distance transform of the target curve Γ∗ is D(Γ∗).
Suppose we extract the level sets corresponding to a set of values L = {l1, ..., lN} of
the distance function, obtaining the N contours Γ∗_l1 = {D(Γ∗) = l1}, ..., Γ∗_lN =
{D(Γ∗) = lN}. We then build the target distributions on each of these level sets; we
denote the target distribution for feature function i, extracted from the contour Γ∗_lj,
as H∗lj_i. Let the current curve be denoted Γ and its distance transform D(Γ). The new
augmented prior energy is the sum of terms corresponding to the prior at different
scales. The term for each scale is the shape distribution difference measure between the
target distribution computed on the corresponding target level set and the distribution
computed on the corresponding level set of D(Γ). The new energy is

E(Γ) = Σ_{j=1}^{N} Σ_{i=1}^{M} w_i ∫ [H∗lj_i(λ) − H_i({D(Γ) = lj}, λ)]² dλ   (5.2)
Let Γ_lj = {D(Γ) = lj} be the level set (at value lj) corresponding to the curve Γ. Now let
us abstract away from the signed distance transform embedding of Γ and assume that the
Γ_lj are separate evolving curves. Minimization of eq. 5.2 can then be rewritten as the
constrained minimization problem

Γ+ = argmin_{Γ : Γ_lj = {D(Γ) = lj} ∀j} Σ_{j=1}^{N} Σ_{i=1}^{M} w_i ∫ [H∗lj_i(λ) − H_i(Γ_lj, λ)]² dλ   (5.3)
Unfortunately, the constraints in eq. 5.3 cannot be written in closed analytical form;
therefore, a closed form for the curve flow minimizing the energy in eq. 5.3 is difficult to find.
We propose an approximate solution to this minimization problem. Let us relax the
constraints in eq. 5.3 and assume that each curve Γ_lj evolves independently. In that case,
the energy decouples and the flow for each curve Γ_lj is given by

dΓ_lj/dt = −∇E(Γ_lj) = −∇( Σ_{i=1}^{M} w_i ∫ [H∗lj_i(λ) − H_i(Γ_lj, λ)]² dλ )   (5.4)
where the flow ∇E(Γ_lj) is the gradient curve flow for Γ_lj minimizing the corresponding
energy term. Let us now move back to the level set domain. Suppose we start from the
signed distance transform D(Γ) and evolve the level set function infinitesimally in such a
way that eq. 5.4 holds for every j ∈ L. We can do so by evolving the level set function at
each point according to the curve flow corresponding to the closest level set. We can write
such an evolution as

D_t(x) = −(dΓ_{D(x)}/dt) · ∇D(x)   (5.5)

where the flows dΓ_{D(x)}/dt are given by eq. 5.4, with lj = round(D(x)). After an evolution
step according to eq. 5.5, the level set function is no longer a signed distance transform. We
may then perform a correcting step that makes D(Γ) approximately a signed distance
transform again. In (Sethian, 1999), a PDE-based approach to perform such a correction was
introduced for dynamic re-initialization, and it is shown that the level set function update
flow necessary to produce the signed distance transform is ∇D. We can thus perform the
update and correction of the level set function using the following PDE:

D_t(x) = −(dΓ_{D(x)}/dt) · ∇D(x) + β∇D   (5.6)
where β is the strength of the correction term, chosen empirically. With a large enough β,
the flow in eq. 5.6 approximately preserves the signed distance transform property of D.
In the absence of local minima, the PDE in eq. 5.6 approximately minimizes the energy in
eq. 5.2.
Figure 5·4: Evolution of the contour under the multi-scale prior defined on a group of level sets: initial (dot-dashed) and resulting (solid) contours.
The result of applying this procedure to the current shape focusing example is shown in
Figure 5·4. We used 61 terms in the energy formulation of eq. 5.2, with L = {l1 = −30, l2 =
−29, ..., lN = 30}. The final shape is now closer to the target shape in Figure 5·2: notice
the improved shape of the tail (the narrow neck is better pronounced), wings, and fuselage.
The resulting shape is also nearly symmetric despite the non-symmetric initialization. This
confirms our intuition that the multi-scale application of our prior can improve its focusing
ability and ease the problem of local energy minima for complex shapes.
5.2 Image segmentation
In the previous section we investigated the ability of our prior to generate a curve flow
that converges onto the target shape. We built an energy formulation that did not include
the image data and showed that the prior cost function based on our formulation indeed
captures global shape geometry: the flow computed from the prior results in a curve similar
to the target curve, evidence that our prior shape energy term is well behaved on the space
of shapes.
We now focus on image segmentation, the most important application of our prior.
Recall that in our image segmentation framework, the solution curve Γ∗ is sought as the
minimizer of the following segmentation functional:

Γ∗ = argmin_Γ E(Γ) = argmin_Γ [Eint(Γ) + α Eshape(Γ)]   (5.7)

where Eint is the intensity data term and Eshape is the shape prior term. Our goal is to
study the properties of our shape prior on the segmentation problem by comparing different
choices of Eshape while using the same data term Eint.
Experiment 1
First, we approach the problem of segmenting a polygonal shape from bi-modal images
corrupted by noise. Our goal is to obtain the segmentation satisfying two criteria:
• Low segmentation error in terms of the area distance between the resulting contour and
the true contour; we use the Hamming distance to quantify the segmentation error.
• Subjective visual similarity between the result and the true shape. We define
visual similarity as the preservation of significant visual features of the true shape; for
the case considered, we would like to preserve sharp corners and straight boundary
segments.
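The Hamming (symmetric area) distance used for the first criterion is simply the number of pixels on which two binary segmentation masks disagree; a minimal sketch with toy 3×3 masks follows.

```python
# Hamming (symmetric area) distance between two binary segmentation
# masks: the count of pixels on which they disagree. Toy masks below.

def hamming(mask_a, mask_b):
    return sum(a != b
               for row_a, row_b in zip(mask_a, mask_b)
               for a, b in zip(row_a, row_b))

truth  = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
result = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
print(hamming(truth, result))  # → 2 (one missed pixel, one false pixel)
```

The measure counts both missed object pixels and spurious background pixels, which is why it is also called the symmetric area distance.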
We compare the results for the following forms of the prior energy Eshape:
• Our distribution-based prior; Eshape is given by eq. 4.2.
• A generic curve length prior, Eshape(Γ) = ∫_Γ ds.
• Leventon's “probabilistic” prior on level-set curvatures presented in (Leventon et al.,
2000b); Eshape is given by eq. 2.17.
• A PCA prior introduced in Section 2.4.2; Eshape is defined as the Hamming distance
between the segmenting curve and its projection onto the PCA space. The prior energy
is given by eq. 2.18.
Since our focus is on the shape prior, we adopt a simple region-based data fidelity term
that we combine with different choices of the prior. The data fidelity term is given by the
probability of the observed image given the location of the boundary: Eint = − log p(I|Γ).
We assume a bi-level image model in which the intensities inside and outside of the object
boundary are known. In the case of Gaussian noise, the data fidelity term is then given by
(Chan and Vese, 2001)

Ed = ∫_{Ru} (I − u)² dA + ∫_{Rv} (I − v)² dA   (5.8)
where u and v are the known image intensities inside and outside of the segmenting boundary,
and integration is carried out over the inside and outside regions Ru and Rv, respectively.
The curve flow component corresponding to gradient descent with respect to eq. 5.8 is
given by

dΓ/dt = ((u − v)/2) (I − (u + v)/2) ~N   (5.9)
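The normal speed in eq. 5.9 is easy to evaluate at a single image location; a toy sketch with assumed intensities u = 1 (inside) and v = 0 (outside) follows.

```python
# Normal speed of the region-based flow (eq. 5.9) at a boundary point:
# ((u - v) / 2) * (I - (u + v) / 2). With u > v, a pixel brighter than
# the midpoint pushes the curve outward and a darker one pulls it inward.
# The intensities u, v, I below are toy values.

def normal_speed(I, u, v):
    return (u - v) / 2.0 * (I - (u + v) / 2.0)

u, v = 1.0, 0.0            # bright object on a dark background
print(normal_speed(0.9, u, v))   # positive: expand to claim a bright pixel
print(normal_speed(0.1, u, v))   # negative: retract from a dark pixel
```

In a full level set implementation this scalar speed multiplies the outward normal ~N at every point of the evolving contour.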
Our shape distribution based prior in eq. 4.2 is constructed using two feature functions:
inter-point distances and multi-scale curvatures using one distribution corresponding to
the interval S[a,b] = [0, 0.5] for each feature function. The target distributions H∗ for each
feature function are computed as averages of distributions of the 4 prior triangular shapes
shown in Figure 5·6 (A). The regularization parameters in eq. 5.7 in each choice of the
prior were chosen to obtain the best result for that prior. In Figure 5·5 we compare the
segmentation results given by different choices of Eshape. Gaussian IID noise (SNR = −17.5
dB) was added to a bi-modal image of a triangle to create the data image. The true
boundary is shown by a solid black line. We show the segmentation obtained using our
distribution-based prior model in frame (A); the PCA method in frame (B); the result of
using the curvature density prior of (Leventon et al., 2000b) in frame (D); and the result
of using the curve length penalty in frame (C).
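The inter-point distance feature distribution used in our prior can be sketched as follows. This is an illustrative sketch, not the dissertation's implementation; the bin count and the max-distance normalization are assumptions made here for concreteness.

```python
import numpy as np

def interpoint_distance_hist(points, bins=32):
    # Normalized histogram of pairwise point distances for a sampled curve.
    # `points` is an (N, 2) array of uniformly spaced boundary samples.
    # Dividing by the maximum distance makes the descriptor invariant to
    # translation, rotation, mirroring, and scale.
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(points), k=1)
    d = d[iu] / d[iu].max()
    hist, _ = np.histogram(d, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def target_distribution(training_curves, bins=32):
    # Average the training histograms to obtain the target H* of eq. 4.2.
    return np.mean([interpoint_distance_hist(c, bins) for c in training_curves], axis=0)
```

The multi-scale curvature feature function would be discretized into a histogram in the same way, with one target distribution per feature function.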
The goal of this experiment is to illustrate the significant advantage of using our method
Figure 5·5: Segmentation results. A: Our method; B: PCA; C: Curve length penalty prior; D: Method in (Leventon et al., 2000b); White - final result; Black - true shape boundary; Dashed line - initial curve. Symmetric area distance (in pixels) between true boundary and final result is shown on the top of each panel.
Figure 5·6: Prior shapes used to construct the prior in our experiment. A: triangular shapes (experiment 1); B: polygonal shapes (experiment 2).
over the existing approaches when training shapes visually similar to the segmented object
shape are provided to the algorithm. The curve length penalty (C) yields a smoothed
curve, which is obviously not a desirable solution. Leventon's prior (D) yields a curve with
somewhat straighter boundaries. In fact, Leventon's method uses prior information on the
level-set curvatures of the training shapes. Since level sets computed on the training shapes
have predominantly zero curvature, the prior tends to straighten the boundaries. Unfortunately,
level-set smoothness is enforced uniformly across the image plane, and corners
are not preserved. The PCA method (B) yields a shape with even straighter boundaries
and sharper corners. Unfortunately, the space spanned by the principal components
contains smoothed shapes and does not preserve sharp corners. Finally, our method (A)
gives the best result, both in terms of preserving the sharp corners and straight boundary
segments and in terms of the shape area distance measure. In Table 5.1, we summarize
the segmentation errors as measured by the symmetric distance (in pixels) between
the true boundary and the final result. For our method (A), the resulting error is mostly
attributed to the bias in location and angular position of the resulting shape, while the
error for the other two methods is due to shape "distortion". In Table 5.1, we also show the
distribution difference measure which we use in Eshape in eq. 4.2. As expected, our result
yields a much smaller value for this measure because this measure is directly part of
the segmentation functional being minimized.
Table 5.1: Experiment 1: Segmentation errors computed using different error measures. The first error measure is computed as a symmetric area difference (Hamming distance in eq. 5.13) between the final segmented region and the true shape. The second measure is given by our prior energy Eshape in eq. 4.2.
Our method PCA method Curve length Leventon
Symmetric area distance 274 314 294 296
Distribution distance 0.03 0.14
Experiment 2
Figure 5·7: Segmentation results, polygonal prior. A: Our method; B: PCA; C: Curve length penalty prior; White - final result; Black - true shape boundary; Dashed line - initial curve. Symmetric area distance (in pixels) between true boundary and final result is shown on the top of each panel.
In our next experiment we test the robustness of our technique to the use of different
sets of training shapes. Ideally, one would like to have a good segmentation whenever the
shapes used for training share significant visual features with the expected shape. We use
the same experimental setup as described previously but now we use a different set of
training shapes. We now use polygonal prior shapes shown in Figure 5·6 (B) which are far
less similar to the ground truth shape.
The result for this prior is shown in Figure 5·7, compared with the results obtained
using the PCA prior (B) and the minimum curve length prior (C). Comparing our prior
with the minimum curve length prior, one can see that despite a slightly larger area distance,
our result is still visually superior to that obtained with the curve length prior,
since it preserves sharp corners and straight boundary segments. The resulting
errors are summarized in Table 5.2. The visual superiority of our result is reflected in
the smaller distribution difference measure obtained using our prior. Comparison with the
result using a PCA-based prior is most illustrative. Different polygonal shapes used here
have prominent features (corners) located at different arbitrary positions. Alignment of
the different polygonal shapes does not bring their prominent features into correspondence.
As a result, the constructed PCA space only contains the four training polygonal shapes as
special cases and does not contain triangular shapes. The solution given by the PCA-based
method is thus a shape that is far from the true triangular shape.
Table 5.2: Experiment 2: Segmentation errors computed using different error measures. The first error measure is computed as a symmetric area difference (Hamming distance in eq. 5.13) between the final segmented region and the true shape. The second measure is given by our prior energy Eshape in eq. 4.2.
Our method PCA method Curve length
Symmetric area distance 309 416 294
Distribution distance 0.03 0.08
Experiment 3
The segmentation of noisy real images with limited prior data is an interesting ap-
plication that can be addressed by our prior. Suppose we want to segment a 3D object
from its 2D projection given that two views of the object to be segmented are known to
the algorithm. These views can be given by previously segmented contours in a tracking
experiment. In Figure 5·8, we show the noise free image to be segmented in panel (B) and
2 training contours of the object to be segmented in panel (A). It is important to note that
none of the training contours is close to the object shape to be segmented.
Figure 5·8: Experiment 3: (A) - training shapes; (B) - noise free image
We employ the same segmentation technique as used in previous experiments. In Fig-
ure 5·9 we show the results obtained using our prior, the curve length prior, and the PCA
prior. Table 5.3 summarizes the segmentation errors obtained using the three methods.
The curve length prior yields almost straight boundaries at the expense of smoothed cor-
ners. The PCA space does not contain the true object, resulting in the large error. Our
segmentation preserves the salient shape features (straight boundaries and sharp corners).
Our result also has the lowest segmentation error, both in terms of the area-based error
measure and the shape distribution based measure.
Experiment 4
We now apply our technique to knee cartilage segmentation from MRI data. This type
Figure 5·9: Experiment 3 segmentation results. A: Our method; B: PCA; C: Curve length penalty prior; White - final result; Black - true shape boundary; Dashed line - initial curve.
Table 5.3: Experiment 3: Segmentation errors computed using different error measures. The first error measure is computed as a symmetric area difference (Hamming distance in eq. 5.13) between the final segmented region and the true shape. The second measure is given by our prior energy Eshape in eq. 4.2.
Our method PCA method Curve length
Symmetric area distance 697 1492 729
Distribution distance 0.048 0.11
Figure 5·10: Knee cartilage segmentation results. A: Initial (dashed line) and true (solid line) contours; B: Our method (solid line); C: Leventon's (solid line); D: Curve length penalty prior (solid line).
of problem arises in the diagnosis and treatment of osteoarthritis. By measuring the
segmented cartilage thickness, doctors can assess the rate of cartilage loss due to disease
progression and evaluate medication effectiveness.
In a clinical setting, the segmentation of the cartilage has been routinely done using
manual region growing techniques. A number of automatic and semi-automatic segmen-
tation techniques have been applied to cartilage segmentation, see (Lynch et al., 2000;
Kapur, 1999; Cohen et al., 1999) and references therein. The difficulties faced by au-
tomated cartilage segmentation include diffuse boundaries, neighboring structures with
similar intensities, and a very narrow shape. These factors require a strong prior to guide
the segmenting contour toward the correct boundaries. The presence of salient features
(cartilage corners) whose particular location changes from subject to subject makes our
shape prior an appealing approach. The presence of obstructing neighboring structures
requires an initial contour that is rather close to the true boundary. However, we claim
that the initial solution for this problem is easy to acquire using model-based bone segmen-
tation algorithms, such as in (Kapur, 1999). In our illustrative experiments we construct
the initial contour by uniformly expanding (5-10 pixels) the true contour superimposed on
the image being segmented. In Figure 5·10 we present results of applying different shape
priors to segment the cartilage. We compare the results given by our distribution-based
prior, the statistical level-set curvature prior proposed by Leventon (see Section 2.4.2), and
the curve length prior. In constructing our prior and Leventon's shape prior we used a manually
segmented cartilage boundary. In Figure 5·10, panel (A), we show the manually segmented
cartilage boundary and the initial contour used in our experiments. In panel (B) we show
the result given by our prior. Note the location of the upper right corner of the cartilage.
In our result, this corner propagates further along the cartilage boundary compared with
the manually segmented boundary. This is the consequence of the generalizing properties
of our prior, which effectively constrains the width of the cartilage and the sharpness of
corners. In panel (D) we show a segmentation result given by applying a curve length
prior. The cartilage corners are roughly at the correct locations but the bottom part of
the cartilage expands into the adjacent structure driven by the high image intensity area
close to the cartilage. In fact, the effect of the curve length prior is weak in areas of low
curvature and cannot prevent the expansion. Increasing the prior strength leads to a
rapid collapse of the contour when the curve curvature force at the tip of the cartilage
overcomes the local image force. Leventon’s prior in panel (C) has roughly the same effect
as the curve length prior. In fact, the average curvature of the level set computed on the
true cartilage shape is low. The curvature of the boundary tends toward this low value
during the evolution, leading to the corner-smoothing effect.
The experiments in this section illustrate the ability of our prior to yield segmentations
preserving the salient features observed in the training shapes given limited training data
and high variability in the training data. For the considered problems, the results given
by our shape prior are consistently better than the results given by the alternative choices
of shape prior in terms of the segmentation error and subjective visual similarity.
5.3 Image segmentation with occlusion
Partial object occlusion is a difficulty frequently encountered in image analysis
applications. An example is the surgical navigation task, where anatomical
structures and surgical tools must be segmented from X-Ray or optical real-time imagery.
In this section we investigate the ability of our framework to segment objects in case of
partial occlusion. We are particularly interested in reconstructing the salient shape features
characteristic of the training data. We focus on the case of limited prior data. We
demonstrate that our prior can enforce the reconstruction of the salient features and is
flexible with respect to the location of these features in the segmented image. We further
illustrate the invariance properties of our prior.
Experiment 1
First, we consider a segmentation example where training and true shapes are nearly
identical, except for the location of one prominent feature. Our training shapes are shown
in Figure 5·11, panel (A). The only difference between these shapes is the position of
the dorsal fin. In the true shape, the dorsal fin takes yet another position. Positions
of the dorsal fin in the training shapes and true shapes are shown in Figure 5·11, panel
(B). It can easily be seen that a prior constructed from these shapes is very difficult to
generalize using traditional approaches. The level-set based PCA approach applied to the
training shapes produces eigenshapes that reduce and grow the dorsal fin in 3 locations
corresponding to the locations of the fin in 3 training shapes. Indeed, the shortest path
to change the position of the fin in terms of the change of the boundary coordinates is to
reduce the fin in one location and grow it in another location. Among explicit boundary
parameterization based approaches, only the part based approaches can reliably encode
the change of the fin position and generalize to the new unobserved positions of the fin.
The goal of this experiment is to test the ability of our prior to learn the dorsal fin from its
different positions in the training shapes and reconstruct the fin under different occlusion
patterns.
Figure 5·11: Occlusion experiment 1. (A) - training shapes; (B) - true object contour (thick line) superimposed with training shapes (thin lines). This plot illustrates that the prominent feature location is different in the true object and in all training shapes.
Using the ground truth contour in Figure 5·11, panel (B), we construct the bi-modal
image and add IID Gaussian noise with SNR = −5 dB. Accordingly, we use the data curve flow
component given by eq. 5.9. Our prior uses 2 feature functions: inter-point distances and
multi-scale curvatures. One interval S[a,b] = [0, 0.5] is used to construct the distributions
for each of the 2 feature functions. We experiment with 2 occlusion patterns. Occlusion
pattern 1 occludes the dorsal fin. In this case, we desire to reconstruct the fin in its correct
location. Occlusion pattern 2 does not occlude the dorsal fin. In this case, the position of
the dorsal fin is enforced by the data and we desire to verify that our prior does not produce
an extra fin. Our results for this problem are shown in Figure 5·12. Panel (A) shows the
initial contour obtained by averaging the distance function transforms of the three training
contours. The white dashed line in panels (B,C,D) shows the occluded region, while the
solid white line stands for the segmentation result and the black solid line corresponds to
the true contour. Panels (B) and (C) show the results for occlusion pattern 1, using our
prior and PCA prior (as described in Section 2.4.2) respectively. One can see that with our
prior, we are able to reconstruct the “missing” dorsal fin. The location of this fin is slightly
misplaced with respect to the true location. In fact, we do not have data in the occluded
region to correct the position of the fin, so we obtain an equivalently good solution given
the data. As we expected, the PCA prior does not reconstruct the dorsal fin since PCA
components constructed on 3 training shapes do not describe the new position of the dorsal
fin. Panel (D) shows the result of using our prior on occlusion pattern 2. Since the dorsal fin
is recovered from the data, the straight boundaries in the occluded area should remain
straight, and indeed that is the case in our result. This experiment confirms that our prior
measures and enforces the "amount" of a certain feature in the shape and does not lead to
spurious perturbations unexplained by the training data.
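The initial contour used above, obtained by averaging the distance function transforms of the training contours, can be sketched as follows. This is an illustrative sketch, not the dissertation's implementation; the brute-force distance transform stands in for an efficient one and is only suitable for small grids.

```python
import numpy as np

def signed_distance(mask):
    # Brute-force signed Euclidean distance transform (fine for small grids):
    # negative inside the object, positive outside.
    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    pts = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    inside = pts[mask.ravel()]
    outside = pts[~mask.ravel()]
    d_to_inside = np.sqrt(((pts[:, None, :] - inside[None]) ** 2).sum(-1)).min(1)
    d_to_outside = np.sqrt(((pts[:, None, :] - outside[None]) ** 2).sum(-1)).min(1)
    sd = np.where(mask.ravel(), -d_to_outside, d_to_inside)
    return sd.reshape(h, w)

def initial_contour(masks):
    # Average the signed distance transforms of the training shapes;
    # the zero level set of the average serves as the initial curve.
    phi = np.mean([signed_distance(m) for m in masks], axis=0)
    return phi, phi <= 0  # level-set function and its interior region
```

In practice a fast exact distance transform would replace the pairwise computation, but the averaging step is the same.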
Our prior effectively generalizes to the unobserved location of the prominent shape
feature. It is also important to note that we use previously proposed general-use feature
functions, namely inter-point distances and multi-scale curvatures, which are not specifically
adapted to the shape used in this experiment.
Figure 5·12: Occlusion experiment 1. The noisy image is shown in all four panels. (A) - Initial contour (dashed line); (B,C,D): Dashed line - occluded region; Black solid line - true boundary; White solid line - segmentation result. (B) - occlusion pattern 1, result using our prior. (C) - occlusion pattern 1, result using PCA prior. (D) - occlusion pattern 2, result using our prior.
Experiment 2
In this experiment we test the ability of our prior to reconstruct a single occluded
visual feature when multiple similar features are present in the shape. We use hand shapes
and assume that one finger in the noisy image of a hand is occluded. We use manually
segmented contours of hands as training data and test images. Four contours were used to
train the model (Figure 5·13, panel (A)), and one contour was used as the testing example.
Notice that some training examples are flipped. This will negatively affect a PCA-based
model, but our model construction process is invariant with respect to the mirror transform.
Again, a bi-level image model was used with -5dB Gaussian IID noise. In Figure 5·13, panels
(B) and (C), we show the results obtained using our shape distribution based shape prior,
and PCA prior respectively. In order to construct our prior we used 2 feature functions,
inter-point distances and multi-scale curvatures, using one interval S[a,b] = [0, 0.2] for each
of the feature functions. The PCA prior fails to reconstruct the occluded finger since the
principal components constructed on the training shapes attempt to account for the mirror
transform applied to the hand shapes rather than for finger displacements. On the other hand,
our prior is able to approximately reconstruct the occluded finger.
Experiment 3
We perform yet another test of the ability of our model to reconstruct partially occluded
shapes. A collection of aircraft contours was used in this experiment, as shown
in Figure 5·14. The shapes are very different, yet they all share easily recognizable vi-
sual features, such as 2 wings, and a fuselage. Most of the training shapes also have a
delta-shaped tail. We construct the bi-level noisy image using contour #2 and impose the
occlusion pattern obstructing part of the wing and the tail (see Figure 5·15). We attempt
to test the ability of the prior to encode and reconstruct the characteristic features for the
class of shapes, based on highly variable training shapes. We construct our prior using all
16 contours. Our results are presented in Figure 5·15. The reconstructed wing does not
lie close to the ground truth shape but represents an “average wing” for the aircraft in
Figure 5·13: Experiment 2: Segmentation with occlusion. (A) - prior shapes; (B) - result using our prior and (C) - PCA prior. Dashed white rectangle - occluded region; Dashed white circle - initial contour; Black solid line - true boundary; White solid line - segmentation result.
the training set, as expected. Notice that the asymmetric resulting contour is consistent
with our prior being insensitive to symmetry or the lack thereof. One may consider
feature functions that capture and enforce symmetry in order to obtain a more visually
appealing solution for this problem.
Our experiments with segmentation of occluded shapes are designed to illustrate the
generalizing properties of our shape model. Our model is able to capture and reconstruct
features that are otherwise only possible to capture using explicit, part-based, task-specific
models.
Figure 5·14: Training plane shapes.
5.4 Average shape computation
We now consider other applications of our shape prior energy formulation. We consider
tasks that do not involve image data. One task of interest is the shape interpolation
problem, that is, finding the intermediate curves obtained in transforming one
curve into another. Object-based resampling of video sequences is one possible application.
Another task of interest is finding the average shape over a group of shapes, which can be
considered a particular case of shape interpolation if the average of the two shapes is sought.
Our proposed framework provides a straightforward approach to the determination of the
Figure 5·15: Experiment 3: segmentation of plane with occlusion. Plane silhouette #2 was used to form the image. Dashed white rectangle - occluded region; Dashed white smooth contour - initial curve; Black solid line - true boundary; White solid line - segmentation result.
average shape Γ̄ of a collection of N shapes Γi. We will consider the average shape computation
task in greater detail. In the context of our shape prior, our goal is to find the average
shape that captures significant features of the shapes Γi while being close to all shapes
in some sense. We will compare our result to average shapes computed using other
approaches and highlight the significant differences of our approach.
One way to define an average shape is to average the shape representations
corresponding to the prior shapes. This method is often used in practice (e.g., in PCA
methods); however, due to the non-linearity of shape spaces, it is typically not well
justified. Furthermore, this method inherently smoothes out the salient features of shapes
and is therefore not a good match for our goal. A better way to define an average shape
of the collection of shapes Γi is to find an intrinsic mean shape Γ̄ that minimizes the total
distance to all shapes Γi. Of course, the resulting average shape depends on the definition
used for the distance between shapes. The intrinsic mean is also known as the Karcher mean
and can be found as (Maurel and Sapiro, 2003; Charpiat et al., 2003; Le, 1991):
Γ̄ = argmin_Γ Σ_{j=1}^{N} d²(Γ, Γj)    (5.10)
where d(Γ1, Γ2) is the distance between shapes Γ1 and Γ2. To this end, let us define the
shape distribution-based distance between two curves Γ1 and Γ2 as follows:
d(Γ1, Γ2) = √( Σ_{i=1}^{M} w_i ∫_Λ [H^i(Γ1, λ) − H^i(Γ2, λ)]² dλ )    (5.11)
By substituting eq. 5.11 into eq. 5.10 it can be shown that the resulting mean curve Γ is the
curve which minimizes the distance measure between its distributions and the average of
feature distributions corresponding to prior curves Γi. Therefore, one effectively needs to
minimize eq. 4.2 using the corresponding average distributions as the target distributions
H∗(λ).
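With the distributions discretized into histograms, the distance of eq. 5.11 and the averaged target distribution H∗ can be sketched as follows. This is an illustrative sketch; the per-feature weights and the histogram discretization are assumptions of this example.

```python
import numpy as np

def distribution_distance(H1, H2, weights=None):
    # Discretized form of the shape-distribution distance of eq. 5.11.
    # H1, H2: (M, B) arrays -- one length-B histogram per feature function.
    # weights: optional per-feature weights w_i (uniform if omitted).
    H1, H2 = np.asarray(H1, float), np.asarray(H2, float)
    w = np.ones(H1.shape[0]) if weights is None else np.asarray(weights, float)
    return np.sqrt((w[:, None] * (H1 - H2) ** 2).sum())

def karcher_target(train_hists):
    # Average of the training distributions: by eqs. 5.10-5.11 the mean
    # curve is the one whose feature distributions match this average H*.
    return np.mean(np.asarray(train_hists, float), axis=0)
```

Because the distance is quadratic in the histograms, the sum of squared distances in eq. 5.10 is minimized exactly at the average distribution, which is the reduction used in the text.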
From the shape focusing experiments in Section 5.1 we saw that one shape distribution
distance based term in eq. 5.10 results in a flow that converges close to the target shape.
The importance of local minima increases with the shape complexity. Now the target
distributions are computed as averages over prior curves and these average distributions
do not necessarily correspond to any shape. One may argue that local minima may be
more significant in this situation. We will show that the flow computed using such average
target distributions still converges in the vicinity of a contour that we can consider
a visual average of the prior shapes. We show that alternative definitions of the shape
distance do not produce such a visual average shape.
From the Karcher mean point of view, we seek the curve that is closest to prior ex-
amples in terms of its feature distributions’ differences. We can loosely interpret shape
distributions as probabilities of occurrence of certain characteristic features of the shape,
such as boundary segments with certain curvature values. Using this interpretation, we
seek the curve with the average frequency of occurrence of these characteristic features. We will
see that our results appear to confirm this loose interpretation.
Experiment 1
In our first example we intend to compare the results of using our definition of shape
distance in eq. 5.11 with other choices of the shape distance definition. We will illustrate
that using traditional curve distance measures in eq. 5.10 often results in unsatisfactory
mean shapes. In Figure 5·16, we show two shape instances (solid lines), whose mean shape
we would like to find. From the visual similarity point of view, the mean of these two
triangles should be a triangle with two corners coinciding with the matching corners of
the prior shapes and the rightmost corner located somewhere between the corresponding
corners of the two prior shapes.
An example of a generic curve distance measure is the Chamfer distance defined as
d(Γ1, Γ2) = ∫_{Γ1} min_{y∈Γ2} ||x − y|| ds    (5.12)
where the integration is carried out along Γ1, accumulating the Euclidean distance between
the current point x on Γ1 and the curve Γ2. In Figure 5·16 (A) we show the mean shape corresponding
to this definition of the distance. Clearly, such a mean shape is wrong from the point of
view of visual similarity.
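For curves given as ordered point samples, the Chamfer distance of eq. 5.12 can be approximated as below. This is an illustrative sketch; the closed-curve arc-length weighting is an assumption of this example.

```python
import numpy as np

def chamfer_distance(c1, c2):
    # Discrete version of eq. 5.12: integrate, along curve c1, the distance
    # from each sample to the nearest point of curve c2.  Note the measure
    # is asymmetric: d(c1, c2) != d(c2, c1) in general.
    d = np.sqrt(((c1[:, None, :] - c2[None, :, :]) ** 2).sum(-1)).min(axis=1)
    # approximate ds by the segment lengths of the closed sampled curve c1
    closed = np.vstack([c1, c1[:1]])
    seg = np.sqrt((np.diff(closed, axis=0) ** 2).sum(-1))
    return float((d * seg).sum())
```

The asymmetry of this measure is one reason it can produce the unintuitive mean shape shown in panel (A).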
Another often-used shape difference measure, based on the total area between shapes, is
the Hamming distance
d(Γ1, Γ2) = ∫_{A: sign(D(Γ1)) ≠ sign(D(Γ2))} dS    (5.13)
where D(Γ1) and D(Γ2) are signed distance transforms for shapes Γ1 and Γ2 respectively.
When used in eq. 5.10, this shape difference measure yields an infinite number of solutions
for the mean shape. These solutions are located in the shaded areas in Figure 5·16 (B).
None of those shapes is a triangle with the right proportions (a perceptual mean shape).
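When shapes are represented as pixel masks, the sign-mismatch condition of eq. 5.13 reduces to counting the pixels on which the two regions disagree; a minimal sketch:

```python
import numpy as np

def hamming_distance(mask1, mask2):
    # Symmetric area difference of eq. 5.13: the area over which the two
    # regions disagree, i.e. where the signs of their signed distance
    # transforms differ.  With pixel masks this is simply the XOR count.
    return int(np.logical_xor(mask1, mask2).sum())
```

This is the same measure reported as the "symmetric area distance" in Tables 5.1-5.3.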
A similar result is obtained using the Hausdorff distance (Charpiat et al., 2003). For this
Figure 5·16: The average shape of 2 triangles obtained using different shape distance measures: solid lines - prior shapes; dashed line - corresponding average shape; filled areas - the family of solutions. (A) - asymmetric distance based measure; (B) - area based measure. One of the possible solutions is shown by the dashed line; (C) - our distribution difference measure (dash-dotted line - evolution result; dashed line - scaled result).
measure, the solution to eq. 5.10 is not unique, although the set of solutions includes a
perceptually correct mean shape.
Finally, we use our measure of shape difference in eq. 5.11 based on the difference
between shape feature distributions. We construct the difference measure using 2 feature
functions: the inter-point distances and multi-scale curvatures (#1 and #2). In Figure 5·16
(C) we show the result of applying our method to find the mean shape. We initialize the
solution using the result obtained for the Chamfer distance in Figure 5·16 (A). The contour
produced by the shape distribution distance is shown by the dash-dotted line. The size of
this shape is smaller than that of the original shapes due to initialization. This behavior is
consistent with the fact that our measure is insensitive to scale, translation, and rotation
by design. We manually scale and shift the resulting contour to match the position of the
original shapes for visualization purposes (dashed line). One can see that the scaled result
produces the expected “mean shape” from a visual similarity point of view.
It is important to note that using shape models (measures) that explicitly capture the
correspondence of shape features, such as the point distribution model (PDM) in (Cootes
et al., 1995), would also find an average shape with good visual similarity properties.
Our achievement is to obtain a similarly good result without correspondence, using only
the aggregate shape dissimilarity measure based on uniformly sampled information about
the shapes.
Experiment 2
In our second example we compute the average shapes over two sets of aircraft contours
(we use the database from (Jaggi et al., 1999)). We demonstrate that our average shape
computation scheme is capable of capturing the salient structure shared by a group of
shapes. The two groups of shapes used in this experiment are shown in Figure 5·17, panels
A and B. In each group, the shapes are different but share certain common features. The
shapes in group A have narrow wings bent backwards at approximately 45 degree angle
and tails with a concavity in the middle. The shapes in group B have delta-shaped wings
and narrow protrusions on the front side of the wing. In order to effectively capture these
shape groups, we extend the descriptive properties of our prior. We use 4 distributions for
each of the feature functions #1 and #2. These distributions are created using four non-
overlapping intervals S of equal length. We use a non-symmetric initialization computed
as the average over the signed distance transforms of the 16 non-aligned prior aircraft shapes
(dash-dotted contours). Our results show the preservation of the characteristics of the prior
shapes, such as wing shape and thickness and tail shape. The resulting shapes are slightly
non-symmetric. This non-symmetry is caused by the non-symmetric initialization and by the
evolution path ending at a non-symmetric local minimum. One may also notice 2 humps
on the front side of the wing for case B. Since the wing protrusions are highly variable
in the (case B) prior shapes, our average shape is "uncertain" about the shape and location of
these protrusions, but it is important that these protrusions appear in the result at their
average locations. This result shows that our prior is capable of capturing and averaging
the global shape structure and salient features, such as localized protrusions. Since we
use the initialization that is far from the final result, we predict robustness of our average
shape solution with respect to initialization.
Experiment 3
Figure 5·17: Initial (dash-dotted contour) and average shapes (solid contour) for 2 groups of shapes. Prior shapes in each group are shown at the top of each panel.
Figure 5·18: Experiment 3: Example shapes from (Klassen et al., 2004).
So far, we have experimented with shape groups that were relatively similar: triangles or
plane shapes of roughly similar design. Now we attempt to capture a more subtle shape
similarity, coming close to what we call perceptual shape similarity. In this experiment we use
shapes that share only one common characteristic: being polygons (see Figure 5·18).
We test the ability of our prior to capture this salient shape property and yield the average
shape that has the structure of a polygon.
The set of shapes in Figure 5·18 has been previously used in (Klassen et al., 2004),
where the average shape is computed using the Karcher mean formula. The shape distance
measure was taken to be the geodesic length of the path connecting the two shapes' arc-length
Figure 5·19: Experiment 3: Average shapes computed on the shapes in Figure 5·18: (A) Result in (Klassen et al., 2004); (B) Our result.
angle function representations. Pose invariance and invariance with respect to the starting
point of the angle-function parameterization were intrinsically encoded into the distance
computation. Still, this technique represents curves by angle functions and hence suffers
from the drawbacks of all parametric shape representations. In the process of finding the
“average” representation using Karcher mean formula, angle function representations are
effectively averaged. Hence, we may expect averaging of salient features of the shape. The
resulting average shape is given in Fig. 8 in (Klassen et al., 2004), which we reproduce in
Figure 5·19, panel (A). As expected, only one corner is present in the resulting shape. This
is the corner that occurs at the same location in all input shapes. Parts of the resulting
boundary are rounded, which is not a feature of the prior shape examples.
We now apply our approach to find the average shape using the Karcher mean formula
combined with the shape distance measure based on the shape distribution difference. Our result
is shown in Figure 5·19, panel (B). We obtain a shape with distinct corners and straight
sides. It is interesting that while we impose no constraints on the number of corners, we
obtain a mean shape with an average number of corners (our result has 5 corners and the
input shapes have from 3 to 6 corners). This result conforms to our intuitive interpretation
of the embedding of the distributional representation into the Karcher mean formula. Our
intuitive interpretation states that since our distribution based prior captures the amount
of certain feature in the shape, Karcher mean will effectively find the shape with average
112
amount of this feature. Our result is also approximately symmetric, as input shapes are.
Overall, we appear to obtain an average shape with better retention of the key visual
characteristics of this collection of objects.
5.5 Summary
In this chapter we present experiments demonstrating the properties of the 2D shape
prior developed in Chapter 4. Our shape focusing experiments show the ability of the
prior curve flow to transform one curve into another curve resembling the prior example.
Our experiments illustrate the dependence of the captured shape properties on the chosen
feature functions. Our segmentation experiments show the ability of our prior to extract
a boundary that is visually similar to the training data shapes and superior in terms of
a quantitative segmentation error measure, compared with the boundaries produced by
alternative techniques. In segmentation with occlusion experiments, our prior shows the ability
to reconstruct missing salient features of training shapes at new locations, unobserved in
the training shapes. Alternative techniques are not capable of achieving the same degree of
generalization and robustness. In another experiment we show the ability of our prior to
find a visual average of given shapes, unattainable by existing approaches.
Our shape prior methodology is shown to benefit from the key properties of shape
distribution based shape representations. Among others, the invariance properties allow us
to make registration intrinsic to the model construction. The distributional nature of the
shape descriptors used is responsible for the excellent generalization properties. Flexibility
allows us to use different feature functions tailored to particular applications.
Chapter 6
Joint segmentation of multiple objects using
shape distribution based shape prior
In this chapter we focus on multiple object segmentation problems where prior information
on relative object positions can be beneficial. The work reported in this chapter previously
appeared in (Litvin and Karl, 2005a; Litvin and Karl, 2005b).
6.1 Introduction
Let us now consider segmentation problems that involve multiple structures of interest
with consistent relative positions. Such applications are abundant in medical imaging,
where multiple organs often need to be segmented simultaneously. In medical images,
multiple structures often preserve certain relative positioning across subjects and these
relationships can be used to properly constrain the solution space. Even in the case when
only one structure needs to be segmented, neighboring structures, if included in the process,
can provide additional useful information/guidance to segment the structure of interest.
Examples of such problems include knee cartilage and femur segmentation, bladder and
rectum segmentation, vertebra segmentation and so on.
Because of its potential, simultaneous multiple object segmentation is an important
direction of research in medical imaging. One can distinguish two directions undertaken
to use information about multiple objects in segmentation. The approaches in the first
group aim at constructing a joint shape prior on multiple structures. In (Yang et al., 2003;
Tsai et al., 2003; Freedman et al., 2004) the authors extend the PCA shape model to
constrain multiple object locations. In (Duta and Sonka, 1998), a Point Distribution
Model (PDM) was extended to model multiple shapes. The approaches in this group
capture information on the entire boundaries of multiple objects. However, they inherit
the advantages and drawbacks of their single object versions. For instance, generalizability
can be problematic for a PCA based multiple object model just as for a PCA single object
model. Our experiments later in this chapter illustrate this point.
A different group of approaches attempts to apply constraints between multiple objects
to preserve certain properties of their boundaries relative to each other. In (Ho and Shi,
2004; Paragios and Deriche, 2000; Alexandrov and Santosa, 2005) the authors mutually
constrained shapes by penalizing quantities local to the area of contact between different
structures, such as the area of overlap. Approaches of this kind are only useful to prevent the
overlap of different structures. In (Shi, 2005), a more general minimin criterion is applied
to evolving multiple curves. This criterion is designed to locally repel curves positioned
closer to each other than a specified distance. Approaches in this group are only concerned
with parts of curves positioned close to each other and only model/penalize one specific
quantity: distance. Therefore, these methods are limited since they take into account only
one aspect of mutual shape positioning and ignore other information.
A different idea was proposed in (Matsakis et al., 2004) to characterize the mutual positions
of shape boundaries. In this approach, the concept of a force histogram was used to compute
a quantity that depends on both entire boundaries. The resulting histogram-based descriptors
were used in discrimination, while our focus is on estimation problems.
Our approach naturally extends the single object shape-distribution based shape prior
to jointly model multiple shapes. We share a common idea with (Matsakis et al., 2004): an
effective way to capture information about multiple objects is to compute multiple
descriptors on the entire set of boundaries. Our approach inherits the advantages of our
single object shape model: it is designed to preserve salient features of shapes, provides invariance,
and is implementable using curve evolution. Our approach to modeling multiple structures
naturally integrates with the single object shape distribution shape prior.
6.2 Formulation
We now distinguish two types of feature functions. For the first type of feature functions,
which we term autonomous, the values for a particular curve are computed with refer-
ence only to the curve itself. Two feature functions of this type have been introduced in
Chapter 4: inter-point distances and multi-scale curvatures. For the second type of feature
functions, which we term directed, the feature function values are computed with reference
to the curves of other objects. By incorporating directed feature functions into our shape
models we provide a mechanism for modeling the relationships between different objects
in a scene and, thus, create a framework for multi-object segmentation.
We first briefly review the principles of shape distribution based shape modeling ap-
proach developed in Chapter 4. Let Φ(ω) be a continuously defined feature on the space
Ω, where ω is the element of the space Ω. Let λ be a variable spanning the range of values
Λ of the feature. Let H(λ) be the CDF of Φ:
H(λ) =
∫
Ω hΦ(ω) < λ
dω
∫
Ω dω(6.1)
where h(condition) is the indicator function, which is 1 when the “condition” is satisfied
and 0 otherwise. We define the prior energy Eshape(Γ) for the boundary curve Γ based on
this shape distribution as:
\[
E_{\mathrm{shape}}(\Gamma) = \sum_{i=1}^{M} w_i \int \big[ H_i^{*}(\lambda) - H_i(\Gamma, \lambda) \big]^2 d\lambda \tag{6.2}
\]
where M is the number of different distributions (i.e. feature functions) being used to
represent the object, Hi(Γ, λ) is the distribution function of the ith feature function for the
curve Γ, and the non-negative scalar weights wi balance the relative contribution of the
different feature functions. Prior knowledge of object behavior is captured in the set of
target distributions H∗i(λ). These target distributions H∗i can correspond to a single prior
shape, an average derived from a group of training shapes, or can be specified by prior
knowledge.
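To make the construction concrete, a discretized version of eqs. 6.1 and 6.2 can be sketched as follows (a minimal numerical illustration, not code from this dissertation; the function names and the uniform lambda grid are our own choices):

```python
import numpy as np

def feature_cdf(values, lambdas):
    """Empirical CDF H(lambda) of sampled feature values (eq. 6.1,
    discretized): the fraction of samples falling below each lambda."""
    values = np.asarray(values, dtype=float)
    return np.array([np.mean(values < lam) for lam in lambdas])

def shape_prior_energy(feature_samples, target_cdfs, weights, lambdas):
    """Discretized prior energy of eq. 6.2: a weighted sum over feature
    functions of the squared L2 distance between the target CDF H*_i
    and the CDF H_i induced by the current curve."""
    d_lambda = lambdas[1] - lambdas[0]  # uniform lambda grid assumed
    energy = 0.0
    for samples, h_star, w in zip(feature_samples, target_cdfs, weights):
        h = feature_cdf(samples, lambdas)
        energy += w * np.sum((h_star - h) ** 2) * d_lambda
    return energy
```

Here `feature_samples` holds, for each of the M feature functions, the values measured on the current curve (e.g. inter-point distances); when the induced CDFs match the targets the energy vanishes.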
We now extend our single object prior to model multiple curves as follows. Let N
be the predefined number of objects in the image. We define a directed feature function
Φqp(ω), p ≠ q, where q and p are object indices. This feature function is computed on object
q using information given by object p. As in the single object case, each feature
function gives rise to a CDF computed according to eq. 6.1. The multi-object prior energy
functional is defined as
\[
E_{\mathrm{multshape}}(\Gamma_1, \ldots, \Gamma_N) = \sum_{q=1}^{N} \sum_{p=1}^{N} Z_{qp} \sum_{i=1}^{M} w_i \int_{\Lambda} \big[ H_{iqp}^{*}(\lambda) - H_{iqp}(\Gamma_q, \lambda) \big]^2 d\lambda \tag{6.3}
\]
where Zqp are the entries of the interaction matrix Z. The element Zqp defines the confidence
in the prior information on the relationship between objects q and p, expressed by the
difference between the directed distributions corresponding to the pair of structures. This
matrix is application specific. Generally, the objects that can be segmented well without
a prior can be used to constrain those that cause problems in the absence of an additional
prior. It is worth noting that the matrix Z is not necessarily symmetric. Its diagonal values
are the weights on the single object prior terms for the individual objects. In this dissertation we
choose the values of Z empirically to achieve the best segmentation performance. We also
use a graphical interpretation of the interaction matrix: on a diagram representing
the segmented objects for a particular problem, each non-zero off-diagonal element of Z is
shown by a directed link, as illustrated in Figure 6·1.
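A discretized evaluation of eq. 6.3 might look as follows (an illustrative sketch with our own naming; the CDFs are assumed to be pre-sampled on a uniform grid of lambda values):

```python
import numpy as np

def multi_object_energy(H, H_star, Z, w, d_lambda):
    """Discretized multi-object prior energy of eq. 6.3.

    H[q][p][i]      -- sampled CDF of the i-th feature of object q computed
                       with reference to object p (diagonal q == p holds the
                       single-object autonomous terms)
    H_star[q][p][i] -- corresponding target CDF
    Z               -- N x N interaction matrix; Z[q][p] weights the prior
                       on the ordered pair (q, p)
    w               -- per-feature-function weights w_i
    """
    N = len(H)
    energy = 0.0
    for q in range(N):
        for p in range(N):
            if Z[q][p] == 0:
                continue  # no prior imposed on this ordered pair
            for i, wi in enumerate(w):
                diff = H_star[q][p][i] - H[q][p][i]
                energy += Z[q][p] * wi * np.sum(diff ** 2) * d_lambda
    return energy
```

Zero entries of Z simply drop the corresponding pairwise terms, matching the graphical interpretation in which only non-zero entries appear as directed links.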
As in the single object case, the choice of feature function defines the model
performance. We now introduce directed feature function #3, which represents
relative inter-object distances. Consider two contours C1 and C2. We assume
that the feature function is defined on contour C1, using information on contour
C2. At each point s on the boundary of C1, we measure the distance between the
point s and the closest point on the curve C2, normalized by the average radius of C2 with
\[
Z = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\]

Figure 6·1: Interaction matrix graphical interpretation using a directed diagram. Three objects are sketched in the right panel with assigned object indices. Arrows in the right panel correspond to non-zero entries in the matrix Z.
respect to its center of mass. Formally
\[
\Phi_{C_1 C_2}(s) = \frac{\operatorname{sign}\big[D_{C_2}(C_1(s))\big]\, \min\, d\big(C_1(s), C_2\big)}{\frac{1}{L_{C_2}} \int_{p \in C_2} d\big(C_2(p), M_{C_2}\big)\, dp} \tag{6.4}
\]
where MC2 is the center of mass of the contour C2; LC2 is the length of the contour C2;
d(x1, x2) is the Euclidean distance between points x1 and x2; DC2 is the signed distance
transform corresponding to the curve C2. The function sign[DC2(C1(s))] equals 1 if C1(s)
is outside C2 and −1 if it is inside. The space Ω on which the feature
function is defined is the arc-length of the curve C1. Note that the prior defined using this
feature function provides a descriptor richer than those in the penalty based approaches in
(Ho and Shi, 2004; Paragios and Deriche, 2000), while being less restrictive than a PCA
based multi-object prior. Again, this feature is invariant to translation, rotation and scale
applied simultaneously to the pair of shapes C1 and C2. We illustrate the feature function
construction in Figure 6·2. For simplicity we discretize curve C1 using 6 nodes.
In practice, we use the signed distance transform to compute the values of ΦC1C2(s).
Given the signed distance transform DC2 corresponding to the curve C2, the feature
function can be computed as

\[
\Phi_{C_1 C_2}(s) = \frac{D_{C_2}(C_1(s))}{\frac{1}{L_{C_2}} \int_{p \in C_2} d\big(C_2(p), M_{C_2}\big)\, dp} \tag{6.5}
\]
Figure 6·2: Feature function #3 used in this work, illustrated for a curve C1 discretized using 6 nodes. Feature values for curve C1 are defined as the shortest signed distances from the curve C2 to the nodes of the curve C1.
The feature function #3 can be extended to 3D in a straightforward way. Suppose we
have surfaces S1 and S2. Let s and p denote points on the surfaces S1 and S2 respectively.
We will define the feature function on the surface S1. Feature value Φ(s) is defined as a
normalized value at s of the signed distance transform DS2 corresponding to the surface S2:

\[
\Phi(s) = \frac{D_{S_2}(s)}{\frac{1}{A_{S_2}} \int_{p \in S_2} d\big(S_2(p), M_{S_2}\big)\, dp} \tag{6.6}
\]
where AS2 is the area of the surface S2 and MS2 is the center of mass of the surface S2.
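Returning to the 2D case, a direct discrete evaluation of feature function #3 (in the sign-times-distance form of eq. 6.4) can be sketched as follows; this is an illustration under our own conventions, with C2 represented by densely sampled boundary points and an inside test supplied by the caller:

```python
import numpy as np

def directed_feature(curve1_pts, boundary2_pts, inside2, center2):
    """Feature function #3 (eq. 6.4): for each node of curve C1, the
    signed distance to the closest point of curve C2, normalized by the
    mean radius of C2 with respect to its center of mass.

    curve1_pts    -- (n, 2) nodes of C1
    boundary2_pts -- (m, 2) densely sampled points of C2
    inside2       -- callable returning True if a point lies inside C2
    center2       -- center of mass M_C2 of C2
    """
    mean_radius = np.mean(np.linalg.norm(boundary2_pts - center2, axis=1))
    feats = np.empty(len(curve1_pts))
    for k, pt in enumerate(curve1_pts):
        dist = np.min(np.linalg.norm(boundary2_pts - pt, axis=1))
        sign = -1.0 if inside2(pt) else 1.0  # sign[D_C2(C1(s))]
        feats[k] = sign * dist / mean_radius
    return feats
```

The CDF of these values then enters the prior energy exactly as the autonomous features do; in practice one would replace the brute-force minimum by a precomputed signed distance transform, as eq. 6.5 suggests.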
6.2.1 Flow computation for inter-object distance feature function
For feature function #3 introduced above, relating the curve C1 to another object C2, the
gradient flow computed using variational approach (see Appendix A) is given by:
\[
\nabla E(C_1)(s) = \vec{n}(s) \cdot \vec{\nabla} D_{C_2}(s) \left[ H^{*}\!\left( \frac{D_{C_2}(s)}{R(C_2)} \right) - H\!\left( C_1, \frac{D_{C_2}(s)}{R(C_2)} \right) \right] \tag{6.7}
\]
where DC2(s) is the value of the signed distance function generated by the curve C2 at
the point s on the curve C1, and R(C2) is the mean radius of the shape C2 relative to its
center of mass. The details of the flow derivation are given in Appendix A.
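Numerically, the normal speed of eq. 6.7 at each node of C1 could be evaluated as follows (a sketch with our own helper names; both CDFs are assumed sampled on a common lambda grid and are linearly interpolated at the normalized distances):

```python
import numpy as np

def directed_flow_speed(normals, grad_D2, D2_vals, R2, H_star, H, lambdas):
    """Normal speed of the gradient flow in eq. 6.7 at each node s of C1.

    normals   -- (n, 2) unit normals n(s) of curve C1
    grad_D2   -- (n, 2) gradient of the signed distance function of C2,
                 evaluated at the nodes C1(s)
    D2_vals   -- (n,)  signed distances D_C2(C1(s))
    R2        -- mean radius R(C2) used for normalization
    H_star, H -- target and current CDFs sampled on the grid `lambdas`
    """
    lam = D2_vals / R2
    # CDF difference evaluated at the normalized distance of each node
    diff = np.interp(lam, lambdas, H_star) - np.interp(lam, lambdas, H)
    # (n . grad D_C2) [H* - H]
    return np.einsum('ij,ij->i', normals, grad_D2) * diff
```

In a full curve evolution this speed would be added to the flows from the autonomous feature terms and the data term before updating the node positions.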
For the 3D version of feature function #3, relating surfaces S1 and S2, the gradient
flow is given by
\[
\nabla E(S_1)(s) = \vec{n}(s) \cdot \vec{\nabla} D_{S_2}(s) \left[ H^{*}\!\left( \frac{D_{S_2}(s)}{R(S_2)} \right) - H\!\left( S_1, \frac{D_{S_2}(s)}{R(S_2)} \right) \right] \tag{6.8}
\]
where DS2(s) is the value of the signed distance function generated by the surface S2 at the
point s on the surface S1, and R(S2) is the mean radius of the shape S2 relative to its
center of mass.
6.3 Experiments
In this section we show the results of applying our feature-distribution-based shape prior
to joint segmentation of multiple objects. In fact, in many problems, information about
the relative positions of the boundaries of different objects in the image provides crucial
constraints helping to achieve better segmentation. In our case, the information about
relative object positions is cast in a shape distribution framework, allowing for a unified
shape prior framework including both autonomous and directed features.
In this section we apply our shape distribution based prior to both a synthetic and a
real example. The real data example arises in segmentation of brain MRI. We compare
both single object and multi-object priors. The benefit of using a multi-object prior is
expected to be greater when the boundary of some objects is not well supported by the
observed image intensity gradient or when initialization is far from the true boundary.
Experiment 1
In the first experiment we apply our prior to a synthetic 2-object segmentation problem
with very low SNR, simulating two closely positioned organs. The ground truth image is
shown in Figure 6·3, panel (A). Both objects in the ground truth image have the same,
Figure 6·3: Synthetic 2 shape example: (A) Bi-level noise free image; (B) Segmentation with curve length prior; (C) Shape distribution prior including only autonomous feature functions #1 and #2; (D) Shape distribution prior including directed feature function #3 along with the autonomous feature functions. The solid black line shows the true object boundaries; dashed white lines show the initial boundary position; solid lines show the final boundary.
Figure 6·4: Brain MRI segmentation: (A) Multiple structures and interactions used for feature function #3, with the lenticular and caudate nuclei labeled; (B) Segmentation with independent object curve length prior; (C) Segmentation using the multi-object PCA technique in (Tsai et al., 2004); (D) Segmentation with the new multi-object shape distribution prior. The solid black line shows the true object boundaries; the solid white line shows the final segmentation boundary.
known intensity. The background intensity is also known as in the model in (Chan and
Vese, 2001). Gaussian IID noise (SNR= -18dB) was added to this bimodal image to form
the noisy observed image. The data term Eint and the corresponding data term gradient
curve flow are formed according to the data model in (Chan and Vese, 2001).
In Figure 6·3 we show the results obtained by segmenting this image using energy
minimizing curve evolution based on two different shape priors. Figure 6·3 (B) shows the
results of independent curve evolution for both contours using the common curve length
penalty as the regularizing term in the energy functional. Figure 6·3 (C) shows the results
with our shape distribution-based prior but using only autonomous feature functions #1
and #2 (inter-point distances and multi-scale curvatures); Figure 6·3 (D) shows the results
with our multi-object shape distribution prior including all 3 feature functions. The prior
target distributions for case (C,D) were constructed using the true objects in (A). The
regularization parameter was chosen in each case to yield the best subjective result. In
case (D), all 4 elements of the interaction matrix Z were set to 1.
The curve length prior result in (B) yields an incorrect segmentation for one of the
objects or leads to the collapse of one of the contours, depending on the strength of the
prior. The shape distribution-based prior in (C) performs well for the second object (the
bent shape). The first object (the ellipse) cannot be effectively extracted because the
stopping force at the boundary between the objects is insufficient. With the directed feature
function included in the segmentation functional (D), both objects can be correctly
segmented since the energy term corresponding to feature function #3 effectively prevents the
intersection of boundaries. Area based segmentation errors are summarized in Table 6.1.
Experiment 2
In our second example we apply our techniques to 2D MRI brain data segmentation. A
data set consisting of 12 normal adult subjects was provided to us by Dr. David Kennedy
at the Center for Morphometric Analysis of Harvard Medical School and Massachusetts
General Hospital. Manual expert segmentations of the subjects were provided and those
Table 6.1: Symmetric difference (area based) segmentation error. For each object the error measure is computed as the symmetric difference between the final segmented region and the true region. The values in the table are computed as the sum of the error measures for the individual objects.

                 Curve length    PCA method    Our method
Experiment I         1092             -            146
Experiment II        1090            1437          758
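The error measure of Table 6.1 can be computed as follows (a sketch; each object's region is represented as a binary mask and errors are counted in pixels):

```python
import numpy as np

def symmetric_difference_error(seg_masks, true_masks):
    """Area-based error of Table 6.1: for each object, the pixel count of
    the symmetric difference between the final segmented region and the
    true region; the reported value is the sum over all objects."""
    return sum(int(np.logical_xor(seg, true).sum())
               for seg, true in zip(seg_masks, true_masks))
```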
of 11 of these subjects were used as training data to construct our shape prior. The
prior was then applied to segment the data of the omitted subject. The eight numbered
structures shown in Figure 6·4, panel (A) were simultaneously segmented. For the data
dependent energy term Eint, we used the information theoretic approach of (Kim et al.,
2002a) by maximizing the mutual information between image pixel intensities and region
labels (inside or outside), therefore favoring homogeneous regions.
In Figure 6·4 we present our results. Panel (B) gives the segmentation with a standard
curve length prior applied independently to each object. One can see that structures 1
and 4 are poorly segmented, due to their weak image boundaries. In panel (C) we present
the result given by the multi-shape PCA technique in (Tsai et al., 2004) using 5 principal
components defining the subspace of allowable shapes. The segmentation is sought as
the shape in this subspace, optimizing the same information-theoretic criterion (Kim et al.,
2002a) as used with our shape prior. Using the same data term simplifies the
comparison with our approach since only the shape model components of the methods
differ. One can see that structures 2, 5, 6, and 7 are not segmented properly due
to the poor generalization of the PCA prior. Expanding the subspace by choosing 10
PCA components did not improve the result given by this method. Finally, our result
is shown in panel (D). We obtain satisfactory segmentation for the structures for which
the PCA method failed (2, 5, 6, 7), while performing equally well for structures 1, 3, 4 and 8. The
choice of initialization did not significantly influence our results. The segmentation errors given
in Table 6.1 qualitatively confirm the superior performance attained using our prior.
6.4 Conclusions
In this chapter we extend a shape distribution-based object prior to jointly model multiple
image object boundaries. We apply the variational approach to analytically
compute the energy minimizing curve flow for the directed feature function #3 relating a
pair of objects. We demonstrate the application of our shape distribution prior to medical
image segmentation involving multiple object boundaries. In our experiments we achieved
performance superior to that obtained using the traditional curve length minimization
methods and a multi-shape PCA shape prior reported in the literature.
Chapter 7
Shape and appearance modeling with feature
distributions for image segmentation
This chapter extends our shape distribution based shape modeling approach to coupled
shape and appearance modeling for use in image segmentation. The coupling, which is the
first contribution of our new framework, is expressed by a single shape distribution-based
term in the segmentation functional that accounts for both appearance and shape models.
Our new shape and appearance model has the ability to segment images with properties
posing difficulties for existing appearance models. Examples show the accurate placement
of boundaries in challenging situations; for example, in cases where an intensity boundary
does not exist. The work reported in this chapter has been presented in (Litvin et al.,
2006).
7.1 Introduction
In a typical curve evolution scheme the data-dependent forces and shape prior forces are
constructed from distinct principles, and lead to distinct terms in corresponding curve evo-
lution equations. In the previous chapters we concentrated on constructing prior shape
terms. Now we turn our attention to models capturing image appearance properties with
respect to the boundary. We review existing appearance modeling approaches in Sec-
tion 2.7. Such appearance models can be implicit, formulated as curve evolution forces,
driving the boundaries according to the underlying image properties. The simplest data
dependent force positions the boundary to maximize the image gradient magnitude at
boundary location. Alternatively, the popular Chan and Vese model (Chan and Vese,
2001) attempts to maximize the uniformity of intensities in the different regions created
by the boundary. The approach in (Kim et al., 2002a) attempts to preserve uniformity in
regions using an information theoretic criterion. Other models attempt to maximize the
uniformity of certain
higher order statistics of the intensities in each region.
Attempts have been made to develop more complex forces capturing prior appearance
information. For example, the probabilistic appearance model in (Leventon et al., 2000b)
links appearance to the boundary through the distance function, but this model drives
the boundary everywhere towards a most likely intensity and curvature through a MAP
formulation. While not strictly a curve evolution method, the Active Appearance Model
(AAM) in (Cootes et al., 2001) creates detailed intensity and shape models sensitive to
boundary position. However, the AAM approach enforces match between the image and
the template over the entire model domain, requires identification of reliable boundary
landmarks in each image (effectively doing registration), and is based on PCA of observed
pixel values and boundary control points, resulting in generalization sensitivity.
In contrast, our proposed approach to prior construction is based on the concept of
shape distributions (Chapter 4), which we use to encode both prior shape information as
well as appearance information. In this chapter we extend these shape modeling results
from Chapter 4 to include joint shape and intensity priors. Our approach is based on
finding a solution that matches given distributions of shape and intensity features rather
than driving the shape towards a given set of mean values or seeking maximal homogeneity
of intensity or shape characteristics. Since our model is based on the distribution of inten-
sity, edges are not even needed to define the boundary location, which is useful in certain
segmentation contexts. Through careful feature choice, our model can be inherently
insensitive to many geometric transformations (scaling, rotation, etc.). Our model is richer
than existing models: existing models assume and enforce uniformity of a certain statistic
in the region or along the boundary, while our model attempts to model and match free-form
distributions of image intensity defined features along boundaries.
7.2 Shape distribution principles
The detailed derivation of the shape distribution principles can be found in Chapter 4.
Recall that the prior energy Eshape(Γ) for curve Γ based on shape distributions is defined
as:
\[
E_{\mathrm{shape}}(\Gamma) = \sum_{i=1}^{M} w_i \int_{\Lambda} \big[ H_i^{*\,\mathrm{shape}}(\lambda) - H_i^{\mathrm{shape}}(\Gamma, \lambda) \big]^2 d\lambda \tag{7.1}
\]
where M is the number of different shape distributions (i.e. feature functions) being used
to represent the object, Hshape_i(Γ, λ) is the distribution function of the ith shape feature
function, and the non-negative scalar weights wi balance the relative contribution of the
different feature distributions. Prior knowledge of object shape behavior is captured in
the set of target distributions H∗shape_i(λ). These target distributions can correspond to a
single prior shape, an average derived from a group of training shapes, or can be specified
by prior knowledge (e.g. the analytic form for a primitive, such as a square).
We used two specific shape feature functions in our experiments presented in Chapter 4:
• Feature function #1. Inter-point distances
• Feature function #2. Multi-scale curvatures
7.3 Extension to combined intensity and shape priors
Now we describe our extension to include prior appearance information. In our approach,
we compute the distributions of intensity-based features measured parallel to the boundary
of the shape. We then seek a solution whose distribution matches this prior distribution,
rather than seeking a solution whose intensities occur at the maximum of the distribution.
This approach does not require uniformity of region intensity properties and appears to
have good generalization properties. To find a solution, the dissimilarity of observed and
prior intensity distributions is used to create a curve evolution force using a variational
approach.
To generate our appearance model, let us first define an orthogonal coordinate system
W. We will align this coordinate system with the tangent and normal to the boundary
and measure image intensity values in a rectangular patch defined by W, as illustrated in
Figure 7·1.

Figure 7·1: Image patch based feature values measured along the boundary. Point O (the patch coordinate system origin) is positioned at Γ(s) (the current boundary point). The j-axis is aligned with the local inward normal. Two instances are shown.

Let O be the origin of this coordinate system W. Let xij ∈ W be a sample
point with coordinates i and j. We choose a set of such points S = {x1, x2, ..., xm}. There is
no restriction on this set of points, but currently we make the set symmetric with respect to
the j-axis and include O in the set. Each point xk = (ik, jk) in S gives rise to an intensity
function corresponding to the trajectory of the point xk as the coordinate origin is moved
around the boundary. A typical trajectory is shown as the dotted line in Figure 7·1. Let
Φk(s) be an associated feature function that is computed from this intensity function for the
k-th point trajectory, where s is arc-length around the boundary. The simplest approach,
and what we have done to date, is to simply use the intensity values along the trajectory
themselves, so the k-th intensity feature function is given by:

\[
\Phi_k(s) = I(x_k, s) = I\Big( \Gamma(s) + R\big(s, \mathbf{n}(s)\big)\, [\,i_k \;\; j_k\,]^T \Big) \tag{7.2}
\]
where I is the image, Γ is the boundary parameterized by arc-length s, n(s) is the local
normal and R(s,n(s)) is the 2D rotation matrix aligning n(s) with j-axis of the coordinate
system W. Similarly to our distribution-based shape model, we then generate the CDF
of each such intensity feature, Hint_k(Γ, λ), following eq. 4.1. The collection of these m
distributions is the basis of our intensity appearance model.
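Sampling the intensity feature functions of eq. 7.2 can be sketched as follows (an illustration with our own helper names; nearest-pixel sampling is used in place of interpolation):

```python
import numpy as np

def intensity_features(image, boundary, normals, offsets):
    """For each patch point x_k = (i_k, j_k), collect the image intensity
    along its trajectory as the patch frame slides around the boundary
    (eq. 7.2). The j-axis of the frame is the local normal and the i-axis
    the tangent."""
    feats = []
    for ik, jk in offsets:
        samples = []
        for pt, n in zip(boundary, normals):
            n = np.asarray(n, dtype=float)
            n = n / np.linalg.norm(n)
            t = np.array([n[1], -n[0]])  # tangent, perpendicular to n
            x = np.asarray(pt, dtype=float) + ik * t + jk * n
            r, c = int(round(x[0])), int(round(x[1]))
            samples.append(float(image[r, c]))
        feats.append(np.array(samples))
    return feats  # one intensity function Phi_k(s) per patch point
```

Each returned array is then reduced to its empirical CDF, exactly as the shape feature values are.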
We now define the intensity (data) energy Eintensity(Γ) as a difference measure between
the target intensity feature distributions and the intensity feature distributions
corresponding to the curve Γ:

\[
E_{\mathrm{intensity}}(\Gamma) = \sum_{k : x_k \in S} \int \big[ H_k^{*\,\mathrm{int}}(\lambda) - H_k^{\mathrm{int}}(\Gamma, \lambda) \big]^2 d\lambda \tag{7.3}
\]
where the H∗int_k are computed by averaging the distributions corresponding to the training
segmented contours and the underlying images. We combine this intensity or appearance model
with the shape model to define an overall intensity and shape prior curve energy:
E(Γ) = Eintensity(Γ) + Eshape(Γ) (7.4)
where Eshape is given in eq. 7.1 and includes only shape terms.
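The combined criterion of eqs. 7.3 and 7.4 then reduces to comparing empirical CDFs, e.g. (a sketch, with a uniform lambda grid assumed):

```python
import numpy as np

def empirical_cdf(values, lambdas):
    """Fraction of samples below each lambda."""
    values = np.asarray(values, dtype=float)
    return np.array([np.mean(values < lam) for lam in lambdas])

def intensity_energy(feature_samples, target_cdfs, lambdas):
    """Discretized intensity energy of eq. 7.3: one squared-CDF-difference
    term per patch point x_k in S."""
    d_lambda = lambdas[1] - lambdas[0]
    return sum(np.sum((h_star - empirical_cdf(vals, lambdas)) ** 2) * d_lambda
               for vals, h_star in zip(feature_samples, target_cdfs))

def total_energy(e_intensity, e_shape):
    """Overall prior of eq. 7.4: E = E_intensity + E_shape."""
    return e_intensity + e_shape
```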
We aim at driving the curve Γ to minimize this energy by finding an appropriate curve
flow through variational principles. Using the above definition of the feature function, the
curve flow at location s minimizing eq. 7.4 is given by (see Appendix B for derivations).
\[
\nabla E(C)(s) = \nabla E_{\mathrm{shape}}(C)(s) + \sum_{k : x_k \in S} \mathbf{n}(s) \cdot \nabla I(x_k, s) \Big[ H_k^{*\,\mathrm{int}}\big(I(x_k, s)\big) - H_k^{\mathrm{int}}\big(C, I(x_k, s)\big) \Big] \tag{7.5}
\]

where I(xk, s) is defined in eq. 7.2.
7.4 Results
Our approach is designed to model directly the intensity distributions around the boundary
of the object of interest. It makes no explicit assumption about the presence of edges and/or
image gradients nor does it enforce uniformity of region statistics. Therefore, we expect
our approach to show the greatest advantage over traditional approaches when modeling
objects without prominent edges and with similar region statistics inside and outside the
region of interest. These represent some of the most difficult segmentation cases. Region
and gradient-based segmentation methods will not work on such problems.
Figure 7·2: Example 1. Segmentation with shape/intensity distribution prior. True shape: black solid line; initial contour: dashed line; final segmentation contour: solid white line.
Experiment 1
In our first example we construct such a synthetic segmentation problem, without a
gradient-based boundary and with matching first and second order intensity statistics in
the interior and exterior regions. In Figure 7·2 we show the observed image with the desired
true boundary indicated by a solid black line. By construction, the intensity gradient
is uniform across the image except for the two diagonals. Therefore, any segmentation
method that relies on gradient information to localize the boundary will have difficulty
on this image. In addition, the mean intensity and variance of the intensity inside and
outside of the object boundary are the same, causing problems for typical region based
methods. To apply our method we use an intensity appearance model constructed using
only 5 points for the set S. The coordinates (i,j) of the points in the set S are (-10,0),
(-5,0), (0,0), (5,0) and (10,0). These points are illustrated in Figure 7·3 for one position
of the coordinate system. Notice that only the central point (0,0) coincides with the
boundary. The distribution corresponding to the point (0,0), constructed on the image
and the intended boundary in Figure 7·2, contains a single impulse (the intensity along the
boundary is constant). The distributions corresponding to the other four points are more complex
because the trajectories traced by these points do not follow constant image intensity.
The shape term Eshape in eq. 7.4 was not used in this case, in order to focus on the behavior of
the intensity prior. The solid white line in Figure 7·2 shows the corresponding segmentation
result. The segmenting contour matches the true contour.
Figure 7·3: Five points xk used to construct feature functions according to eq. 7.2 in Experiment 1.
One may suggest that the distribution corresponding to the point (0,0) is by itself
sufficient to yield the correct solution, since it describes the constant intensity on the
boundary. When guided by the single distribution difference term corresponding to the point
(0,0), boundary points in bright areas will move towards darker regions, and vice versa,
until they reach the true boundary intensity value. However, in the process of evolution,
some boundary points are positioned over the transition regions (diagonal edges), where
the average intensity matches the intensity on the intended boundary. These boundary
points will not move, preventing convergence. This situation is corrected by the other four
points. Intuitively, more points increase the descriptive power of the model, which
helps to prevent potential ambiguities and local minima. Instead of matching the intensity
distribution along the trajectory, we are matching the intensity distributions in the image
patch centered on the boundary and aligned with the local normal. In the current example
we use 5 points in the patch.
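The patch-sampling idea described above can be sketched as follows. This is an illustrative reconstruction in numpy (the function name, the nearest-pixel sampling, and the assumption that intensities are normalized to [0,1] are ours), not the implementation used in the experiments:

```python
import numpy as np

def normal_offset_histograms(image, contour, offsets=(-10, -5, 0, 5, 10), bins=16):
    """For each fixed offset along the local normal, collect the intensities
    sampled as the coordinate frame slides along the contour, and build one
    normalized histogram (empirical distribution) per offset point.
    Intensities are assumed normalized to [0, 1]."""
    contour = np.asarray(contour, dtype=float)           # (N, 2) (row, col) points
    # Tangent by central differences (closed contour); normal = rotated tangent.
    tangent = np.roll(contour, -1, axis=0) - np.roll(contour, 1, axis=0)
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    normal = np.stack([-tangent[:, 1], tangent[:, 0]], axis=1)

    hists = {}
    for d in offsets:
        pts = np.round(contour + d * normal).astype(int)  # nearest-pixel sampling
        pts[:, 0] = np.clip(pts[:, 0], 0, image.shape[0] - 1)
        pts[:, 1] = np.clip(pts[:, 1], 0, image.shape[1] - 1)
        samples = image[pts[:, 0], pts[:, 1]]
        h, _ = np.histogram(samples, bins=bins, range=(0.0, 1.0))
        hists[d] = h / h.sum()
    return hists
```

For a boundary along which the intensity is constant, as in Experiment 1, the histogram for the offset-0 point collapses to a single nonzero bin, i.e. an impulse.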
This example shows the effectiveness of our prior on a segmentation problem where
the intensity is constant along the intended boundary and the boundary is not supported
by edges.
Experiment 2
In our second example we construct an even more difficult problem based on the same
image as in the previous example. In Figure 7·4 we show the observed image with the
desired true boundary indicated by a solid black line. This intended boundary now crosses
the image in such a way that the distribution of intensity along the boundary is uniform.
As before, the first and second order statistics in the interior and exterior regions are the same,
causing problems for typical region based methods. Since the intended boundary does
not correspond to image gradients, boundary gradient based methods are not applicable.
Distributions of intensity perpendicular to the boundary are also uniform on this image,
so the method in (Leventon et al., 2000b), which tries to match the distribution of intensity
evaluated on the boundary, will not move the boundary. Our result is shown in Figure 7·4.
It is notable that the resulting shape is nearly square, which is a sign that our prior on
intensity features also captures some shape information through the coupling between
shape and intensity. However, the location of the resulting shape is shifted significantly
with respect to the true location. We conclude that using only intensity features is not
sufficient to capture shape and the shape/image relationship in this case. Intuitively, there
is more ambiguity in the direction to move the boundary to match uniform distributions
than in the previous case, where the distributions are nearly impulses and imply
a unique direction to change each feature value, and hence a unique direction to move
the boundary.
Experiment 3
We now consider the same example but combine the intensity prior with the shape
Figure 7·4: Example 2. Segmentation with shape/intensity distribution prior. True shape - black solid line; Initial contour - dashed line; Final segmentation contour - solid line.
prior term. For the shape prior term Eshape we use the shape distribution energy in eq. 7.1
with two feature functions: inter-point distances and multi-scale curvatures. The prior shape
distributions H∗shape_i(λ) were computed using the intended boundary. The result is shown
in Figure 7·5 (B). In Figure 7·5 (A) we combine only the shape distribution prior (no
intensity-defined distributions) with the maximum mutual information data term (Kim et al., 2002a).
The solid white line in Figure 7·5 shows that the combination of our shape and intensity
priors (B) recovers the boundary quite well, while the alternative data term combined with
the shape prior produces the correct shape but at the wrong location and orientation
(A). This example stresses the effectiveness of our new intensity prior on such a challenging
edgeless segmentation problem and also emphasizes the advantage of combining our shape
and intensity priors.
Experiment 4. Real data.
In our next example, we segment the lenticular nucleus structure in an axial slice of
a brain MRI. A data set consisting of 12 normal adult subjects was provided to us by
Figure 7·5: Example 3. (A) Segmentation with shape distribution prior and maximum mutual information data term (Kim et al., 2002a); (B) Segmentation with shape/intensity distribution prior and shape distribution prior. True shape - solid black line; Initial contour - dashed line; Final segmentation contour - solid white line.
Dr. David Kennedy at the Center for Morphometric Analysis of Harvard Medical School
and Massachusetts General Hospital. Expert segmentations of the lenticular nucleus structure
often include sub-regions with different average intensities and typically do not follow the
strongest perceivable edge everywhere. For example, in Figure 7·6 we show a lenticular
nucleus from the data set that has been segmented by an expert. Inside the object, one can
distinguish areas with significantly different intensity, and the expert-segmented boundary
is not aligned with the strongest gradient everywhere.
We desire a segmentation that is as close as possible to the expert segmentation. For
our approach, we construct both our prior appearance and shape models based on 11
training images containing shapes segmented by experts. The intensity model is again
based on the 5 sample points (-10,0), (-5,0), (0,0), (5,0) and (10,0) shown in Figure 7·3.
The prior shape model in eq. 7.1 is again based on the two feature functions (inter-point
distances and multi-scale curvatures) introduced in Chapter 4. Both the prior shape and
intensity distributions H∗shape_i(λ) and H∗int_i(λ) are obtained by averaging the distributions
Figure 7·6: Expert segmentation of the left lenticular nucleus showing variation in intensity within the structure and lack of a consistent gradient along the boundary.
of the 11 samples in the training set. To test the model, the segmentation is performed
on an image that is not included in the training set. In Figure 7·7 (A), we show the
segmentation result. For comparison, in Figure 7·7 (B) we show the segmentation result
obtained using an intensity term which maximizes the mutual information between the
intensities and segmentation labels (see (Kim et al., 2002a)) and our shape distribution-
based shape prior given by eq. 7.1. Our shape and intensity model results in a very close
match between the segmented boundary and the expert drawn boundary except at the
lower tip of the structure. The result using the alternative data term segments only the
darker part of the structure, resulting in a large overall mismatch, as expected. This
comparison demonstrates that in this medical example, homogeneity of region properties
is not a good metric for segmentation, while our intensity and shape prior provides a more
general and effective tool.
Experiment 5
We present yet another example of an image segmentation problem that can be
approached using our shape and appearance distribution prior.

Figure 7·7: Example 4. (A) Segmentation with shape/intensity distribution prior. (B) Segmentation with only the shape prior and the intensity model in (Kim et al., 2002a). True shape - black solid line; Initial contour - dashed line; Final segmentation contour - solid line.

LADAR range images are produced by raster scanning the ground target area from an airborne platform. In the
first step, a typical automatic target recognition (ATR) system performs preprocessing to
suppress range anomalies. In the second step, an image segmentation algorithm is applied
to the image to locate target boundaries. In the final step, the segmented boundaries are
used for classification and recognition. An overview of the LADAR system is given in
(Greer et al., 1997) and references therein.
A typical LADAR range image of a tank is shown in Figure 7·8. The range values are
coded by shades of gray, with darker shades corresponding to closer range. Two challenges in
segmenting such an image are apparent. First, the background intensity changes continuously
across the image, undermining intensity-thresholding strategies. In fact, no single
threshold or combination of thresholds can be found to separate the target from the background. Second,
an edge is present only around the upper part of the target. The lower part of the target
blends with the background. This makes edge-based segmentation methods inapplicable
Figure 7·8: LADAR image of a tank.
to this problem.
A working approach to segmenting this range imagery is to model the background intensity
parametrically, for example using spline models. The target intensity can be modeled
with another spline model. The parameters of the background and target models can then
be estimated jointly with the boundary using the EM algorithm. One drawback of such an
unsupervised approach is the high uncertainty of the resulting boundary at the lower edge
of the target, where the background and target intensities match. Such uncertainty
can be detrimental to the target recognition algorithm.
The idea behind applying our shape and appearance prior is that the distribution of
intensities around the boundary can capture the existence of the discontinuity and blending
regions without parametric modeling of the background and target intensities. The
downside is that our approach requires training.
Due to the lack of real data, we test our algorithm on semi-synthetic images. First, we
construct a synthetic background image with linearly varying range/intensity, as in the
image in Figure 7·8. We generate 5 tank poses (using a real tank image) and superimpose
them on the background, while preserving the intensity blending along the bottom of the
target. Finally, we add IID Gaussian noise. We obtain 5 images similar to the real image
in Figure 7·8. We then hand-segment each of these synthetic images. Four images and the
corresponding segmentations are used to construct our shape and intensity feature
distribution prior, and the fifth image is used to test our segmentation framework. Our
results are shown in Figure 7·9, panel (A). The result matches the true segmentation quite
well. In Figure 7·9, panel (B), we show the segmentation obtained using only the shape prior
term in eq. 7.1. The shape feature functions used were the same as in case A. For the
data term we used the double threshold curve force given by
Fdata(s) = |I(s) − m| − δ (7.6)
where I(s) is the image intensity at the point s on the boundary, m is the average target
intensity, and δ is half of the range of the target intensity, both determined from the data. The
curve evolves outwards if the intensity on the boundary I(s) is in the range [m − δ, m + δ]
and inwards if I(s) is outside of this range. The resulting flow tends to enclose the region
with intensities matching that of the target. The force Fdata is used in place of −∇Edata
in the curve evolution. Obviously, this term leads to contour leakage in areas where the
intensity of the target matches the intensity of the background. The shape prior competes
with this effect. The resulting contour approximately preserves the elongated target shape,
biased by the expansion force on the right hand side of the target. Our comparison in
case B represents the best ad-hoc region-based curve force constructed without parametric
modeling of the intensities. Our result in case A shows that real imagery with partially
edgeless boundaries and complex backgrounds can be effectively segmented using our shape
and intensity priors.
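The double threshold force of eq. 7.6 is simple to state in code. The following is a minimal sketch (function names are ours, with m and δ estimated from sample target intensities as the text describes):

```python
import numpy as np

def target_stats(target_intensities):
    """m is the average target intensity; delta is half of the target
    intensity range, both determined from the data."""
    t = np.asarray(target_intensities, dtype=float)
    return t.mean(), (t.max() - t.min()) / 2.0

def double_threshold_force(I_s, m, delta):
    """Eq. 7.6: F_data(s) = |I(s) - m| - delta. Negative values (boundary
    intensity inside [m - delta, m + delta]) drive the curve outwards,
    positive values drive it inwards."""
    return np.abs(np.asarray(I_s, dtype=float) - m) - delta
```

A boundary point sitting on target-like intensity gets a negative force (expansion), which is exactly what produces the leakage discussed above where the background happens to match the target range.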
7.5 Multivariate distributions extension
So far, we have considered 1D distributions of feature functions extracted from shapes. These
distributions were used as independent terms in the prior energy formulation. Since our
feature functions were defined on different spaces Ω (such as combinations of 2 points on
the boundary or combinations of 3 points on the boundary), we were limited to considering
only 1D distributions. In the case of the intensity dependent feature functions proposed in
this chapter, the situation is different. In fact, the values of intensity obtained as the
coordinate system moves along the contour are random processes on the same space
Figure 7·9: Semi-synthetic tank image segmentation with intensity and shape priors: (A) intensity and shape prior; (B) shape prior and threshold intensity term in eq. 7.6. True shape - solid black line; Initial contour - dashed line; Final segmentation contour - solid white line.
Ω, where Ω is the arc-length along the curve. Therefore, by treating these processes as
independent we made an unjustified assumption. Moreover, these processes are certainly
correlated in the general case. Therefore, we would like to design a prior that captures
these dependencies. To this end we propose to construct m-dimensional CDFs, where each
dimension corresponds to one feature function. We may then impose the prior as the difference
between the m-dimensional CDF computed on prior shapes/images and the m-dimensional
CDF corresponding to the evolving contour. Formally, the mD version of the energy is
E = ∫ · · · ∫_{λ∈Λ} [ H∗int_mD(λ) − H int_mD(Γ, λ) ]² dλ   (7.7)

where λ = (λ1, λ2, . . . , λm).
In Appendix C we derive the curve flow corresponding to the intensity based feature
functions. We implemented the 3 feature function version of this flow for the experiments
conducted in this chapter but did not see any significant difference in results for these cases.
However, for more feature functions and/or different feature function definitions, the more
exact, full mD definition of the energy functional presented here can be advantageous. For
example, a shape feature function defined on the boundary (such as local curvature) can be
combined with intensity feature functions defined on the boundary to efficiently encode such
prior knowledge as "the corners must be darker than straight boundary segments".
7.6 Summary
We develop a novel joint shape and appearance modeling framework and present first re-
sults. The method is based on the concept of shape distributions and allows the capture of
salient shape and appearance properties in images. The method can work where currently
popular approaches, based on maximizing the uniformity of region properties or seeking
maximal gradient magnitude, do not. Many challenging segmentation problems possess such
properties.
The next possible step is to combine our shape and appearance modeling framework
with the multi-object shape modeling framework considered in Chapter 6.
Chapter 8
Shape-Based Classification and Morphological
Analysis using Medial Axes and Feature Selection
In previous chapters we considered curve evolution approaches to object-oriented boundary
extraction tasks. We now turn to inference tasks based on extracted boundaries. We assume
that image segmentation has been carried out and that segmented shapes are available for
further analysis. We are interested in the existence of significant morphological differences
in Corpus Callosum (CC) shapes due to gender and schizophrenia. We also propose an
intuitive way to investigate and visualize the significant morphological differences detected.
Another particular interest is to develop, train, and test classifiers on brain Corpus Callosum
(CC) shapes.
8.1 Introduction
The Corpus Callosum is one of the most studied human brain structures. There is interest
in understanding population-based differences as well as in performing individual diagnosis.
Area and volume have been widely used for such tasks in the past, but they capture only simple
anatomical differences. Our belief is that with more detailed shape information available,
significantly more can be accomplished. Corpus Callosum shapes were previously studied in
(Bookstein, 1997) using Procrustes analysis on a landmark representation. Some inter-class
separation was reported between Normal and Schizophrenia subjects' shapes. A Support
Vector Machine approach was used for CC classification of Normal and Schizophrenia
subjects in (Galand et al., 1999). Ad-hoc features extracted from the CC shape were
used for dyslexia detection in (Duta, 2000). Elastic deformation transformation was used
to describe CC shapes in (Davatzikos et al., 1996), where male versus female differences in
shape were reported. Analysis and visualization of morphological differences were
approached in (Martin et al., 1998).
Our approach uses a rich, skeleton-based representation of CC shape similar to the one used
in (Galand et al., 1999), although our skeleton extraction scheme is different and is described
in detail in the following sections. Such a representation contains considerably more
information than shape area alone. Since our major task of interest is to understand differences
between CC shapes in different groups, we need to focus this information by reducing
the dimensionality of the feature space. The often-used PCA approach to feature reduction is
largest variation. To this end, we present statistically-based methods of measuring the class
separating properties of each feature. We use these measures to develop a new approach for
identifying and visualizing inter-class shape differences, which highlights areas of significant
shape variation between classes. Our goal is to understand how and why different groups
of shapes differ, which can drive further research (e.g., a bump in a region might be
tied to a particular disease). We demonstrate our methods by analyzing shape differences
in the corpus callosum of 68 young adults due to gender and schizophrenia.
We now consider our second task of interest, automatic classification of CC shapes.
Typical approaches to shape classification can be divided into two groups.
• Clustering based on a definition of shape distances; see (Klassen et al., 2004).
• Classification based on features encoding shapes or extracted from shapes; see (Galand
et al., 1999; Galand et al., 2000; Bookstein, 1998; Yushkevich et al., 2003).
In this dissertation we use the second type of approach to classify shapes. We choose to use
reduced feature sets to construct classifiers, applying feature set reduction methods
similar to those proposed for our first task of interest (morphological difference
detection). We construct both traditional linear classifiers (Duda et al., 2001) and
classifiers based on ada-boosting (Duda et al., 2001; Freund and Schapire, 1999). We
compare our classification performance and morphological difference detection results
with previously published results.
In the following sections we first describe the medial-axis-based feature extraction methods
used, followed by a description of our feature selection approaches. Next, we present
tools to potentially aid in understanding morphological differences and their localization.
Finally, we present and analyze our shape classification results.
8.2 Data and skeleton-based feature extraction
Our data set includes 68 manually segmented Corpus Callosum boundaries in a single
sagittal MRI slice of healthy young male and female subjects and 35 segmented Corpus
Callosum boundaries of schizophrenia patients. This data set was provided to us by Dr.
David Kennedy at the Center for Morphometric Analysis (CMA) of Harvard Medical School
and Massachusetts General Hospital. The resolution of each image is about 700 by 300 pixels.
All images were acquired and segmented by the CMA research group at Massachusetts General
Hospital.
We desire a shape representation that is naturally related to neuro-anatomical brain
morphology. We have focused on shape skeletons as a rich way of characterizing such
shape features (Tari and Shah, 2000; Galand et al., 1999). If a shape is viewed as a
collection of connected ribbons in the form of protrusions and narrow necks, then the
medial axes of these ribbons constitute branches of the shape skeleton. Skeleton-based
shape representations are attractive for a number of reasons. Skeletons provide a natural
representation of the shape in terms of a series of components or parts. In addition, changes
in parts of a shape can be reflected in corresponding changes to parts of the skeleton.
This property of the representation allows focal shape differences to be parsimoniously
captured, identified, and focused upon. Finally, the representation can easily be made
complete, providing a one-to-one mapping between a given shape and its features. The
main challenge in the use of shape skeletons is the determination of the skeleton of a
shape in a manner robust with respect to noise and natural variation.
We used two methods to extract the skeleton representation of the shape, namely the fixed
topology skeleton (Galand et al., 1999) and the nested local symmetry set method (Tari and
Shah, 2000). Below we provide details on both methods.
8.2.1 Fixed topology skeleton
The fixed topology skeleton extraction method, proposed in (Galand et al., 1999), fixes
the branching structure of the skeleton graph and finds the optimal skeleton given this
fixed structure. We use a single branch skeleton structure, which adequately describes the
elongated shape of the Corpus Callosum.
The method starts from the signed distance function D representation of the shape,
using the fact that the skeleton is the set of ridge points of the distance map. D is
computed using a fast marching method. We assume that the end points of the skeleton
are fixed and that the skeleton is initialized by a straight line connecting the end points. We
uniformly sample the skeleton, obtaining the set of points X = {x_n | n ∈ [1..N]}. We then
employ a curve evolution approach with explicit curve parameterization and fixed boundary
conditions. We evolve the curve using the following update rule for all but the first and
the last points in the set:
x^i_{t+1} = x^i_t + σ∇D(x^i_t) − k^i_t n^i_t ,   i ∈ [2..(N − 1)]   (8.1)

where x^i_t is the position of point i at time t, ∇D is the gradient of the distance function,
σ is the distance function smoothing operator (providing a stabilizing effect in the
neighborhood of the solution), k^i_t is the curvature of the skeleton at point x^i_t, and n^i_t
is the normal direction at point x^i_t. The first term in the update equation 8.1 moves the
curve towards the ridge of
the signed distance function, while the second term regularizes the evolution. The curve is
resampled every few iterations to preserve the uniform sampling. The update stops when
the curve starts oscillating around a stationary point.
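One iteration of the update rule 8.1 can be sketched as below. The discrete Laplacian of the polyline is used here as a stand-in for the −k·n regularizer (the two agree up to the sampling density), and the smoothed distance-map gradient σ∇D is abstracted into a callable; names and step sizes are illustrative assumptions, not the dissertation's implementation:

```python
import numpy as np

def skeleton_step(X, grad_D, alpha=1.0, beta=0.25):
    """One iteration of the eq. 8.1 update for the interior skeleton nodes.
    grad_D(p) is assumed to return the smoothed distance-map gradient
    (sigma * grad D) at point p; the -k*n regularizer is approximated by the
    discrete Laplacian of the polyline. End points are kept fixed."""
    X = np.asarray(X, dtype=float)
    Xn = X.copy()
    lap = X[:-2] - 2.0 * X[1:-1] + X[2:]            # discrete curvature vector
    g = np.array([grad_D(p) for p in X[1:-1]])      # ridge-seeking term
    Xn[1:-1] = X[1:-1] + alpha * g + beta * lap
    return Xn
```

Iterating this step (with periodic resampling, as described above) drives the interior nodes toward the ridge of D while the Laplacian term keeps the polyline smooth.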
In order to find good skeleton end points, an additional, higher level of optimization is
carried out. A criterion of skeleton goodness is devised in the following way: for a given
skeleton, for each point x_k in the set X we find the inscribed circle of maximum radius,
Figure 8·1: CC shape sketch shown along with a skeleton with nodes x_1 through x_N found using trial end points. The maximum radius circle centered at x_k is shown.
which is centered on the given point and is entirely contained within the interior of the
shape (see Figure 8·1). For the point x_k the circle radius is given by

R_k = max { r : h(D(x)) = 1 for all x with ||x − x_k|| < r }   (8.2)

where h(x) is the indicator function which is 1 when x > 0 and 0 otherwise; thus R_k is the
largest radius for which the disk centered at x_k lies entirely inside the shape. The badness
measure of the skeleton is then defined as the area of the shape that is not contained in
any circle. This measure is given by
E = Area( A − ⋃_{k∈[1..N]} Cir(R_k) )   (8.3)
where A is the interior of the shape and Cir(Rk) is the inscribed circle centered at the node
xk of the skeleton. Skeleton extraction is repeated for different combinations of end points
on the boundary of the shape. The measure in eq. 8.3 is evaluated for each combination,
and the skeleton with the smallest E is stored as the final result. The fixed topology extraction
algorithm was applied to our dataset and yielded a satisfactory solution in most cases. An
example of an extracted skeleton is shown in Figure 8·2. In Figure 8·3 we show skeletons
extracted from male subjects' segmented Corpus Callosum shapes.
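On a rasterized shape, the inscribed radii of eq. 8.2 and the badness measure of eq. 8.3 can be sketched as follows. This is a brute-force pixel version for illustration; the function name and the pixel-count approximation of area are our own choices:

```python
import numpy as np

def badness(mask, nodes):
    """Eqs. 8.2-8.3 on a binary mask (brute-force pixel version). For each
    skeleton node, R_k is the distance to the nearest background pixel, i.e.
    the largest radius whose disk lies inside the shape; the badness E is the
    number of interior pixels not covered by any such disk."""
    inside = np.argwhere(mask).astype(float)
    outside = np.argwhere(~mask).astype(float)
    covered = np.zeros(len(inside), dtype=bool)
    for x in np.asarray(nodes, dtype=float):
        Rk = np.min(np.linalg.norm(outside - x, axis=1))  # max inscribed radius
        covered |= np.linalg.norm(inside - x, axis=1) < Rk
    return int(np.count_nonzero(~covered))
```

A skeleton running along the true medial axis covers more of the shape with its inscribed circles and therefore yields a smaller badness than a poorly placed one, which is exactly the comparison used when searching over end point combinations.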
Figure 8·2: Extracted fixed topology skeleton. Circles represent the sampled discrete points on the skeleton. The outside border is the segmented Corpus Callosum shape. Dark regions near the border comprise the skeleton badness measure in eq. 8.3.
8.2.2 Nested local symmetry sets method
Robust and stable skeleton extraction is notoriously difficult, which led to diminished interest in
skeletons after their initial promise. (Tari and Shah, 2000) developed a way to produce a robust
skeleton that is stable in the face of noise and minor boundary variation. The method first
computes candidate skeleton branches, followed by reconnection and pruning processes.
The algorithm then interpolates a smooth function inside the shape boundary by solving
an elliptic PDE. This smoothing process controls the effects of noise and minor variation.
The ridges and valleys of the graph of this smooth function determine the axes of local
symmetry of the shape. The ribbon structure of the shape is determined by identifying the
medial axes of the ribbons among the axes of local symmetry. An important consequence
of the smooth interpolation process is that the resulting skeleton is not connected, since
the medial axes of protrusions do not meet the main axis of the shape, as they do when the
skeleton is obtained directly from the ridges of the distance transform. As a result, the task of
pruning nonessential branches becomes considerably easier. Adaptive, top-down pruning
of the extracted skeleton to eliminate false branches and robustly focus on the desired structure
is then carried out. More details of the technique are given in (Tari and Shah, 2000).
In Figures 8·3 and 8·4 we compare skeletons extracted from male subjects' segmented
corpus callosum shapes using the fixed topology and nested symmetry sets methods,
respectively. One can conclude that the two methods provide very similar skeletons.
We performed classification experiments using skeleton features extracted by both
algorithms and obtained similar performance. In the following sections we present the
classification and inter-class difference visualization results using the second, more
theoretically sound method of extracting the skeleton of the shape.
8.2.3 Feature extraction
The fixed topology skeleton extraction approach gives a one-branch skeleton which is directly
used to extract features. In order to extract features using the second, nested symmetry sets
skeleton extraction approach, we use only the dominant branch. Once we have a single
skeleton branch obtained by either of the two skeleton extraction techniques, the skeleton
is uniformly sampled using 20 nodes (including end nodes) {x_n | n ∈ [1..20]}. A set of 37
shape features X is obtained from 18 samples of the relative skeleton angle change θ_i/L, 18
samples of the relative shape width R_k/L, and the overall skeleton length L, as illustrated in
Figure 8·5. The shape width features are obtained as the radii of maximum inscribed circles. The relative
Figure 8·3: Skeletons obtained from male subjects using the fixed topology method.
Figure 8·4: Skeletons obtained from male subjects using the nested local symmetry sets method. Principal and secondary skeleton branches are shown.
Figure 8·5: Features extracted from the medial axis (angles θ_i and radii r_i).
angle change features are computed as

X_k = angle( x_{k+1} − x_k , x_k − x_{k−1} ) ,   k ∈ [1..18]   (8.4)
In order to use these parameters in statistical tests, we perform two successive
normalization steps for each feature i:
1. The average value over the whole dataset of M subjects is subtracted from each feature
value:

(x^i_j)′ = x^i_j − (1/M) Σ_{k=1}^{M} x^i_k   (8.5)

where x^i_j is the original value of feature i for subject j, and (x^i_j)′ is the mean-subtracted
feature value.
2. Each feature value is scaled by the inverse of the standard deviation computed over the
whole dataset for that feature:

(x^i_j)′′ = (x^i_j)′ / sqrt( (1/M) Σ_{k=1}^{M} [(x^i_k)′]² )   (8.6)

where (x^i_j)′′ is the final mean-subtracted and normalized feature value.
These two steps ensure equal significance of the linear classifier coefficients when they are
used for feature selection and shape difference localization experiments.
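The two normalization steps of eqs. 8.5 and 8.6 amount to per-feature z-scoring over the dataset; a minimal sketch (names are ours):

```python
import numpy as np

def normalize_features(F):
    """Eqs. 8.5-8.6: subtract each feature's dataset mean (over the M
    subjects in the rows), then scale by the inverse of its standard
    deviation, giving zero-mean, unit-variance features."""
    F = np.asarray(F, dtype=float)
    Fc = F - F.mean(axis=0, keepdims=True)                       # eq. 8.5
    return Fc / np.sqrt((Fc ** 2).mean(axis=0, keepdims=True))   # eq. 8.6
```

After this transformation every feature has zero mean and unit variance over the dataset, so the magnitudes of linear classifier weights become directly comparable across features.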
8.3 Inter-class shape differences. Detection and visualization.
Our first task of interest is to pinpoint significant inter-class shape differences. To this
end, we developed a method to highlight shape features showing significant differences
between classes. We first choose a criterion of individual feature inter-class variability (the
feature ranking score). By plotting the mean values of the features tagged by this variability
measure we can quickly and intuitively display and identify areas of difference between
classes. The geometrical nature of the skeleton-based features makes it straightforward to
translate feature-domain differences directly into intuitive morphological class differences.
We use three methods to compute the feature ranking score:
1. The score is the p-value computed on the feature values for the two classes. The p-value
is the probability that the observed data occurred by chance given that the observations
actually come from distributions with equal means. A small p-value indicates a high
chance that the probability distributions of the feature observations differ between
the two classes.
2. We repeatedly split the data into training and testing subsets and train the linear
MMSE classifier on the training subset using all 39 features. The average absolute
classifier weight is then computed as a^m = (1/N) Σ_{i=1}^{N} |a^m_i|, where N is the number
of splits and a^m_i is the weight on the m-th feature in the decision boundary computed
for split i. The resulting a^m is used as the ranking score. Higher values indicate a
stronger feature discrimination property.
3. We first fix the number N of features to be a small number, presumably one that
maximizes the classifier performance. We repeatedly split the data into training
and testing subsets, and for each split we perform the feature selection procedure
based on the p-value (first method). We then count how many times each feature was
selected over the multiple data splits. The normalized count for each feature
represents the relative probability of that feature being selected for the given number
N of features, and is used as a ranking score. A similar method was reported in
(Yushkevich et al., 2003).
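The first ranking score can be sketched as a per-feature two-sample t-test. This illustration assumes scipy and the standard Student t-test; the exact test variant used in the dissertation is not specified here:

```python
import numpy as np
from scipy import stats

def pvalue_ranking(class_a, class_b):
    """First ranking score: for each feature (column), the two-sample t-test
    p-value between the two classes (rows = subjects). Small values flag
    features whose class means are unlikely to be equal."""
    a = np.asarray(class_a, dtype=float)
    b = np.asarray(class_b, dtype=float)
    _, p = stats.ttest_ind(a, b, axis=0)
    return p
```

Sorting features by this p-value (ascending) then gives the importance ordering used for selection and for the visualizations below.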
An example of our score-based image-domain representation of inter-class shape differences
is shown in Figure 8·6 (male/female classes, p-value based ranking score). In the top panel
we show the average shapes for both classes. The skeleton for each category was computed
by averaging the angle features in that category, and the shape widths (represented by
bars orthogonal to the skeleton) were computed by averaging the width features in that
category. In the bottom panel we show the average shape for both classes combined. Each
circle represents a node of the skeleton. The size and color maps on the right hand side
relate the ranking score values to the size and color of the bars and circles. The darkness
and size of each circle or bar on the average shape indicate the relative importance of that
angle or width feature, respectively.
In Figure 8·6 we show our image-domain representation for the male and female shape
classes using p-value based ranking score. In Figure 8·7 we show the same analysis for
the normal-schizophrenia discrimination problem. In Figure 8·8 we show the inter-class
differences representation obtained using MMSE ranking score for male-female and normal-
schizophrenia discrimination problems. In Figure 8·9 we show the inter-class differences
representation obtained using the third ranking score method for normal-schizophrenia
discrimination problem. The number N of selected features was set to 6, since for this
number of features we obtain the best classifier performance (see next section).
One can notice that feature importance grading using the different methods provides
only partially consistent results. Let us first compare the first (p-values score) and second
(MMSE average weight score) methods. Fewer features are identified using the second
ranking score method. Consider the male/female classification problem. From Figure 8·6
(left panel), one can observe the significant difference in upper hook curvature of the mean
shapes. This difference is reflected in the low p-values corresponding to angle features
Figure 8·6: Male and female Corpus-Callosum differences and importance of individual features using p-value based feature ranking. (Top: average medial axes for the male and female classes; bottom: combined average shape, with bar and circle size/color coding the p-value, i.e., the probability of observing the difference by chance given equal means.)
Figure 8·7: Normal and Schizophrenia Corpus-Callosum differences and importance of individual features using p-value based feature ranking. (Top: average medial axes for the normal-control and schizophrenia classes; bottom: bar and circle size/color coding the p-value.)
Figure 8·8: Feature importance visualization using the linear classifier weight as the feature ranking score. Top: male/female case; bottom: normal/schizophrenia case. (Bar and circle size/color coding the classifier weight.)
Figure 8·9: Feature importance visualization using the feature selection frequency as the feature ranking score. Normal/schizophrenia case. (Bar and circle size/color coding the selection frequency P(n).)
(right panel). On the other hand, the classifier weight method in Figure 8·8 does not seem
to show the importance of these features. Similar observations can be made for the
normal/schizophrenia classification case. The exact reason for such discrepancies is
unclear, but from our results it appears that the p-value based feature ranking score is
more consistent with the observed average shape differences in Figures 8·6 and 8·7. Note
that the p-value based score quantifies only the difference in means: if the two class
populations have different variances, the p-value computed here cannot be directly
interpreted.
Not surprisingly, the third feature ranking method provides a feature importance grading
consistent with that given directly by the p-values (first method). The value of the third
method lies in the fact that it uses a near-optimal number of features selected from the
pool, rather than grading all features.
The gender-related differences in corpus callosum shape reported in (Davatzikos et al.,
1996) only partially match our results. Both studies show differences in the isthmus and
the anterior corpus callosum, but the differences in the splenium reported there are not
reflected in our results. A much smaller dataset was used in (Davatzikos et al., 1996),
which may account for the discrepancy. We also observe differences between our results
and those reported in (Galand et al., 1999) on normal versus schizophrenia classification.
The use of a different data set might be one reason for these differences. It can also be
argued that the feature importance visualization schemes reported here and in (Galand
et al., 1999) all rely on ad-hoc assumptions about the criteria for local shape
differences.
8.4 Classification
We now present our results on classification of Corpus Callosum shapes. We examined both
the problem of distinguishing between male and female subjects and the problem of
distinguishing between normal and schizophrenia subjects. Since it is believed that no
significant difference in area exists between the male and female corpus callosum
(Davatzikos et al., 1996), our resulting shape-based differences between these groups are
especially valuable.
We studied two types of classifiers. The first was a linear MMSE classifier, which
thresholds a linear combination of the feature values (see Section 2.9). The second
classifier was the learning-based technique Ada-boost (Freund and Schapire, 1999), which
creates a classifier as a non-linear combination of a weighted sequence of "weak"
classifiers. Both classifiers are reviewed in Section 2.9.
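As a rough illustration of the first classifier, a linear MMSE classifier can be obtained by least-squares fitting of ±1 class labels to an affine function of the features and thresholding at zero (a minimal sketch under our own conventions; Section 2.9 of the dissertation is the authoritative description):

```python
import numpy as np

def fit_mmse(X, y):
    """Linear MMSE classifier: least-squares fit of +/-1 labels to an
    affine function of the features (bias absorbed via a column of ones)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    t = np.where(y == 1, 1.0, -1.0)
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w

def predict_mmse(w, X):
    """Threshold the linear combination of the feature values at zero."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return (A @ w > 0).astype(int)
```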
To design and evaluate both classifiers we used cross-validation (Duda et al., 2001),
repeatedly and randomly splitting the data (both categories combined) into testing and
training sets. We performed 300 such data splits, each time reserving 10% of the data for
the testing subset. Reported error values are averages over these experiments; their
variation across splits does not exceed 0.02 for the results presented here.
In order to build classifiers of shapes, we also need to intelligently reduce the
dimensionality of the feature space. Often this is done using a technique such as
principal component analysis (PCA). However, PCA focuses on the commonalities of a group
rather than on the differences between groups, which are what matter for discrimination.
We examined two direct, statistically-based approaches for choosing a reduced-order
feature set yielding good class separation. In both cases, feature selection operates on
the training subset of the data in a particular cross-validation split (as described
later).
The first approach is based on the p-value associated with an individual-feature based
t-test of class difference. Assuming conditional Gaussianity and equal variances for the
observations corresponding to each class, the p-value gives the probability that the
observed difference occurred by chance given that the observations actually come from
distributions with equal means (identical probability distributions). A small p-value is
evidence that the probability distributions of the feature observations differ between
the two classes. For a given number of features retained, we therefore select those
features which have the smallest single-feature p-values. The inverse p-value is thus
used as a measure of feature importance.
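A minimal sketch of this selection rule (a hypothetical helper; SciPy's equal-variance two-sample t-test is our stand-in for the test described above):

```python
import numpy as np
from scipy import stats

def select_by_pvalue(Xtr, ytr, n_keep):
    """Retain the n_keep features with the smallest two-sample t-test
    p-values, computed on the training split (equal variances assumed,
    matching the assumption in the text)."""
    _, p = stats.ttest_ind(Xtr[ytr == 0], Xtr[ytr == 1], axis=0)
    return np.argsort(p)[:n_keep]      # indices of the retained features
```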
The second approach is based on the average weight given to a normalized feature in
an optimal linear classifier. We start from the full set of 39 mean-and-variance-normalized
(as described in section 8.2.3) features. We then progressively eliminate one feature at a
time using the following scheme:
1. We repeatedly (N times) split the training data into sub-training and sub-testing
sets, and for each split we compute a linear minimum mean-square error (MMSE) classifier
(see Section 2.9). For a given sub-split i we obtain a vector of classifier coefficients
a_i; let the element a_i^j correspond to feature j.
2. We find the index m+ of the feature with the lowest average absolute coefficient
according to

   m+ = argmin_m Σ_{i=1}^{N} |a_i^m|    (8.7)

   where N is the total number of sub-splits.
3. We eliminate the feature m+ from the feature set. The retained features are used for
classifier training and testing at the current feature set length.
4. The current set of features is passed back to step 1 to eliminate the next feature.
The process is repeated until the set includes only one retained feature.
As a result, we obtain a sequence of retained feature sets of decreasing length.
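The elimination loop in steps 1-4 might be sketched as follows (a hypothetical implementation; the sub-split count, the 90/10 sub-split, and the least-squares stand-in for the MMSE classifier are our assumptions):

```python
import numpy as np

def eliminate_by_weight(X, y, n_subsplits=20, seed=0):
    """Backward elimination: repeatedly drop the feature whose average
    absolute weight in a least-squares (MMSE-style) linear classifier,
    accumulated over random sub-splits, is smallest.  Returns the list
    of retained-feature index sets, in order of decreasing length."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    t = np.where(y == 1, 1.0, -1.0)        # +/-1 regression targets
    active = list(range(d))
    history = []
    while len(active) > 1:
        acc = np.zeros(len(active))
        for _ in range(n_subsplits):
            idx = rng.permutation(n)[: int(0.9 * n)]    # sub-training set
            A = np.hstack([X[np.ix_(idx, active)], np.ones((len(idx), 1))])
            a, *_ = np.linalg.lstsq(A, t[idx], rcond=None)
            acc += np.abs(a[:-1])           # accumulate |a_i^j|, bias excluded
        active.pop(int(np.argmin(acc)))     # eliminate the weakest feature
        history.append(list(active))
    return history
```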
We now consider the task of two-category classification and examine the performance of
different combinations of classifier and feature selection method. We consider two binary
classification cases: male/female and normal/schizophrenia. For each of these two cases
we use the two feature selection techniques described above. For each feature selection
scheme and a given number of selected features we train two different classifiers: the
linear MMSE classifier and an Ada-boost classifier using a misclassification-penalizing
criterion function as the "weak" classifier. We report results after six Ada-boost
iterations. For a given number of selected features we therefore evaluate four
combinations of feature selection method and classifier. To estimate the generalization
performance we use cross-validation, repeatedly and randomly splitting the data into
training and testing subsets; each testing subset contains 10% of the data. We obtain the
final testing error as the average testing error over the cross-validation data splits.
Our classification results are summarized in Figure 8·11 for male/female classification
(left) and normal/schizophrenia classification (right). For numbers of features larger
than 10, the classifier testing errors quickly approach 0.5 and are not shown.
A comment on the number of iterations chosen for the Ada-boost algorithm is in order. We
consider the male/female classification task using t-value feature selection and run the
Ada-boost algorithm while evaluating the performance after each iteration. This is done
for different numbers of selected features. The resulting testing errors are visualized
in Figure 8·10. While the variation with the number of iterations is not perfectly
consistent across different numbers of features, at six iterations the testing error is
nearly minimized for 10 or fewer features. We therefore chose to perform six Ada-boost
iterations in the experiments reported here.
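For concreteness, a generic discrete Ada-boost with axis-aligned threshold "stumps" as the weak classifiers is sketched below; this is a textbook variant of the algorithm (Freund and Schapire, 1999), not the exact misclassification-penalizing weak learner used in the experiments:

```python
import numpy as np

def adaboost_stumps(X, y, n_iter=6):
    """Discrete Ada-boost with axis-aligned threshold 'stumps' as the
    weak classifiers.  Returns the weighted ensemble."""
    n, d = X.shape
    t = np.where(y == 1, 1.0, -1.0)
    w = np.full(n, 1.0 / n)                      # sample weights
    ensemble = []
    for _ in range(n_iter):
        best = None
        for j in range(d):                       # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1.0, -1.0):
                    pred = sign * np.where(X[:, j] > thr, 1.0, -1.0)
                    err = w[pred != t].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weak-classifier weight
        pred = sign * np.where(X[:, j] > thr, 1.0, -1.0)
        w *= np.exp(-alpha * t * pred)           # re-weight the samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def adaboost_predict(ensemble, X):
    s = sum(a * sg * np.where(X[:, j] > thr, 1.0, -1.0)
            for a, j, thr, sg in ensemble)
    return (s > 0).astype(int)
```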
For male/female classification, the linear classifier testing error obtained when using
only area is 0.45, which is also shown in Figure 8·11. We obtain approximately a 25%
improvement over the area-only classifier by using well-chosen shape features. All the
skeleton-feature based methods perform comparably and, perhaps surprisingly, in all cases
the optimal number of features to use is relatively low: between 3 and 6.
A few important observations are worth noting. First, let us compare the performance of
the MMSE classifier and the Ada-boost based classifier for the same feature selection
method. The MMSE classifier outperforms the Ada-boost classifier for small numbers of
features (< 5). We attribute this to the probable simplicity (unimodality) of the
conditional feature distributions for the few best features: if the class-conditional
distributions are unimodal, a linear discriminant function can yield an optimal
classifier. For larger numbers of features (> 5), we observe better performance from the
Ada-boost classifier. We hypothesize that as the number of features increases, the MMSE
classifier quickly loses its generalizability due to poor distribution separation. In
such a case, a few Ada-boost iterations help adjust the decision boundaries and refocus
the discriminant function. Summarizing these results, the Ada-boost algorithm can provide
slightly better performance for larger numbers of features, but the best overall
performance is still achieved by the linear MMSE classifier, at a much lower
computational cost.
Figure 8·10: Male/female classification testing errors using the t-test feature selection method. Testing error is shown coded by color as a function of the number of Ada-boost iterations (horizontal axis) and the number of features chosen (vertical axis).
Figure 8·11: Classification testing error for gender (left) and schizophrenia versus normal (right), shown for different combinations of feature selection technique, classification method, and number of features retained. "T-test; linear" - t-test feature selection, MMSE classifier; "T-test; Ada Boosting" - t-test feature selection, Ada-boost classifier; "Weights; linear" - linear-weights feature selection, MMSE classifier; "Weights; Ada Boosting" - linear-weights feature selection, Ada-boost classifier; "Area only" - classifier using area alone.
Comparing the two classification cases shows the connection with the feature importance
visualization results presented previously. Consider first the male/female classification
case. Only two or three angle features stand out in Figure 8·6, and the results in
Figure 8·11 show that the best classification performance is achieved using 4 selected
features; adding more features sharply increases the error. For normal/schizophrenia
classification, a larger number of width features are seen in Figure 8·7 to exhibit
statistical differences, and correspondingly we achieve the best classification with 5-6
selected features. Moreover, the testing error as a function of the number of features
increases at a lower rate than in the male/female classification case. Our classification
results therefore also suggest that the inter-class shape differences are more localized
in the male/female case than in the normal/schizophrenia case.
Comparison with (Galand et al., 1999) gives more insight into our result. In (Galand
et al., 1999) the lowest testing error was reported when 20 "features" were used, where
"features" refers to the number of points at which the medial axis was sampled. This
representation is still over-complete, however, and many of these features are redundant.
The resulting error rates are very close to the error rates obtained in our work. The
reason is that the SVM technique used in (Galand et al., 1999) is able to concentrate on
the important discriminative features. A better result was not obtained there for a
smaller number of "features" (a coarser sampling of the medial axis) apparently because
with coarser sampling the important details of the medial axis are not represented in the
feature vector. We argue that we are able to achieve similar error rates with the MMSE
classifier, rather than an SVM classifier, by using an appropriate scheme to select the
important features.
Finally, we present a byproduct of the classification experiments that we call feature
selection statistics. At each cross-validation step, a set of selected features is
defined for any given number of features, and the retained features can vary between
cross-validation steps. For each number of retained features and for each feature, we
compute the probability that the feature is selected. This probability is computed by
counting how many times
Figure 8·12: Feature selection normalized probability. Male/female classification with p-value feature selection. Log of the normalized probability that a given feature is chosen in the set of N features. Horizontal axis: N, the number of selected features; vertical axis: feature index (1 through 37).
a given feature is selected over the different cross-validation steps. The log of the
normalized probability so computed is shown in Figure 8·12 for the male/female
classification case and the p-value feature selection method. Relative feature importance
can be judged by locating the most often selected features (left part of the graph) and
the least often selected features (right part of the graph). The feature selection
statistic presented here is of course closely related to the feature importance
visualization schemes presented in Section 8.3, since methods 1 and 2 use the same
ranking scores as feature selection criteria. The difference is that in methods 1 and 2
(Section 8.3) the whole dataset is used to compute the feature ranking score, while
feature selection in our classification experiments is carried out using the training
data subsets. One can see that the feature selection probabilities in Figure 8·12 are
closely correlated with the feature importance visualization in Figure 8·6. The feature
selection statistic computed here is a generalization of method 3 in Section 8.3: the
feature selection probability can be computed for different numbers of selected features
as well as for different feature selection criteria.
8.5 Summary
We have presented a shape-based approach to classification and inter-class analysis using
a skeleton shape representation. We used a robust variational approach to find a set of
anatomical skeleton features. We presented three approaches to highlighting inter-class
shape differences in the original image space based on our importance measures and the
skeleton representation. In addition, we approached the problem of inter-class
discrimination of Corpus Callosum shapes. To this end, two methods were presented and
used to reduce the dimension of the feature space. We designed both linear MMSE and
Ada-boost based classifiers and compared their performance on the reduced feature sets.
Chapter 9
Conclusions and future research
This dissertation contributes to three directions of research in object-based image
analysis. First, we develop novel shape and appearance modeling approaches. Second, we
incorporate the constructed shape and appearance models into boundary extraction tasks.
Third, we develop tools to study morphological differences between shapes.
In the shape modeling thrust of our research, we focus on shapes represented by a closed
contour. We use a curve evolution framework in the boundary extraction task and therefore
seek a shape modeling approach that can be implemented using curve evolution. Our goal is
to construct an alternative to the well-known generic prior methods and
deformable-template based shape models. More specifically, we investigate ways to
construct and use a shape model that does not constrain a shape to a template, yet
contains information on the presence of certain characteristic features of the shape. We
also work on an appearance model that can be superior in situations posing problems for
state-of-the-art appearance modeling approaches.
In our first major contribution, we consider the maximum entropy shape model, which has
the desirable property of reflecting perceptual shape similarity. We are able to
considerably reduce the computational cost of constructing the model. We further propose
a method to use this model in a curve evolution framework for shape inference. Our
results show the advantages of such a model compared with current approaches in image
segmentation problems.
In the second major contribution of this thesis we propose a shape distribution based
approach to modeling shapes. We develop a framework allowing the use of this type of
shape prior in curve evolution and derive the curve evolution equations corresponding to
our prior.
We demonstrate the properties of our model in shape morphing, shape interpolation, image
segmentation, and image segmentation with occlusion experiments. We extend our shape
distribution based approach to model inter-relationships between different image
structures, thereby achieving segmentation performance improvements in situations when a
single-object shape prior alone is not sufficient. We also propose a strategy for using
our approach to model 3D objects.
In the third major contribution we propose a joint shape and appearance modeling
framework, extending our shape distribution prior concept. Our new appearance model en-
codes the intensity and image-boundary relationship through distributions of intensity de-
pendent features sampled along trajectories parallel to the boundary. Our framework pro-
vides good segmentations in very challenging situations, when region-based and boundary-
based appearance models have difficulties. Such situations arise when object boundaries do
not correspond to strong edges in the image and when region statistics inside and outside
of the segmenting boundary are similar. Our model describes the image/boundary features
along the boundary and generalizes with respect to the positions of these characteristic
features. This property allows our model to account for large shape variations using
small training data sets, which can be beneficial for some applications.
In the fourth major contribution, we develop tools for morphological analysis of corpus
callosum shape differences. Specifically, we investigate the localizability of
significant inter-class shape differences and the possibility of automatically
classifying shapes as coming from male/female or normal/schizophrenia subjects. We use
skeleton-based shape descriptors suitable for corpus callosum shapes. We construct
feature ranking metrics that allow for intuitive visualization of shape differences and
for feature set dimensionality reduction. We test different classifiers on the reduced
feature sets and obtain classification performance similar to that reported in the
literature.
9.1 Future research
A growing number of imaging technologies are able to acquire volumetric data. Processing
such volumetric data jointly should give considerable performance gains compared to
processing single 2D slices separately. Unfortunately, the 3D extension is non-trivial
for many approaches. In this thesis we only briefly consider the 3D extension of our
shape modeling approach. Our single-object shape distribution based framework can be
extended to 3D, as described in Chapter 4, but only one of the shape feature function
definitions considered in this work can be formulated in 3D. The multi-scale curvature
feature function cannot easily be extended to 3D because the distance between two points
on the surface cannot be defined uniquely. The descriptive power of our model is limited
when using only one feature function. Moreover, a 3D object has more degrees of freedom
and requires more feature functions to be constrained properly. For these reasons, we
were unable to
achieve significant results in 3D at this time. New feature functions should be developed to
construct an effective 3D extension of our shape distribution-based model. Another aspect
of the 3D shape distribution prior implementation is the computational cost of computing
the distributions and associated flows. We showed that for feature function #1, the cost
of computing the surface flow is O(A2), where A is the area of the surface. This challenge
can be addressed by using sampled distributions and feature flows. The 3D extension of
the multi-object framework is straightforward and computationally efficient (the surface
flow can be computed in O(A) operations, where A is the area of the surface). The ap-
pearance feature distribution model can also be extended to 3D with the same surface flow
computational cost per one feature function.
Another possible extension of our framework is to combine multiple object and inten-
sity feature distribution priors into a more general formulation including feature functions
defined on multiple boundaries and image intensities.
A potential application of our prior is the object description and predictive coding in
advanced video coding frameworks, such as MPEG4, where object description capabilities
remain largely unused.
In certain imaging applications, there is periodicity in the observed shapes and
intensity patterns. For instance, in spine segmentation one can exploit such periodicity
to achieve reliable spinal cord extraction and vertebra position detection. In tracking
applications, such periodicity can be encountered in road traffic surveillance,
industrial machine vision, etc. We see potential in extending the current shape models to
incorporate such periodicity. For instance, one can simultaneously segment several
periodic structures with constraints on the shape differences between those structures.
This goal can be pursued using our shape modeling approach: one possibility is to use the
same target shape distribution for all pairs of adjacent structures, and a multi-scale
approach can be used to encode consistency between pairs of distant (non-adjacent)
contours.
Appendix A
Variational solution for the curve flow minimizing
shape distribution based prior energy
Notation:
E(Γ) - energy functional to be minimized
s, s1, s2, p - curve arc-length parameterizations normalized to the interval [0,1]
λ - variable spanning the range of the feature values
Γ - curve
Γ(s) = (x(s), y(s)) - curve coordinates
Γ(s1, s2) = Γ(s1) − Γ(s2) - the vector (x(s1) − x(s2), y(s1) − y(s2))
n(s) - normal vector at the boundary location s
β(s) ∈ R - continuous, differentiable scalar deformation function used to perturb the
curve in the normal direction
β(s)n(s) - deformation vector at the boundary location s
H(Γ, λ) - cumulative distribution function defined on the curve Γ
H*(λ) - prior cumulative distribution function
h(x) - indicator function (equal to 1 if x > 0 and 0 otherwise)
G(E) - Gateaux semi-derivative of the functional E
In this section we derive the gradient curve flow corresponding to the following energy
functional:
   E(Γ) = ∫ [H(Γ, λ) − H*(λ)]² dλ    (A.1)
Steps to compute the curve flow minimizing the energy are:

1. Compute the Gateaux semi-derivative of the energy E with respect to a small curve
perturbation β. Using the chain rule,

   G(E, β) = ∫ G[(H(Γ, λ) − H*(λ))²] dλ    (A.2)
           = 2 ∫ [H(Γ, λ) − H*(λ)] G[H(Γ, λ), β] dλ    (A.3)

2. If the Gateaux semi-derivative of a linear functional f exists, then by the Riesz
representation theorem

   G(f, β) = <∇f, β>    (A.4)

   where ∇f is the gradient flow. We use f = H and find ∇(H(Γ, λ)).

3. The flow minimizing the original functional E can be found as

   ∇E = 2 ∫ [H(Γ, λ) − H*(λ)] ∇(H(Γ, λ)) dλ    (A.5)

Therefore, in the following discussion we concentrate on step 2: finding the Gateaux
semi-derivative G[H(Γ, λ), β] and the corresponding flow ∇(H(Γ, λ)).
A.1 Inter-point distance function
Additional definitions:
d(Γ(s1), Γ(s2)) - Euclidean distance between the points s1 and s2 on the curve
For this feature class, the cumulative distribution function for a curve Γ is defined as:
   H(Γ, λ) = ∫₀¹ ∫₀¹ h( d(Γ(s1), Γ(s2)) > λ ) ds1 ds2    (A.6)
where h(x) is the indicator function.
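In the discrete setting, with the curve sampled at N points, this cumulative distribution can be estimated directly from all pairwise distances (an illustrative sketch; uniform arc-length sampling is assumed):

```python
import numpy as np

def distance_cdf(curve, lambdas):
    """Empirical version of H(Gamma, lambda) in eq. A.6: the fraction of
    point pairs of a discretized curve whose Euclidean distance exceeds
    lambda.  `curve` is an (N, 2) array of boundary points."""
    diff = curve[:, None, :] - curve[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1)).ravel()   # all pairwise distances
    return np.array([(d > lam).mean() for lam in lambdas])
```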
Assuming an arbitrary perturbation function β and a small scalar ε, we apply the
perturbation εβ(s) to the curve Γ, resulting in the deformed curve
   Γ′(s) = Γ(s) + εβ(s)n(s)    (A.7)
Figure A·1: Inter-point distance augmentation due to curve deformation.
From Figure A·1 it can be seen that, for small ε, the distance between two arbitrary
points s1 and s2 on the perturbed curve is

   d(Γ′(s1), Γ′(s2)) = d(Γ(s1), Γ(s2)) + ε [β(s1)n(s1) − β(s2)n(s2)] · Γ(s1, s2)/|Γ(s1, s2)|    (A.8)

That is, the distance between two points on the curve is augmented by the projection of
the difference between the two deformation vectors onto the normalized vector Γ(s1, s2).
We can now write the Gateaux semi-derivative G[H(Γ, λ), β] according to its definition:

   G[H(Γ, λ), β] = lim_{ε→0} [H(Γ′, λ) − H(Γ, λ)] / ε
                 = lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ h( d(Γ′(s1), Γ′(s2)) > λ ) − h( d(Γ(s1), Γ(s2)) > λ ) ] ds1 ds2
                 = lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ h( d(Γ(s1), Γ(s2)) + ε [β(s1)n(s1) − β(s2)n(s2)] · Γ(s1, s2)/|Γ(s1, s2)| > λ )
                                           − h( d(Γ(s1), Γ(s2)) > λ ) ] ds1 ds2    (A.9)
In the following derivations we introduce the simplified notation:

   d = d(Γ(s1), Γ(s2))
   dβ = β(s1)n(s1) − β(s2)n(s2)
   dΓ = Γ(s1, s2)/|Γ(s1, s2)|
   (nx(s), ny(s)) = n(s) - the components of the normal vector at s
With this notation we rewrite

   G[H(Γ, λ), β] = lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ h(d − λ + ε dβ·dΓ) − h(d − λ) ] ds1 ds2    (A.10)
First, we use a differentiable approximation of the indicator function:

   h(x) → φ_α(x) = (atan(x/α) + 1) / 2    (A.11)

where the parameter α defines the degree of approximation; in the limiting case α → 0
the approximation becomes exact. We perform the following derivations using this
approximation and at the last step consider the limiting case α → 0.
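For reference, the only properties of this approximation used below are the derivative of the arctangent and its delta-like limit (our own summary, consistent with the steps leading to eq. A.18):

```latex
\frac{d}{dx}\,\mathrm{atan}(x) = \frac{1}{1+x^2},
\qquad
\frac{d}{dx}\,\varphi_\alpha(x)
  = \frac{1}{2\alpha}\,\frac{1}{1+(x/\alpha)^2}
  = \frac{1}{2}\,\frac{\alpha}{\alpha^2+x^2}
```

As α → 0, the kernel α/(α² + x²) concentrates at x = 0 (it tends to πδ(x) up to normalization), which is what collapses the λ-integral in the final flow expression.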
The Gateaux semi-derivative becomes:

   G[H(Γ, λ), β] = lim_{ε→0} (1/(2ε)) ∫₀¹ ∫₀¹ [ atan((λ − d)/α) + atan((ε dβ·dΓ + d − λ)/α) ] ds1 ds2    (A.12)
For small ε, using the Taylor expansion we obtain

   atan((ε dβ·dΓ + d − λ)/α) = atan((d − λ)/α) + atan′((d − λ)/α) (ε dβ·dΓ / α) + O(ε²)    (A.13)
We can now find the approximation of the Gateaux semi-derivative using the first two
terms of the Taylor expansion and atan′(x) = 1/(1 + x²):

   G[H(Γ, λ), β] = lim_{ε→0} (1/(2ε)) ∫₀¹ ∫₀¹ atan′((d − λ)/α) (ε dβ·dΓ / α) ds1 ds2
                 = (1/(2α)) ∫₀¹ ∫₀¹ atan′((d − λ)/α) (dβ·dΓ) ds1 ds2
                 = (1/(2α)) ∫₀¹ ∫₀¹ (dβ·dΓ) / (1 + ((d − λ)/α)²) ds1 ds2
                 = (1/(2α)) ∫₀¹ [ ∫₀¹ (n(s1)·dΓ) / (1 + ((d − λ)/α)²) ds2 ] β(s1) ds1
                 − (1/(2α)) ∫₀¹ [ ∫₀¹ (n(s2)·dΓ) / (1 + ((d − λ)/α)²) ds1 ] β(s2) ds2    (A.14)

where the last step uses dβ·dΓ = β(s1)(n(s1)·dΓ) − β(s2)(n(s2)·dΓ), with

   n(s)·dΓ = [nx(s)(x(s1) − x(s2)) + ny(s)(y(s1) − y(s2))] / |Γ(s1, s2)|
Exchanging the roles of s1 and s2 in the second integral (which changes the sign of dΓ)
makes it equal to the first, so that

   G[H(Γ, λ), β] = (1/α) ∫₀¹ [ ∫₀¹ (n(s1)·dΓ) / (1 + ((d − λ)/α)²) ds2 ] β(s1) ds1    (A.15)
According to eq. A.4, the expression in square brackets is the gradient flow

   ∇H(Γ, λ)(s) = (1/α) ∫₀¹ (n(s)·dΓ(s, t)) / (1 + ((d − λ)/α)²) dt    (A.16)

where now d = d(Γ(s), Γ(t)) and dΓ(s, t) = Γ(s, t)/|Γ(s, t)|.
The gradient flow minimizing the energy in eq. A.1 is therefore

   ∇E(Γ)(s) = 2 ∫ dλ [H*(λ) − H(Γ, λ)] (1/α) ∫₀¹ (n(s)·dΓ(s, t)) / (1 + ((d − λ)/α)²) dt
            = 2 ∫ dλ [H*(λ) − H(Γ, λ)] ∫₀¹ α (n(s)·dΓ(s, t)) / (α² + (d − λ)²) dt    (A.17)
For α ≈ 0, the kernel α/(α² + (d − λ)²) is non-zero only when d = λ. Changing the order
of integration, we obtain:

   ∇E(Γ)(s) = 2 ∫₀¹ (n(s)·Γ(s, t)/|Γ(s, t)|) ∫ [H*(λ) − H(Γ, λ)] α / (α² + (d − λ)²) dλ dt
            = 2 ∫₀¹ (n(s)·Γ(s, t)/|Γ(s, t)|) [ H*(|Γ(s, t)|) − H(Γ, |Γ(s, t)|) ] dt    (A.18)
The obtained expression is simple and has an intuitive interpretation. The flow at each
point s on the curve is computed as an integral along the curve: for each point t, the
integrand is the projection of the normal vector at s onto the unit vector pointing from
t to s, scaled by the difference between the target and current distributions evaluated
at the distance between s and t. This is intuitively the projection of the "force" acting
on the feature (the link between s and t). In the discrete case, the computational
complexity of the flow computation is O(N²), where N is the number of nodes of the
discretized curve.
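A direct O(N²) discretization of the flow in eq. A.18 might look as follows (an illustrative sketch with our own conventions: counter-clockwise node ordering, central-difference normals, and the target/current distributions passed in as callables):

```python
import numpy as np

def distance_prior_flow(curve, H_target, H_current):
    """Discrete normal flow of eq. A.18 for the inter-point distance
    feature.  For each node s, sum over the other nodes t the projection
    of the normal n(s) onto the unit vector from t to s, weighted by the
    distribution mismatch evaluated at the distance |Gamma(s) - Gamma(t)|.
    H_target and H_current are callables of a distance.  O(N^2) cost."""
    N = len(curve)
    # node normals from the central-difference tangent of the closed
    # polygon (counter-clockwise ordering assumed)
    tang = np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    normals = np.stack([tang[:, 1], -tang[:, 0]], axis=1)
    flow = np.zeros(N)
    for s in range(N):
        for t in range(N):
            if t == s:
                continue
            v = curve[s] - curve[t]              # Gamma(s, t)
            dist = np.linalg.norm(v)
            flow[s] += (normals[s] @ (v / dist)) * \
                       (H_target(dist) - H_current(dist)) / N
    return 2.0 * flow
```

When the current distribution already matches the target, the mismatch term vanishes and the flow is identically zero, as expected of a gradient flow at a minimizer.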
A.2 Boundary curvature feature function
Now we consider the local boundary curvature as the feature function and attempt to find
the corresponding curve flow. The curvature at the boundary location s can be defined as
   κ(s) = dn(s) · Γ′/|Γ′|    (A.19)
where dn(s) is the infinitesimal rate of change (a vector) of the normal and Γ′ is the
local tangent vector. The cumulative distribution function for this feature function can
be written as

   H(Γ, λ) = ∫_s h( dn(s) · Γ′/|Γ′| > λ ) ds    (A.20)
After the small curve perturbation by εβ(s), the new rate of change of the normal and the
new tangent vector are:

   dn(s) → dn(s) + εβ″(s) Γ′/|Γ′|    (A.21)
   Γ′ → Γ′ + εβ′(s)    (A.22)
The Gateaux semi-derivative is

   G(H(λ), β) = lim_{ε→0} (1/ε) ∫ [ h( (dn(s) + εβ″(s) Γ′/|Γ′|) · (Γ′ + εβ′(s)) / |Γ′ + εβ′(s)| > λ )
                                  − h( dn(s) · Γ′/|Γ′| > λ ) ] ds    (A.23)
Assuming |εβ′(s)| ≪ |Γ′|,

   G(H(λ), β) = lim_{ε→0} (1/ε) ∫ [ h(κ(s) + εβ″(s) > λ) − h(κ(s) > λ) ] ds    (A.24)
              = lim_{ε→0} (1/(2ε)) ∫ [ atan((λ − κ(s))/α) + atan((εβ″(s) + κ(s) − λ)/α) ] ds    (A.25)
              = (1/(2α)) ∫ atan′((κ(s) − λ)/α) β″(s) ds = (1/2) ∫ α / (α² + (κ(s) − λ)²) β″(s) ds    (A.26)
Integrating by parts twice we get

   G(H(λ), β) = (1/2) ∫ (d²/ds²)[ α / (α² + (κ(s) − λ)²) ] β(s) ds
              = ∫ α [ 4(κ(s) − λ)² κ′(s)² − (α² + (κ(s) − λ)²)(κ′(s)² + (κ(s) − λ) κ″(s)) ]
                    / (α² + (κ(s) − λ)²)³ β(s) ds    (A.27-A.29)
The resulting ∇(H(Γ))(s) is

   ∇(H(Γ))(s) = α [ 4(κ(s) − λ)² κ′(s)² − (α² + (κ(s) − λ)²)(κ′(s)² + (κ(s) − λ) κ″(s)) ] / (α² + (κ(s) − λ)²)³    (A.30-A.32)
At this point we obtain a solution that contains second-order derivatives of the curvature.
We expect the resulting flow to be very sensitive to noise and numerically unstable.
Therefore, we do not use the boundary curvature feature in our experiments.
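The noise sensitivity claimed above can be illustrated numerically: a central second finite difference amplifies additive i.i.d. noise by roughly 1/h², where h is the sample spacing. The sketch below uses an illustrative smooth "curvature" profile; the names and parameters are assumptions for the demonstration, not taken from the dissertation's experiments:

```python
import numpy as np

def second_difference(f, h):
    """Central second finite difference on a periodic signal."""
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / h ** 2

rng = np.random.default_rng(0)
n = 256
h = 2 * np.pi / n
s = np.arange(n) * h
kappa = 1.0 + 0.1 * np.sin(s)                 # smooth "curvature" profile
noise = 1e-3 * rng.standard_normal(n)         # small measurement noise
# error introduced into kappa'' by the noise, amplified by roughly 1/h^2
err = second_difference(kappa + noise, h) - second_difference(kappa, h)
amplification = np.max(np.abs(err)) / 1e-3
```

With h ≈ 0.025 here, the 10⁻³-level noise is amplified by several orders of magnitude in the second derivative, which is why a flow driven by κ″ is impractical without heavy smoothing.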
A.3 Multiscale curvatures
A.3.1 Computation of feature function
For this feature function, the values are computed as “support” angles α, see Figure A·2,
defined for three boundary locations (a “base” point s1 and symmetric “side” points s1 − s2
and s1 + s2). The cumulative distribution function is given by
\[ H(\Gamma,\lambda) = \int_0^1\!\!\int_0^1 h\big(\alpha(s_1-s_2,\,s_1,\,s_1+s_2) > \lambda\big)\,ds_1\,ds_2 \tag{A.33} \]
We choose to define the “inner” angle α, as shown in Figure A·2, as the angle between the
vectors \(\vec{d\Gamma}(s_1, s_1-s_2)\) and \(\vec{d\Gamma}(s_1, s_1+s_2)\), measured always in the “same” half-space, which
means that the half-space cannot be flipped when the angle crosses the π/2 threshold.
This half-space is fixed to be the inside of the curve for s2 = 0. In other words, the angle
α must be a continuous function of s2, with α(s2 = 0) = π.
Figure A·2: Illustration of feature value computation for feature function #2. The figure shows the base point Γ(s1), the side points Γ(s1 − s2) and Γ(s1 + s2) with their normals, the triangle sides a, b, c, and the angles α, β, γ.
Figure A·3 illustrates four cases of the relative locations of points on the curve. It can be
seen that, in the general case, unambiguous determination of the angle from the rays \(\vec{\Gamma}(s_1, s_1-s_2)\)
and \(\vec{\Gamma}(s_1, s_1+s_2)\) is impossible from the point positions alone, without further
assumptions. We therefore determine the angle α sequentially for s2 increasing from 0
to 1 and detect the instances when the angle crosses the π/2 threshold. This operation
eliminates the ambiguity.
Figure A·3: Four cases of the relative positions of three curve points. The support angle cannot be determined unambiguously.
Let us define a flag function r(s1, s2). For each “base” point s1, the angle α is computed
sequentially for s2 increasing from 0 to 1. This process is illustrated in Figure A·4.
Figure A·4: Sequential computation of the angles for a particular “base” point s1, starting from r = 1 (assuming the inside of the curve is upwards).
For each base point s1, we start from s2 = 0; r(s1, s2) = sign(κ(s1)); α(s1, s2) = π. The flag function
r(s1, s2) for s2 > 0 is defined as
\[ r(s_1,s_2) = \begin{cases} 1 & \text{if } \alpha(s_1,s_2) \le \pi \\ -1 & \text{otherwise} \end{cases} \tag{A.34} \]
We define the mean direction vector as
\[ \vec{\Gamma}_m(s_1-s_2,\,s_1+s_2) = \frac{\vec{\Gamma}(s_1,s_1-s_2)}{|\vec{\Gamma}(s_1,s_1-s_2)|} + \frac{\vec{\Gamma}(s_1,s_1+s_2)}{|\vec{\Gamma}(s_1,s_1+s_2)|} \tag{A.35} \]
In the process of computing α(s1, s2), as s2 increases by ds, we capture the change of
orientation of the mean direction vector through the scalar product
\[ C = \vec{\Gamma}_m(s_1-s_2,\,s_1+s_2)\cdot\vec{\Gamma}_m(s_1-s_2-ds,\,s_1+s_2+ds) \tag{A.36} \]
If C becomes less than zero for some s2 = l, we change the sign of r(s1, l) for consecutive
values of s2 > l. The angle between the rays, \(\angle(\vec{\Gamma}(s_1,s_1-s_2), \vec{\Gamma}(s_1,s_1+s_2)) \in [0, \pi]\), is measured
as the inverse cosine. After this sequential computation is performed for all s1, we end
up with the 2D functions \(\angle(s_1, s_2)\) and r(s1, s2). The latter is used to correct the values of
\(\angle(\vec{\Gamma}(s_1,s_1-s_2), \vec{\Gamma}(s_1,s_1+s_2))\) that must be greater than π. Finally, the feature values
α(s1, s2) are given by
\[ \alpha(s_1,s_2) = 2\pi\,\frac{1-r(s_1,s_2)}{2} + r(s_1,s_2)\,\operatorname{acos}\!\left(\frac{\vec{d\Gamma}(s_1,s_1-s_2)}{|\vec{d\Gamma}(s_1,s_1-s_2)|}\cdot\frac{\vec{d\Gamma}(s_1,s_1+s_2)}{|\vec{d\Gamma}(s_1,s_1+s_2)|}\right) \tag{A.37} \]
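The sequential procedure above (tracking the flag r through sign flips of the mean direction vector, then correcting the inverse-cosine value by eq. A.37) can be sketched for a single base point of a closed discrete curve. This is a minimal sketch under stated assumptions: the curve is convex enough at the start that r₀ = sign(κ(s₁)) = +1, and the function name `support_angles` is illustrative:

```python
import numpy as np

def support_angles(curve, i, r0=1):
    """Sequential computation of the 'support' angles alpha(s1, s2) for one
    base point i of a closed discrete curve (sketch of eq. A.37).
    r0 plays the role of sign(kappa(s1)); here a locally convex start is
    assumed, r0 = +1.  Returns (alpha, r) over the half-offsets k."""
    N = len(curve)
    K = N // 2
    alpha = np.empty(K)
    r = np.empty(K, dtype=int)
    r_cur = r0
    m_prev = None
    for k in range(1, K + 1):
        a = curve[(i - k) % N] - curve[i]          # ray to the 'left' side point
        b = curve[(i + k) % N] - curve[i]          # ray to the 'right' side point
        a_hat = a / np.linalg.norm(a)
        b_hat = b / np.linalg.norm(b)
        m = a_hat + b_hat                          # mean direction vector (eq. A.35)
        # flip the flag when the mean direction reverses (scalar product C < 0)
        if m_prev is not None and m @ m_prev < 0:
            r_cur = -r_cur
        m_prev = m
        raw = np.arccos(np.clip(a_hat @ b_hat, -1.0, 1.0))   # in [0, pi]
        alpha[k - 1] = 2 * np.pi * (1 - r_cur) / 2 + r_cur * raw   # eq. A.37
        r[k - 1] = r_cur
    return alpha, r
```

On a convex curve no sign flip occurs, so r stays +1 and the angle simply decreases from π toward 0 as the side points move apart.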
A.3.2 Curve flow computation
In order to find the flow minimizing the energy in eq. 4.29, we perform essentially the same
procedure as for feature function #1. We refer the reader to Section A.1 for the omitted
details.
First, we perturb the curve in the normal direction by εβ(s). It is important that all
three points (the “base” point and the two “side” points) on the curve change their positions. Let α′
be the angle after the perturbation. Using the continuous approximation of the indicator
function, we obtain
\[ G(H(\lambda),\beta) = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\iint ds_1\,ds_2\,\big[h(\alpha' > \lambda) - h(\alpha > \lambda)\big] = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\iint ds_1\,ds_2\,\big[\phi_\gamma(\alpha'-\lambda) - \phi_\gamma(\alpha-\lambda)\big] \]
\[ = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\iint ds_1\,ds_2\left[\operatorname{atan}\!\left(\frac{\alpha'-\lambda}{\gamma}\right) - \operatorname{atan}\!\left(\frac{\alpha-\lambda}{\gamma}\right)\right] = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\iint ds_1\,ds_2\,\operatorname{atan}'\!\left(\frac{\alpha-\lambda}{\gamma}\right)(\alpha'-\alpha) \]
\[ = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\iint ds_1\,ds_2\,\frac{\gamma}{\gamma^2+(\alpha-\lambda)^2}\,(\alpha'-\alpha) \tag{A.38} \]
We must therefore compute the angle increment α′ − α resulting from the perturbation.
This increment consists of three terms, which are additive because the perturbation is small:
\[ \alpha' - \alpha = d\alpha^{(1)} + d\alpha^{(2)} + d\alpha^{(3)} \tag{A.39} \]
where the first two terms result from the displacement of the “side” points and the third term
results from the displacement of the “base” point Γ(s1).
We first consider \(d\alpha^{(1)}\) (\(d\alpha^{(2)}\) is determined similarly). The local geometry of the curve
perturbation at the point s1 + s2 is shown in detail in Figure A·5. It is easy to see that
Figure A·5: Local perturbation of the curve at the point \(\vec{\Gamma}(s_1+s_2)\). The perturbation εβ(s1 + s2) is infinitesimally small compared to \(|\vec{\Gamma}(s_1, s_1+s_2)|\).
the increment \(d\alpha^{(1)}\) can be found as follows:
\[ \frac{p}{\varepsilon\beta(s)} = \sin\Theta, \qquad p = \varepsilon\beta(s)\sqrt{1-\cos^2\Theta} \]
\[ d\alpha^{(1)} = \frac{p}{|d\Gamma|} = \frac{\varepsilon\beta(s)\sqrt{1-\left(\vec{n}(s)\cdot\frac{d\Gamma}{|d\Gamma|}\right)^2}}{|d\Gamma|} \tag{A.40} \]
Figure A·6: Illustration of two cases in which the sign of the angle increment \(d\alpha^{(1)}\) differs for the same curve perturbation εβ(s).
It is important to recognize that the sign of the above angle increment depends on the
relative direction of the normal \(\vec{n}(s_1+s_2)\) and \(\vec{\Gamma}_m(s_1, s_1+s_2)\). The two possible cases are
illustrated in Figure A·6. In case 1, for a positive β(s1 + s2), the increment is positive;
in case 2, it is negative. We define a flag function f(s1 + s2) as follows:
\[ f(s_1+s_2) = \begin{cases} -1 & \text{if the points } L \text{ and } P \text{ are on the same side of } \Gamma(s_1,s_1+s_2) \\ \phantom{-}1 & \text{otherwise} \end{cases} \tag{A.41} \]
Under this definition,
\[ d\alpha^{(1)} = \frac{\varepsilon\beta(s_1+s_2)\sqrt{1-\left(\vec{n}(s_1+s_2)\cdot\frac{\vec{\Gamma}(s_1,s_1+s_2)}{|\vec{\Gamma}(s_1,s_1+s_2)|}\right)^2}}{|\vec{\Gamma}(s_1,s_1+s_2)|}\,f(s_1+s_2) \tag{A.42} \]
\[ d\alpha^{(2)} = \frac{\varepsilon\beta(s_1-s_2)\sqrt{1-\left(\vec{n}(s_1-s_2)\cdot\frac{\vec{\Gamma}(s_1,s_1-s_2)}{|\vec{\Gamma}(s_1,s_1-s_2)|}\right)^2}}{|\vec{\Gamma}(s_1,s_1-s_2)|}\,f(s_1-s_2) \tag{A.43} \]
We now proceed to calculate \(d\alpha^{(3)}\). Let us assume that the angles α′ and α are computed
as inverse cosines. In that case, for α > π, the sign of the angle increment must be changed
as follows:
\[ d\alpha^{(3)} = (\alpha' - \alpha)\,r(s_1,s_2) \tag{A.44} \]
where α′ is the angle resulting after displacing the point s1. Using the law of cosines and the
abbreviations from Figure A·2, we can write the angle increment due to the displacement
of the point s1 as
\[ \alpha' - \alpha = \operatorname{acos}\!\left(\frac{a^2-b'^2-c'^2}{-2b'c'}\right) - \operatorname{acos}\!\left(\frac{a^2-b^2-c^2}{-2bc}\right) \]
\[ = \operatorname{acos}\!\left(\frac{a^2-\left(b-\vec{n}(s_1)\cdot\frac{\Gamma_-}{|\Gamma_-|}\beta(s_1)\varepsilon\right)^2-\left(c-\vec{n}(s_1)\cdot\frac{\Gamma_+}{|\Gamma_+|}\beta(s_1)\varepsilon\right)^2}{-2\left(b-\vec{n}(s_1)\cdot\frac{\Gamma_-}{|\Gamma_-|}\beta(s_1)\varepsilon\right)\left(c-\vec{n}(s_1)\cdot\frac{\Gamma_+}{|\Gamma_+|}\beta(s_1)\varepsilon\right)}\right) - \operatorname{acos}\!\left(\frac{a^2-b^2-c^2}{-2bc}\right) \]
\[ = \operatorname{acos}\!\left(\frac{a^2-b^2-c^2+2\beta(s_1)\varepsilon\left(\vec{n}(s_1)\cdot\frac{\Gamma_-}{|\Gamma_-|}\,b+\vec{n}(s_1)\cdot\frac{\Gamma_+}{|\Gamma_+|}\,c\right)}{-2bc+2\beta(s_1)\varepsilon\left(\vec{n}(s_1)\cdot\frac{\Gamma_-}{|\Gamma_-|}\,c+\vec{n}(s_1)\cdot\frac{\Gamma_+}{|\Gamma_+|}\,b\right)}\right) - \operatorname{acos}\!\left(\frac{a^2-b^2-c^2}{-2bc}\right) \tag{A.45} \]
where \((\cdot)'\) indicates the quantity after displacing the point s1, \(\Gamma_+ = \vec{\Gamma}(s_1, s_1+s_2)\), and
\(\Gamma_- = \vec{\Gamma}(s_1, s_1-s_2)\). Using the Taylor expansion
\[ \frac{m+\varepsilon_1}{n+\varepsilon_2} = \frac{m}{n} + \frac{\varepsilon_1 n - \varepsilon_2 m}{n^2} + O(\varepsilon^2) \tag{A.46} \]
we obtain
\[ \alpha' - \alpha = \operatorname{acos}\!\left(\frac{a^2-b^2-c^2}{-2bc} + \frac{\beta(s_1)\varepsilon\,\vec{n}(s_1)\cdot\left(\frac{\Gamma_-}{|\Gamma_-|}\,b+\frac{\Gamma_+}{|\Gamma_+|}\,c\right)}{-bc} - \frac{\beta(s_1)\varepsilon\,\vec{n}(s_1)\cdot\left(\frac{\Gamma_-}{|\Gamma_-|}\,c+\frac{\Gamma_+}{|\Gamma_+|}\,b\right)\big(a^2-b^2-c^2\big)}{2b^2c^2}\right) - \operatorname{acos}\!\left(\frac{a^2-b^2-c^2}{-2bc}\right) \tag{A.47} \]
Using the Taylor expansion of the inverse cosine we obtain
\[ \alpha' - \alpha = \operatorname{acos}'\!\left(\frac{a^2-b^2-c^2}{-2bc}\right)\varepsilon\beta(s_1)\;\vec{n}(s_1)\cdot\left(-\frac{\Gamma_-}{c\,|\Gamma_-|} - \frac{\Gamma_+}{b\,|\Gamma_+|} - \frac{a^2-b^2-c^2}{2b^2c^2}\left(\frac{\Gamma_-}{|\Gamma_-|}\,c + \frac{\Gamma_+}{|\Gamma_+|}\,b\right)\right) \]
\[ = \frac{-1}{\sin\alpha}\,\frac{\varepsilon\beta(s_1)\,\vec{n}(s_1)}{2bc}\cdot\left(\frac{\Gamma_+}{|\Gamma_+|}\,\frac{a^2+c^2-b^2}{c} + \frac{\Gamma_-}{|\Gamma_-|}\,\frac{a^2+b^2-c^2}{b}\right) \]
\[ = \frac{-1}{\sin\alpha}\,\frac{\varepsilon\beta(s_1)\,a}{bc}\,\underbrace{\big[\cos\beta\,\cos\angle(\vec{n}(s_1),\Gamma_+) + \cos\gamma\,\cos\angle(\vec{n}(s_1),\Gamma_-)\big]}_{K} \tag{A.48} \]
where we used \(\operatorname{acos}'(\cos\alpha) = -1/\sqrt{1-\cos^2\alpha} = -1/\sin\alpha\) and the law of cosines,
\(a^2+c^2-b^2 = 2ac\cos\beta\) and \(a^2+b^2-c^2 = 2ab\cos\gamma\).
We finally obtain the expression for \(d\alpha^{(3)}\). We only need to resolve the uncertainty
arising when α ≈ π. In that case, we set α = π − δ, with β = O(δ) and γ = O(δ). Using the Taylor
expansion, and \(\sin\alpha = \sin\delta \approx \delta\), the ratio \(K/\sin\alpha\) becomes
\[ \frac{K}{\sin\alpha} = \frac{\left(1-\tfrac{1}{2}\beta^2\right)\cos\big(\alpha-\angle(\vec{n}(s_1),\Gamma_-)\big) + \left(1-\tfrac{1}{2}\gamma^2\right)\cos\angle(\vec{n}(s_1),\Gamma_-)}{\delta} \]
\[ = \frac{-\cos\big(\delta+\angle(\vec{n}(s_1),\Gamma_-)\big) + \cos\angle(\vec{n}(s_1),\Gamma_-)}{\delta} + O(\delta) \]
\[ = \frac{-\cos\angle(\vec{n}(s_1),\Gamma_-) + \delta\sin\angle(\vec{n}(s_1),\Gamma_-) + \cos\angle(\vec{n}(s_1),\Gamma_-)}{\delta} = \sin\angle\big(\vec{n}(s_1),\vec{\Gamma}(s_1,s_1-s_2)\big) \tag{A.49} \]
Finally,
\[ d\alpha^{(3)} = -\varepsilon\beta(s_1)\,\frac{a}{bc}\begin{cases} \dfrac{\cos\beta\,\cos\angle(\vec{n}(s_1),\Gamma_+) + \cos\gamma\,\cos\angle(\vec{n}(s_1),\Gamma_-)}{\sin\alpha} & \text{when } \alpha\ne\pi \\[2mm] \sin\angle\big(\vec{n}(s_1),\vec{\Gamma}(s_1,s_1-s_2)\big) & \text{otherwise} \end{cases}\;\; r(s_1,s_2) \tag{A.50} \]
Combining eq. A.38 and eq. A.39, the Gateaux semi-derivative can be written as
\[
G(H(\lambda),\beta) = \int_0^1\!\!\int_0^1 ds_1\,ds_2\,\frac{\gamma}{\gamma^2+(\alpha-\lambda)^2}\Bigg[
\beta(s_1+s_2)\,\frac{\sqrt{1-\left(\vec{n}(s_1+s_2)\cdot\frac{\vec{\Gamma}(s_1,s_1+s_2)}{|\vec{\Gamma}(s_1,s_1+s_2)|}\right)^2}}{|\vec{\Gamma}(s_1,s_1+s_2)|}\,f(s_1+s_2)
+ \beta(s_1-s_2)\,\frac{\sqrt{1-\left(\vec{n}(s_1-s_2)\cdot\frac{\vec{\Gamma}(s_1,s_1-s_2)}{|\vec{\Gamma}(s_1,s_1-s_2)|}\right)^2}}{|\vec{\Gamma}(s_1,s_1-s_2)|}\,f(s_1-s_2)
- \beta(s_1)\,\frac{a}{bc}\begin{cases} \dfrac{\cos\beta\,\cos\angle(\vec{n}(s_1),\Gamma_+) + \cos\gamma\,\cos\angle(\vec{n}(s_1),\Gamma_-)}{\sin\alpha} & \text{when } \alpha\ne\pi \\[2mm] \sin\angle\big(\vec{n}(s_1),\vec{\Gamma}(s_1,s_1-s_2)\big) & \text{otherwise} \end{cases}\; r(s_1,s_2)\Bigg] \tag{A.51}
\]
Using the Riesz representation theorem and changing the variables of integration, we obtain the
gradient flow minimizing H(Γ, λ):
\[ \nabla H(\Gamma,\lambda)(s) = \int_0^1 \frac{-\gamma\,dt}{\gamma^2+(\alpha(s)-\lambda)^2}\,\frac{a}{bc}\begin{cases} \dfrac{\cos\beta\,\cos\angle(\vec{n},\Gamma_+) + \cos\gamma\,\cos\angle(\vec{n},\Gamma_-)}{\sin\alpha} & \text{when } \alpha\ne\pi \\[2mm] \sin\angle\big(\vec{n},\vec{\Gamma}_-\big) & \text{otherwise} \end{cases}\; r(s,t) \]
\[ + \int_0^1 \frac{\gamma\,dt}{\gamma^2+(\alpha(s-t)-\lambda)^2}\,\frac{\sqrt{1-\left(\vec{n}(s-t)\cdot\frac{\vec{\Gamma}(s-t)}{|\vec{\Gamma}(s-t)|}\right)^2}}{|\vec{\Gamma}(s-t)|}\,f(s-t) \]
\[ + \int_0^1 \frac{\gamma\,dt}{\gamma^2+(\alpha(s+t)-\lambda)^2}\,\frac{\sqrt{1-\left(\vec{n}(s+t)\cdot\frac{\vec{\Gamma}(s+t)}{|\vec{\Gamma}(s+t)|}\right)^2}}{|\vec{\Gamma}(s+t)|}\,f(s+t) \tag{A.52} \]
Recall that the gradient of the energy is
\[ \nabla E(\Gamma)(s) = \int d\lambda\,\big[H^*(\lambda) - H(\Gamma,\lambda)\big]\,\nabla H(\Gamma,\lambda)(s) \tag{A.53} \]
Using eq. A.52, changing the order of integration, and considering the limiting case of small
γ, we obtain
\[
\nabla E(\Gamma)(s) = -\int_{t\in S}\Bigg[
\begin{cases} \dfrac{\cos\beta\,\cos\angle(\vec{n}(s),\Gamma_+) + \cos\gamma\,\cos\angle(\vec{n}(s),\Gamma_-)}{\sin\alpha} & \text{if } \alpha\ne\pi \\[2mm] \sin\angle\big(\vec{n}(s),\vec{\Gamma}_-\big) & \text{otherwise} \end{cases}\;\cdot\;\frac{a\,r(s,t)}{bc}
\;-\; f(s-t)\,\frac{\sqrt{1-\left(\vec{n}(s-t)\cdot\frac{\vec{\Gamma}_-}{|\vec{\Gamma}_-|}\right)^2}}{|\vec{\Gamma}_-|}
\;-\; f(s+t)\,\frac{\sqrt{1-\left(\vec{n}(s+t)\cdot\frac{\vec{\Gamma}_+}{|\vec{\Gamma}_+|}\right)^2}}{|\vec{\Gamma}_+|}
\Bigg]\times\Big[H^*\big(\alpha(s,t)\big) - H\big(\Gamma,\alpha(s,t)\big)\Big]\,dt \tag{A.54}
\]
where r(s, t) and f(s ± t) take the values −1 and 1 and indicate the sign of the change of the angle
\(\alpha(s,t) = \angle(\vec{\Gamma}_-, \vec{\Gamma}_+)\) with respect to the along-the-normal perturbation of the points Γ(s) and
Γ(s ± t), respectively; \(\vec{\Gamma}_+ = \vec{\Gamma}(s, s+t)\); \(\vec{\Gamma}_- = \vec{\Gamma}(s, s-t)\); \(a = |\vec{\Gamma}_+ - \vec{\Gamma}_-|\); \(b = |\Gamma_-|\); \(c = |\Gamma_+|\);
\(\beta = \angle(-\vec{\Gamma}_+, \vec{\Gamma}_- - \vec{\Gamma}_+)\); \(\gamma = \angle(-\vec{\Gamma}_-, \vec{\Gamma}_+ - \vec{\Gamma}_-)\).
We have thus obtained the closed-form solution for the flow minimizing the energy corre-
sponding to feature function #2. The computational complexity for a discrete curve
with N nodes is O(N^2).
A.4 Feature classes with weighting function
In this section we consider the energy with non-uniform weighting of the distribution dif-
ference across the feature value range. Using the notation w(λ) for the weighting function,
we focus on the energy given by
\[ E(\Gamma) = \int w(\lambda)\,\big[H^*(\lambda) - H(\Gamma,\lambda)\big]^2\,d\lambda \tag{A.55} \]
A.4.1 Inter-point distance feature function
We will present the derivation of the minimizing flow for the inter-point distance feature
function. Results for the other feature functions can be obtained similarly.
Definitions: \(d(\vec{\Gamma}(s_1), \vec{\Gamma}(s_2))\) is the Euclidean distance between the points s1 and s2 on the curve.
The weighting function w(λ) must be normalized as follows:
\[ \int_0^1\!\!\int_0^1 w\big(d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2))\big)\,ds_1\,ds_2 = 1 \tag{A.56} \]
Therefore we use the renormalized weighting function
\[ w(d) = \frac{w(d)}{\displaystyle\int_0^1\!\!\int_0^1 w\big(d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2))\big)\,ds_1\,ds_2} \tag{A.57} \]
The normalization must take place at every iteration in order to ensure that the distribution
integrates to 1. We assume that w(d) is differentiable.
For this definition of the feature function,
\[ H(\Gamma,\lambda) = \int_0^1\!\!\int_0^1 w\big(d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2))\big)\,h\big(d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2)) > \lambda\big)\,ds_1\,ds_2 \tag{A.58} \]
Now we apply the deformation flow \(\varepsilon\vec{\beta}(s)\) to the curve Γ, yielding the deformed curve
\[ \Gamma' = \vec{\Gamma}(s) + \varepsilon\vec{\beta}(s) \tag{A.59} \]
One can see that, for a small ε, the distance between two points on the perturbed curve is
\[ d(\vec{\Gamma}'(s_1),\vec{\Gamma}'(s_2)) = d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2)) + \varepsilon\,\big[\beta(s_1)\vec{n}(s_1) - \beta(s_2)\vec{n}(s_2)\big]\cdot\frac{\vec{\Gamma}(s_1,s_2)}{|\vec{\Gamma}(s_1,s_2)|} \tag{A.60} \]
That is, the distance between the two points is augmented by the projection of the difference
between the two deformation vectors onto the unit vector along \(\vec{\Gamma}(s_1, s_2)\).
Now we can write the Gateaux semi-derivative G[H(Γ, λ), β] according to its definition,
using the previous result:
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{H(\Gamma',\lambda) - H(\Gamma,\lambda)}{\varepsilon} \]
\[ = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1\!\!\int_0^1\Big[ w(d')\,h\big(d(\vec{\Gamma}'(s_1),\vec{\Gamma}'(s_2)) > \lambda\big) - w(d)\,h\big(d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2)) > \lambda\big)\Big]\,ds_1\,ds_2 \]
\[ = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1\!\!\int_0^1\Big[ w(d')\,h\Big(d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2)) + \varepsilon\big[\beta(s_1)\vec{n}(s_1)-\beta(s_2)\vec{n}(s_2)\big]\cdot\tfrac{\vec{\Gamma}(s_1,s_2)}{|\vec{\Gamma}(s_1,s_2)|} > \lambda\Big) - w(d)\,h\big(d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2)) > \lambda\big)\Big]\,ds_1\,ds_2 \tag{A.61} \]
In the following, we simplify the notation:
\[ d = d(\vec{\Gamma}(s_1),\vec{\Gamma}(s_2)), \qquad d\beta = \beta(s_1)\vec{n}(s_1) - \beta(s_2)\vec{n}(s_2), \qquad d\Gamma = \frac{\vec{\Gamma}(s_1,s_2)}{|\vec{\Gamma}(s_1,s_2)|} \]
Now we can rewrite
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1\!\!\int_0^1\big[ w(d')\,h(d-\lambda+\varepsilon\,d\beta\cdot d\Gamma) - w(d)\,h(d-\lambda)\big]\,ds_1\,ds_2 \tag{A.62} \]
In order to find the limit, we use the smooth approximation of the indicator function:
\[ h(x) \longrightarrow \phi_\alpha(x) = \frac{\operatorname{atan}(x/\alpha)+1}{2} \tag{A.63} \]
The Gateaux semi-derivative becomes
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_0^1\!\!\int_0^1\left\{ w(d+\varepsilon\,d\beta\cdot d\Gamma)\left[\operatorname{atan}\!\left(\frac{\varepsilon\,d\beta\cdot d\Gamma + d-\lambda}{\alpha}\right)+1\right] - w(d)\left[\operatorname{atan}\!\left(\frac{d-\lambda}{\alpha}\right)+1\right]\right\} ds_1\,ds_2 \tag{A.64} \]
Considering a small ε, we keep the first-order terms of the Taylor series:
\[ \operatorname{atan}\!\left(\frac{\varepsilon\,d\beta\cdot d\Gamma + d-\lambda}{\alpha}\right) = \operatorname{atan}\!\left(\frac{d-\lambda}{\alpha}\right) + \operatorname{atan}'\!\left(\frac{d-\lambda}{\alpha}\right)\frac{\varepsilon\,d\beta\cdot d\Gamma}{\alpha} + \dots \tag{A.65} \]
\[ w(d+\varepsilon\,d\beta\cdot d\Gamma) = w(d) + w'(d)\,\varepsilon\,d\beta\cdot d\Gamma + \dots \tag{A.66} \]
Now we can finally find the Gateaux semi-derivative:
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_0^1\!\!\int_0^1\left\{ w(d)\,\operatorname{atan}'\!\left(\frac{d-\lambda}{\alpha}\right)\frac{\varepsilon\,d\beta\cdot d\Gamma}{\alpha} + w'(d)\left[\operatorname{atan}\!\left(\frac{d-\lambda}{\alpha}\right)+1\right]\varepsilon\,d\beta\cdot d\Gamma\right\} ds_1\,ds_2 \]
\[ = \underbrace{\frac{1}{2\alpha}\int_0^1\!\!\int_0^1 \frac{w(d)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\,(d\beta\cdot d\Gamma)\,ds_1\,ds_2}_{\text{I}} \;+\; \underbrace{\frac{1}{2}\int_0^1\!\!\int_0^1 w'(d)\left[\operatorname{atan}\!\left(\frac{d-\lambda}{\alpha}\right)+1\right](d\beta\cdot d\Gamma)\,ds_1\,ds_2}_{\text{II}} \]
Part I:
\[ G_1 = \frac{1}{2\alpha}\int_0^1\!\!\int_0^1 \frac{w(d)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\Big[\big(n_x(s_1)\beta(s_1)-n_x(s_2)\beta(s_2)\big)\big(x(s_1)-x(s_2)\big) + \big(n_y(s_1)\beta(s_1)-n_y(s_2)\beta(s_2)\big)\big(y(s_1)-y(s_2)\big)\Big]\,ds_1\,ds_2 \]
\[ = \frac{1}{2\alpha}\int_0^1\left[\int_0^1 \frac{n_x(s_1)\big(x(s_1)-x(s_2)\big) + n_y(s_1)\big(y(s_1)-y(s_2)\big)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\,w(d)\,ds_2\right]\beta(s_1)\,ds_1 \]
\[ \quad - \frac{1}{2\alpha}\int_0^1\left[\int_0^1 \frac{n_x(s_2)\big(x(s_1)-x(s_2)\big) + n_y(s_2)\big(y(s_1)-y(s_2)\big)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\,w(d)\,ds_1\right]\beta(s_2)\,ds_2 \tag{A.67} \]
By changing the order of integration in the second integral we obtain
\[ G_1[H(\Gamma,\lambda),\beta] = \frac{1}{\alpha}\int_0^1\left[\int_0^1 \frac{n_x(s_1)\big(x(s_1)-x(s_2)\big) + n_y(s_1)\big(y(s_1)-y(s_2)\big)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\,w(d)\,ds_2\right]\beta(s_1)\,ds_1 \tag{A.68} \]
According to eq. A.4, the expression in square brackets is the gradient:
\[ \nabla H_1(\Gamma)(s) = \frac{1}{\alpha}\int_0^1 \frac{n_x(s)\big(x(s)-x(t)\big) + n_y(s)\big(y(s)-y(t)\big)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\,w(d)\,dt = \frac{1}{\alpha}\int_0^1 \frac{\vec{n}(s)\cdot d\Gamma(s,t)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\,w(d)\,dt \tag{A.69} \]
The gradient flow minimizing the energy in eq. 4.29 is therefore
\[ \nabla E_1(\Gamma)(s) = 2\int d\lambda\,\big[H(\Gamma,\lambda)-H^*(\lambda)\big]\,\frac{1}{\alpha}\int_0^1 \frac{\vec{n}(s)\cdot d\Gamma(s,t)}{1+\left(\frac{d-\lambda}{\alpha}\right)^2}\,w(d)\,dt = 2\int d\lambda\,\big[H(\Gamma,\lambda)-H^*(\lambda)\big]\int_0^1 \frac{\alpha\,\vec{n}(s)\cdot d\Gamma(s,t)}{\alpha^2+(d-\lambda)^2}\,w(d)\,dt \tag{A.70} \]
For α ≈ 0, the expression inside the integral is non-zero only when d = λ. By changing
the order of integration we obtain:
\[ \nabla E_1(\Gamma)(s) = 2\int_0^1 \frac{\vec{n}(s)\cdot\vec{\Gamma}(s,t)}{|\vec{\Gamma}(s,t)|}\,w(d)\,dt\int \big[H(\Gamma,\lambda)-H^*(\lambda)\big]\frac{\alpha}{\alpha^2+(d-\lambda)^2}\,d\lambda \]
\[ = 2\int_0^1 \frac{\vec{n}(s)\cdot\vec{\Gamma}(s,t)}{|\vec{\Gamma}(s,t)|}\,w\big(|\vec{\Gamma}(s,t)|\big)\Big[H\big(\Gamma,|\vec{\Gamma}(s,t)|\big) - H^*\big(|\vec{\Gamma}(s,t)|\big)\Big]\,dt \tag{A.71} \]
Part II: Similarly we obtain
\[ \nabla H_2(\Gamma)(s) = \int_0^1 w'(d)\left[\operatorname{atan}\!\left(\frac{d-\lambda}{\alpha}\right)+1\right]\vec{n}(s)\cdot d\Gamma(s,t)\,dt \tag{A.72} \]
The gradient flow minimizing the energy in eq. 4.29 is therefore
\[ \nabla E_2(\Gamma)(s) = 2\int d\lambda\,\big[H(\Gamma,\lambda)-H^*(\lambda)\big]\int_0^1 w'(d)\left[\operatorname{atan}\!\left(\frac{d-\lambda}{\alpha}\right)+1\right]\vec{n}(s)\cdot d\Gamma(s,t)\,dt \tag{A.73} \]
By changing the order of integration we obtain:
\[ \nabla E_2(\Gamma)(s) = 2\int_0^1 \frac{\vec{n}(s)\cdot\vec{\Gamma}(s,t)}{|\vec{\Gamma}(s,t)|}\,w'\big(|\vec{\Gamma}(s,t)|\big)\,dt\int \big[H(\Gamma,\lambda)-H^*(\lambda)\big]\left[\operatorname{atan}\!\left(\frac{d-\lambda}{\alpha}\right)+1\right]d\lambda \]
\[ = 4\int_0^1 \frac{\vec{n}(s)\cdot\vec{\Gamma}(s,t)}{|\vec{\Gamma}(s,t)|}\,w'\big(|\vec{\Gamma}(s,t)|\big)\,dt\int \big[H(\Gamma,\lambda)-H^*(\lambda)\big]\,h\big(|\vec{\Gamma}(s,t)|-\lambda\big)\,d\lambda \]
\[ = 4\int_0^1 \frac{\vec{n}(s)\cdot\vec{\Gamma}(s,t)}{|\vec{\Gamma}(s,t)|}\,w'\big(|\vec{\Gamma}(s,t)|\big)\,dt\int_0^{|\vec{\Gamma}(s,t)|} \big[H(\Gamma,\lambda)-H^*(\lambda)\big]\,d\lambda \tag{A.74} \]
The overall flow is
\[ \nabla E(\Gamma)(s) = \nabla E_1(\Gamma)(s) + \nabla E_2(\Gamma)(s) \]
\[ = 2\int_0^1 \frac{\vec{n}(s)\cdot\vec{\Gamma}(s,t)}{|\vec{\Gamma}(s,t)|}\left\{ w\big(|\vec{\Gamma}(s,t)|\big)\Big[H\big(\Gamma,|\vec{\Gamma}(s,t)|\big) - H^*\big(|\vec{\Gamma}(s,t)|\big)\Big] + 2\,w'\big(|\vec{\Gamma}(s,t)|\big)\int_0^{|\vec{\Gamma}(s,t)|}\big[H(\Gamma,\lambda)-H^*(\lambda)\big]\,d\lambda \right\} dt \tag{A.75} \]
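A discrete evaluation of this two-term flow can be sketched as follows. This is a minimal numerical sketch, not the dissertation's implementation: `weighted_distance_flow`, `H_target`, `w`, and `dw` are illustrative names, the empirical pairwise-distance CDF stands in for H(Γ, λ), and the λ-integral in the second term is a trapezoid cumulative sum:

```python
import numpy as np

def weighted_distance_flow(curve, normals, H_target, w, dw, n_lam=200):
    """Sketch of the overall weighted flow: the first term matches the
    weighted CDFs at d = |Gamma(s, t)|; the second integrates the CDF
    mismatch up to d and is scaled by w'(d).  w, dw, H_target are callables;
    constant factors are absorbed into the curve-evolution step size."""
    N = len(curve)
    diff = curve[None, :, :] - curve[:, None, :]
    dist = np.linalg.norm(diff, axis=2)
    pairs = dist[~np.eye(N, dtype=bool)]
    lam = np.linspace(0.0, pairs.max(), n_lam)
    H_cur = np.array([np.mean(pairs <= l) for l in lam])     # empirical CDF
    mism = H_cur - np.array([H_target(l) for l in lam])
    # trapezoid cumulative integral of the mismatch over lambda
    cum = np.concatenate([[0.0],
                          np.cumsum(0.5 * (mism[1:] + mism[:-1]) * np.diff(lam))])
    H_interp = lambda d: np.interp(d, lam, H_cur)
    cum_interp = lambda d: np.interp(d, lam, cum)
    flow = np.zeros(N)
    for s in range(N):
        for t in range(N):
            if t == s:
                continue
            d = dist[s, t]
            u = diff[s, t] / d
            term = (w(d) * (H_interp(d) - H_target(d))
                    + 2 * dw(d) * cum_interp(d))
            flow[s] += 2 * (normals[s] @ u) * term
    return flow / N
```

With a uniform weighting (w ≡ 1, w′ ≡ 0) and a target CDF equal to the curve's own distribution, both terms vanish and the curve is stationary.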
A.5 Relative inter-object distances
This feature function relates two shapes. Here we derive the flow for the curve Γ
induced by the distribution of distances between the curves Γ and Ω. We call Ω the generating
curve and Γ the target curve. We assume that the signed distance transform of the curve
Ω is known at every point in the image plane. This is done by pre-computing the signed
distance function values DΩ(x, y) at the grid crossings and using linear interpolation to
obtain the values at non-integer locations. We compute the gradient \(\vec{\nabla}D_\Omega(x, y)\) from the grid
values using central differences, again with linear interpolation to evaluate \(\vec{\nabla}D_\Omega(x, y)\)
at arbitrary locations. We further normalize DΩ(x, y) by R(Ω), the mean radius of the shape
Ω with respect to its center of mass, so that
\[ D_\Omega(x,y)' = \frac{D_\Omega(x,y)}{R(\Omega)} \tag{A.76} \]
The feature function is \(D_\Omega(x,y)'\) measured along the curve Γ. The distribution function is given by
\[ H(\Gamma,\lambda) = \int_0^1 h\big(D_\Omega(\vec{\Gamma}(s))' > \lambda\big)\,ds \tag{A.77} \]
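The pre-computation described above (grid values, bilinear interpolation, central-difference gradient, normalization by the mean radius) can be sketched as follows. For illustration only, the test grid uses the unsigned distance to a point; a real application would use the signed distance transform of the curve Ω, and the function names are assumptions:

```python
import numpy as np

def bilinear(F, x, y):
    """Bilinear interpolation of grid values F at a non-integer location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    tx, ty = x - x0, y - y0
    return ((1 - tx) * (1 - ty) * F[y0, x0] + tx * (1 - ty) * F[y0, x0 + 1]
            + (1 - tx) * ty * F[y0 + 1, x0] + tx * ty * F[y0 + 1, x0 + 1])

def normalized_distance_and_gradient(D, omega_nodes, x, y):
    """Sketch of the pre-computation for the inter-object feature: D holds the
    distance values of Omega on grid crossings; the gradient is taken by
    central differences, and both quantities are normalized by the mean
    radius R(Omega) of the generating curve about its center of mass."""
    center = omega_nodes.mean(axis=0)
    R = np.mean(np.linalg.norm(omega_nodes - center, axis=1))   # mean radius
    gy, gx = np.gradient(D)                                     # central differences
    d = bilinear(D, x, y) / R
    grad = np.array([bilinear(gx, x, y), bilinear(gy, x, y)]) / R
    return d, grad
```

Pre-computing D and its gradient once per iteration makes each per-node feature evaluation an O(1) lookup, which is what makes this coupling term cheap compared to the pairwise features.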
Now we can write the Gateaux semi-derivative G[H(Γ, λ), β] according to its definition:
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{H(\Gamma',\lambda)-H(\Gamma,\lambda)}{\varepsilon} = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\left[\int_0^1 h\big(D_\Omega(\vec{\Gamma}(s)+\varepsilon\vec{n}(s)\beta(s))' - \lambda\big)\,ds - \int_0^1 h\big(D_\Omega(\vec{\Gamma}(s))' - \lambda\big)\,ds\right] \]
\[ = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1 \Big[h\big(D_\Omega(\vec{\Gamma}(s))' + \varepsilon\beta(s)\,\vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))' - \lambda\big) - h\big(D_\Omega(\vec{\Gamma}(s))' - \lambda\big)\Big]\,ds \tag{A.78} \]
Using the differentiable approximation of the indicator function,
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1 \Big[\phi_\gamma\big(D_\Omega(\vec{\Gamma}(s))' + \varepsilon\beta(s)\,\vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))' - \lambda\big) - \phi_\gamma\big(D_\Omega(\vec{\Gamma}(s))' - \lambda\big)\Big]\,ds \]
\[ = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_0^1 \operatorname{atan}'\!\left(\frac{D_\Omega(\vec{\Gamma}(s))'-\lambda}{\gamma}\right)\varepsilon\beta(s)\,\vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))'\,ds \]
\[ = \frac{1}{2}\int_0^1 \frac{\gamma}{\gamma^2+\big(D_\Omega(\vec{\Gamma}(s))'-\lambda\big)^2}\,\beta(s)\,\vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))'\,ds \tag{A.79} \]
Using the Riesz representation theorem we obtain
\[ \nabla H(\Gamma,\lambda)(s) = \frac{1}{2}\,\frac{\gamma}{\gamma^2+\big(D_\Omega(\vec{\Gamma}(s))'-\lambda\big)^2}\,\vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))' \tag{A.80} \]
The gradient flow minimizing the energy is
\[ \nabla E(\Gamma)(s) = \lim_{\gamma\to 0}\int d\lambda\,\big[H^*(\lambda) - H(\Gamma,\lambda)\big]\frac{\gamma}{\gamma^2+\big(D_\Omega(\vec{\Gamma}(s))'-\lambda\big)^2}\,\vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))' \]
\[ = \frac{\vec{n}(s)\cdot\vec{\nabla}D_\Omega(s)}{R(\Omega)}\left[H^*\!\left(\frac{D_\Omega(s)}{R(\Omega)}\right) - H\!\left(\Gamma,\frac{D_\Omega(s)}{R(\Omega)}\right)\right] \tag{A.81} \]
where DΩ(s) is the value of the signed distance function generated by the curve Ω at the point
of the curve Γ given by (X(s), Y (s)), and R(Ω) is the mean radius of the shape Ω relative
to its center of mass.
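The final flow of eq. A.81 can be evaluated per node with a short sketch. The names `interobject_flow`, `D_interp`, `gradD_interp`, and `H_target` are illustrative assumptions; the empirical CDF of the normalized distances along Γ stands in for H(Γ, λ):

```python
import numpy as np

def interobject_flow(gamma_nodes, gamma_normals, D_interp, gradD_interp, R, H_target):
    """Sketch of eq. A.81: at each node of the target curve Gamma, the speed is
    the normal component of the (normalized) distance-map gradient of Omega,
    times the CDF mismatch evaluated at the normalized distance value.
    D_interp / gradD_interp are callables returning D_Omega and its gradient
    at a point; H_target is the prior CDF."""
    d_vals = np.array([D_interp(p) for p in gamma_nodes]) / R   # feature values
    H_curve = lambda lam: np.mean(d_vals <= lam)   # empirical CDF along Gamma
    flow = np.empty(len(gamma_nodes))
    for k, p in enumerate(gamma_nodes):
        g = np.asarray(gradD_interp(p)) / R
        flow[k] = (gamma_normals[k] @ g) * (H_target(d_vals[k]) - H_curve(d_vals[k]))
    return flow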
Appendix B
Curve flow for intensity based feature function
Here we consider the intensity-dependent feature function, see Section 7.3. Let us consider
a single feature function Φ(s) defined on the curve as
\[ \Phi(s) = I(x(s)) = I\big(\Gamma(s) + R(s,\vec{n}(s))\,[i\;\, j]^T\big) \tag{B.1} \]
where I is the image, Γ is the boundary parameterized by the arc-length s, \(\vec{n}(s)\) is the local
normal, and \(R(s,\vec{n}(s))\) is the 2D rotation matrix aligning \(\vec{n}(s)\) with the j-axis of the patch
coordinate system. x(s) is the trajectory traced by the point (i, j) as the coordinate system
origin is moved along the curve:
\[ x(s) = \Gamma(s) + R(s,\vec{n}(s))\,[i\;\, j]^T \tag{B.2} \]
For this feature function, the CDF for the curve Γ is given by
\[ H(\Gamma,\lambda) = \int_0^1 h\big(I\big(\Gamma(s) + R(s,\vec{n}(s))\,[i\;\, j]^T\big) > \lambda\big)\,ds \tag{B.3} \]
Applying a small perturbation εβ to the curve, the perturbed feature value is
\[ \Phi'(s) = I\big(\Gamma(s) + \varepsilon\beta(s)\vec{n}(s) + R(s,\vec{n}(s))\,[i\;\, j]^T\big) \tag{B.4} \]
Using a Taylor series expansion,
\[ \Phi'(s) = I(x'(s)) = I(x(s)) + \nabla I(x(s))\cdot\vec{n}(s)\,\varepsilon\beta(s) \tag{B.5} \]
Now we can write the Gateaux semi-derivative G[H(Γ, λ), β] according to its definition:
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{H(\Gamma',\lambda)-H(\Gamma,\lambda)}{\varepsilon} = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1 \big[h(\Phi'(s) > \lambda) - h(\Phi(s) > \lambda)\big]\,ds \tag{B.6} \]
Using the differentiable approximation of the indicator function,
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_0^1\left[\operatorname{atan}\!\left(\frac{\Phi'(s)-\lambda}{\alpha}\right) - \operatorname{atan}\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right)\right]ds \tag{B.7} \]
For a small ε, using the Taylor expansion,
\[ \operatorname{atan}\!\left(\frac{\Phi'(s)-\lambda}{\alpha}\right) = \operatorname{atan}\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right) + \operatorname{atan}'\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right)\frac{\nabla I(x(s))\cdot\vec{n}(s)\,\varepsilon\beta(s)}{\alpha} + O(\varepsilon^2) \tag{B.8} \]
Now the Gateaux semi-derivative becomes
\[ G[H(\Gamma,\lambda),\beta] = \lim_{\varepsilon\to 0}\frac{1}{2\varepsilon}\int_0^1 \operatorname{atan}'\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right)\frac{\nabla I(x(s))\cdot\vec{n}(s)\,\varepsilon\beta(s)}{\alpha}\,ds = \frac{1}{2\alpha}\int_0^1 \operatorname{atan}'\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right)\nabla I(x(s))\cdot\vec{n}(s)\,\beta(s)\,ds \tag{B.9} \]
Hence,
\[ \nabla H(\Gamma)(s) = \frac{1}{2\alpha}\operatorname{atan}'\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right)\nabla I(x(s))\cdot\vec{n}(s) = \frac{1}{2\alpha}\,\frac{\nabla I(x(s))\cdot\vec{n}(s)}{1+\left(\frac{\Phi(s)-\lambda}{\alpha}\right)^2} \tag{B.10} \]
Now the gradient flow minimizing the energy is
\[ \nabla E(\Gamma)(s) = 2\int d\lambda\,\big[H^*(\lambda)-H(\Gamma,\lambda)\big]\,\frac{1}{2\alpha}\,\frac{\nabla I(x(s))\cdot\vec{n}(s)}{1+\left(\frac{\Phi(s)-\lambda}{\alpha}\right)^2} = \int d\lambda\,\big[H^*(\lambda)-H(\Gamma,\lambda)\big]\,\frac{\alpha\,\nabla I(x(s))\cdot\vec{n}(s)}{\alpha^2+(\Phi(s)-\lambda)^2} \tag{B.11} \]
Taking the limit α → 0,
\[ \nabla E(\Gamma)(s) = \big[H^*(\Phi(s)) - H(\Gamma,\Phi(s))\big]\,\nabla I(x(s))\cdot\vec{n}(s) \tag{B.12} \]
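The flow of eq. B.12 can be sketched on a discrete curve for a single patch offset. This is a minimal sketch under stated assumptions: `intensity_flow`, `image_interp`, and `grad_interp` are illustrative names, the rotation R(s, n(s)) is realized by taking the offset along the local tangent/normal pair, and the empirical CDF of the sampled intensities stands in for H(Γ, λ):

```python
import numpy as np

def intensity_flow(nodes, normals, image_interp, grad_interp, H_target, offset):
    """Sketch of eq. B.12 for a single patch offset [i, j]: sample the image
    at the offset location x(s) (offset rotated into the local normal frame),
    then move each node by the CDF mismatch at its intensity value times the
    normal component of the image gradient."""
    i, j = offset
    n_pts = len(nodes)
    phi = np.empty(n_pts)
    xs = []
    for k in range(n_pts):
        n = normals[k]
        t = np.array([-n[1], n[0]])            # tangent: normal rotated by 90 deg
        x = nodes[k] + i * t + j * n           # R(s, n(s)) [i j]^T in world coords
        xs.append(x)
        phi[k] = image_interp(x)               # feature value Phi(s) = I(x(s))
    H_curve = lambda lam: np.mean(phi <= lam)  # empirical CDF along the curve
    flow = np.empty(n_pts)
    for k in range(n_pts):
        g = np.asarray(grad_interp(xs[k]))
        flow[k] = (H_target(phi[k]) - H_curve(phi[k])) * (g @ normals[k])
    return flow
```

As with the geometric features, a target CDF equal to the curve's own intensity distribution leaves the curve stationary.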
Appendix C
Multidimensional CDF based shape prior
We now consider the multidimensional (joint) CDF prior. This form of prior arises when
the feature functions are defined on the same space Ω and are correlated random processes. In
such a case, the joint CDF contains information about both the marginal distributions of the individual
feature functions and the inter-variable correlations. We will consider a special form of
feature functions, namely feature functions defined on the curve, as used in Chapter 7.
Let there be N feature functions Φ1(s), Φ2(s), ..., ΦN (s), where s ∈ [0, 1] is the normalized
arc-length of the curve Γ. The joint CDF corresponding to the set of feature functions is
defined as
\[ H(\lambda_1,\lambda_2,\dots,\lambda_N) = \int_0^1 h(\Phi_1(s) < \lambda_1)\,h(\Phi_2(s) < \lambda_2)\cdots h(\Phi_N(s) < \lambda_N)\,ds \tag{C.1} \]
The joint shape distribution prior energy is defined as
\[ E = \int_{\lambda_1}\int_{\lambda_2}\cdots\int_{\lambda_N}\big[H^*(\lambda_1,\lambda_2,\dots,\lambda_N) - H(\Gamma,\lambda_1,\lambda_2,\dots,\lambda_N)\big]^2\,d\lambda_1\,d\lambda_2\cdots d\lambda_N \tag{C.2} \]
where H*(λ1, λ2, ..., λN) is the prior joint CDF and H(Γ, λ1, λ2, ..., λN) is the joint CDF
for the curve Γ. We now follow the variational approach, assuming a small perturbation εβ(s) of
the curve. Taking the Gateaux semi-derivative of E with respect to this perturbation,
\[ G(E,\beta) = 2\int_{\lambda_1}\int_{\lambda_2}\cdots\int_{\lambda_N}\big[H^*(\lambda_1,\dots,\lambda_N) - H(\Gamma,\lambda_1,\dots,\lambda_N)\big]\,G\big(H(\Gamma,\lambda_1,\dots,\lambda_N),\beta\big)\,d\lambda_1\,d\lambda_2\cdots d\lambda_N \tag{C.3} \]
We now consider the Gateaux semi-derivative G(H(Γ, λ1, λ2, ..., λN), β):
\[ G\big(H(\Gamma,\lambda_1,\dots,\lambda_N),\beta\big) = \lim_{\varepsilon\to 0}\frac{H(\Gamma+\varepsilon\beta) - H(\Gamma)}{\varepsilon} = \lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1\Big(h(\Phi_1'(s)<\lambda_1)\cdots h(\Phi_N'(s)<\lambda_N) - h(\Phi_1(s)<\lambda_1)\cdots h(\Phi_N(s)<\lambda_N)\Big)\,ds \tag{C.4} \]
Let us now consider a single indicator function h(Φ(s) < λ). As previously, we use a
smooth approximation that approaches the indicator as α approaches zero:
\[ \lim_{\alpha\to 0}\big[\operatorname{atan}(x/\alpha) + \pi/2\big] = \pi\,h(x) \tag{C.5} \]
We assume that Φ(s) is differentiable with respect to a small curve perturbation at s, so the
Taylor series expansion of the updated value Φ′(s) with respect to this small perturbation
is
\[ \Phi(s,\varepsilon\beta) = \Phi(s) + \frac{\partial\Phi(s)}{\partial n}\,\varepsilon\beta(s) + O(\varepsilon^2) \tag{C.6} \]
where \(\partial\Phi(s)/\partial n\) is the derivative with respect to a small perturbation along the normal. Now we
can write the Taylor expansion for h(Φ′(s) < λ):
\[ \pi\,h(\Phi'(s) < \lambda) \approx \operatorname{atan}\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right) + \frac{\pi}{2} + \operatorname{atan}'\!\left(\frac{\Phi(s)-\lambda}{\alpha}\right)\frac{\partial\Phi(s)}{\partial n}\,\varepsilon\beta(s) \tag{C.7} \]
We now use this approximation to compute the Gateaux semi-derivative of H. We substitute
the approximation in eq. C.7 into eq. C.4. The resulting expression is a polynomial in
ε. It is easy to see that the terms containing ε⁰ mutually cancel. Since ε is small, we
retain only the largest remaining terms, those containing ε¹. As a result, we have
\[ G\big(H(\Gamma,\lambda_1,\dots,\lambda_N),\beta\big) = \frac{1}{\pi^N}\lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int \sum_{i=1}^N \operatorname{atan}'\!\left(\frac{\Phi_i(s)-\lambda_i}{\alpha}\right)\frac{\partial\Phi_i(s)}{\partial n}\,\varepsilon\beta(s)\prod_{j\ne i}\left(\operatorname{atan}\!\left(\frac{\Phi_j(s)-\lambda_j}{\alpha}\right)+\frac{\pi}{2}\right)ds \]
\[ = \frac{1}{\pi^N}\int \sum_{i=1}^N \operatorname{atan}'\!\left(\frac{\Phi_i(s)-\lambda_i}{\alpha}\right)\frac{\partial\Phi_i(s)}{\partial n}\,\beta(s)\prod_{j\ne i}\left(\operatorname{atan}\!\left(\frac{\Phi_j(s)-\lambda_j}{\alpha}\right)+\frac{\pi}{2}\right)ds \tag{C.8} \]
Therefore, the minimizing flow ∇H is
\[ \nabla H(\Gamma,\lambda_1,\dots,\lambda_N)(s) = \frac{1}{\pi^N}\sum_{i=1}^N \operatorname{atan}'\!\left(\frac{\Phi_i(s)-\lambda_i}{\alpha}\right)\frac{\partial\Phi_i(s)}{\partial n}\prod_{j\ne i}\left(\operatorname{atan}\!\left(\frac{\Phi_j(s)-\lambda_j}{\alpha}\right)+\frac{\pi}{2}\right) \tag{C.9} \]
\[ = \frac{1}{\pi}\sum_{i=1}^N \frac{\partial\Phi_i(s)}{\partial n}\,\frac{1}{1+\left(\frac{\Phi_i(s)-\lambda_i}{\alpha}\right)^2}\prod_{j\ne i} h\big(\Phi_j(s) < \lambda_j\big) \tag{C.10} \]
In the limit of α ≈ 0, the gradient flow minimizing the energy in eq. C.2 is
\[ \nabla E(s) = \frac{1}{\pi}\sum_{i=1}^N \frac{\partial\Phi_i(s)}{\partial n}\int\!\cdots\!\int_{\lambda_{j\ne i}}\big[H(\Phi_i(s),\lambda_{j\ne i}) - H^*(\Phi_i(s),\lambda_{j\ne i})\big]\prod_{j\ne i} h\big(\Phi_j(s) < \lambda_j\big)\,d\lambda_{j\ne i} \tag{C.11} \]
According to eq. C.11, the computation of the minimizing flow at each point on the contour
requires an (N − 1)-dimensional integration. Therefore, the computational expense of using the
joint CDF prior is exponential in N, the number of feature functions used. However,
we hypothesize that an efficient approximation can be made, reducing the required
computations.
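The joint CDF of eq. C.1 and the exponential cost claim can be sketched concretely. These are illustrative helpers, not the dissertation's implementation; for simplicity the indicator is taken as a non-strict inequality:

```python
import numpy as np

def joint_cdf(features, lambdas):
    """Joint CDF of N feature functions sampled along the curve (eq. C.1):
    features is (N, M) with M samples of each of the N functions; lambdas is
    a length-N threshold vector.  Each sample contributes only if ALL
    indicators h(Phi_i(s) < lambda_i) hold simultaneously."""
    below = features <= lambdas[:, None]        # (N, M) indicator matrix
    return np.mean(np.all(below, axis=0))       # fraction where all hold jointly

def joint_flow_cost(N, bins):
    """Per-contour-point cost of the flow in eq. C.11: an (N-1)-dimensional
    integral over the remaining thresholds, i.e. bins**(N-1) grid
    evaluations -- exponential in the number of feature functions N."""
    return bins ** (N - 1)
```

For example, three feature functions on a modest 10-bin threshold grid already require 100 integrand evaluations per contour point per time step, which motivates the approximation hypothesized above.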
References
Alcayde, D., Blelly, P.-L., Kofman, W., Litvin, A., and Oliver, W. (2001). Effects of hotoxygen in the ionosphere: Transcar simulations. Annales Geophysicae, 19:257–261.
Alexandrov, O. and Santosa, F. (2005). A topology-preserving level set method for shapeoptimization. Journal of Computational Physics, 204(1):121–130.
Antani, S., Kasturi, R., and Jain, R. (2002). A survey on the use of pattern recognitionmethods for abstraction, indexing and retrieval of images and video. Pattern Recogni-tion, 35:945–965.
Basri, R., Costa, L., Geiger, D., and Jacobs, D. (1995). Determining the similarity ofdeformable shapes. In Proceedings of the Workshop on Physics Based Modeling inComputer Vision, pages 135–143, Cambridge, MA.
Belongie, S., Malik, J., and Puzicha, J. (2002). Shape matching and object recognitionusing shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence,24.
Berretti, S., Del Bimbo, A., and Pala, P. (2000). Retrieval by shape similarity withperceptual distance and effective indexing. IEEE Transactions on Multimedia, 2(4):225–239.
Blum, H. (1967). A Transformation for Extracting New Descriptions of Shape. MITPress.
Bookstein, F. (1991). Morphometric tools for landmark data: Geometry and Biology.Cambridge University Press.
Bookstein, F. (1998). Linear Methods for Nonlinear Maps: Procrustes Fits, Thin-PlateSplines, and the Biometric Analysis of Shape Variability. Brain Warping. AcademicPress.
Bookstein, F. L. (1997). Landmark methods for forms without landmarks: morphometricsof group differences in outline shape. Medical Image Analysis, 1(3):225–243.
Borgefors, G. (1984). Distance transformations in arbitrary dimensions. Computer Vision,Graphics, and Image Processing, 27:321–345.
Canzar, S. and Remy, J. (2005). Shape distributions and protein similarity. submitted.
Caselles, V., Kimmel, R., and Sapiro, G. (1997). Geodesic active contours. InternationalJournal of Computer Vision, 22(1):61–79.
199
200
Chan, T. and Vese, L. (2001). Active contours without edges. IEEE Transactions onImage Processing, 10(2):266–277.
Charpiat, G., Faugeras, O., and Keriven, R. (2003). Approximations of shape metrics andapplication to shape warping and empirical shape statistics. Technical Report RR-4820,INRIA, Sophia Antipolis, France.
Christensen, G. (1994). Deformable shape models for anatomy. PhD thesis, WashingtonUniversity, St. Louis, US.
Chupeau, B. and Forest, R. (2001). Evaluation of the effectiveness of color attributes forvideo indexing. Journal of Electronic Imaging, 10(4):883–894.
Cohen, Z., McCarthy, D., Kwak, S., Legrand, P., Fogarasi, F., Ciaccio, E., and Ateshian,G. (1999). Knee cartilage topography, thickness and contact areas from mri: In vitrocalibration, and in vivo measurements. Osteoarthritis & Cartilage, 7:95–109.
Cootes, T., Hill, A., Taylor, C., and Haslam, J. (1993). The use of active shape models forlocating structures in medical images. In Information Processing in Medical Imaging,pages 33–47, Berlin, Germany.
Cootes, T., Taylor, C., Cooper, D., and Graham, J. (1995). Active shape models – theirtraining and application. Computer Vision and Image Understanding, 61(1):38–59.
Cootes, T. F., Edawrds, G. J., and Taylor, C. J. (2001). Active appearance models. IEEETransactions on Pattern Analysis and Machine Intelligence, 23(6):681–685.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley Seriesin Telecommunications, John Wiley & Sons, New York, NY.
Davatzikos, C., Vaillant, M., Resnick, S. M., Prince, J. L., Letovsky, S., and Nick, B. R.(1996). A computerized approach for morphological analysis of the corpus callosum.Journal of Computer Assisted Tomography, 20(1):88–97.
Dimitrov, P., Phillips, C., and Siddiqi, L. (2000). Robust and efficient skeletal graphs.In Computer Society Conference on Computer Vision and Pattern Recognition (CVPR),page 1417, Hilton Head Island, South Carolina.
Dryden, I. and Mardia, K. (1998). Statistical shape analysis. John Wiley & Sons.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern Classification. John Wileyand Sons Inc.
Duta, N. (2000). Predicting dyslexia based on the shape of the corpus callosum in mrimages. In Proceedings of International Conference on Pattern Recognition (ICPR),Barcelona, Spain.
Duta, N. and Sonka, M. (1998). Segmentation and interpretation of mr brain images: animproved active shape model. IEEE Transactions on Medical Imaging, 17(7):1049–1062.
201
Fenster, S. D. and Kender, J. R. (2001). Msectored snakes: evaluating learned-energysegmentations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9).
Freedman, D., J., R. R., Zhang, T., Jeong, Y., and Chen, G. T. (2004). Model-based multi-object segmentation via distribution matching. In Third IEEE Workshop on Articulatedand Nonrigid Motion, Baltimore, US.
Freund, Y. and Schapire, R. E. (1999). A short introduction to boosting. Journal ofJapanese Society for Artificial Intelligence, 14(5):771–780.
Galand, P., Grimson, W. E. L., and Kikinis, R. (1999). Statistical shape analysis us-ing fixed topology skeletons: Corpus callosum study. In Proceedings of InformationProcessing in Medical Imaging (IPMI), pages 382–387.
Galand, P., Grimson, W. E. L., Shenton, M. E., and Kikinis, R. (2000). Small sample sizelearning for shape analysis. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 72–82.
Greer, D. R., Fung, I., and Shapiro, J. H. (1997). Maximum-likelihood multiresolutionlaser radar range imaging. IEEE Transactions on Image Processing, 6(1):36–46.
Grigorescu, C. and Petkov, N. (2003). Distance sets for shape filters and shape recognition.IEEE Transactions on Image Processing, 12(10):1274–1286.
Herbulot, A., Jehan-Besson, S., Barlaud, M., and G., A. (2004). Shape gradient for multi-modal image segmentation using joint intensity distributions. In Proceedings of 5th In-ternational Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS),Lisbon, Portugal.
Ho, G. and Shi, P. (2004). Domain partitioning level set surface for topology con-strained multi-object segmentation. In Proceedings of IEEE International Symposiumon Biomedical Imaging, Washington DC, US.
Ip, C. Y., Lapadat, D., Sieger, L., and Regli, W. (2002). Using shape distributions tocompare solid models. In Proceedings of the Seventh ACM Symposium on Solid Modelingand Applications, pages 273–280, Saarbrucken, Germany. ACM Press.
Jaggi, S., Karl, W. C., Mallat, S. G., and Willsky, A. S. (1999). Silhouette recognitionusing high-resolution pursuit. Pattern Recognition, 32:753–771.
Jehan-Besson, S. (2003). Modeles de contours actif bases regions pour la segmantationd’images et de videos. PhD thesis, Universite de Nice - Sophia Antipolis.
Kapur, T. (1999). Model based three dimensional medical image segmentation. PhD thesis,MIT, Boston, US.
Kim, J., Fisher, J. W., Yezzi, A., Cetin, M., and Willsky, A. S. (2002a). Nonparametricmethods for image segmentation using information theory curve evolution. In Proceed-ings of International Conference on Image Processing (ICIP), Rochester, USA.
202
Kim, J., Tsai, A., Cetin, M., and Willsky, A. (2002b). A curve evolution-based variationalapproach to simultaneous image restoration and segmentation. Proceeding of IEEEInternational Conference on Image Processing (ICIP).
Klassen, E., Srivastava, A., and Mio, W. (2004). Analysis of planar shapes using geodesicpaths on shape spaces. IEEE Transactions on Pattern Analysis and Machine Intelli-gence, 26(3):372–383.
Koslow, S. and Huerta, M. (1997). Neuroinformatics: An Overview of the Human BrainProject. Lawrence Erlbaum Associates, Mahwah, New Jersey.
Kullback, S. (1968). Information theory and statistics. Dover, New York, NY.
Kunttu, I., Lepisto, L., Rauhamaa, J., and Visa, A. (2004). Multiscale fourier descriptor forshape-based image retrieval. In 17th International Conference on Pattern Recognition(ICPR), pages 765–768, Cambridge, UK.
Le, H. (1991). Locating frechet means with application to shape spaces. Advances inApplied Probability, 33(2):324–338.
Lemke, P., Skiena, S. S., and Smith, W. D. (2002). Reconstructing sets from interpointdistances. Technical Report TR 2002-37, DIMACS, NJ.
Leventon, M., Grimson, W. E. L., and Faugeras, O. (2000a). Statistical shape influ-ence in geodesic active contours. IEEE Conference on Computer Vision and PatternRecognition (CVPR).
Leventon, M. E., Grimson, W. E. L., Faugeras, O., and III, W. M. W. (2000b). Level setbased segmentation with intensity and curvature priors. IEEE Workshop on Mathemat-ical Methods in Biomedical Image Analysis Proceedings (MMBIA), pages 4–11.
Li, D. and Simske, S. (2002). Shape retrieval based on distance ratio distribution. Tech-nical Report HPL-2002-251, Intelligence Enterprise Technologies Laboratory, Palo Alto,CA.
Ling, H. and Okada, K. (2006). Emd-l1: An efficient and robust algorithm for compairinghostogram-based descriptors. In European Conference on Computer Vision (accepted).
Litvin, A. and Karl, W. C. (2002). Image segmentation based on prior probabilistic shapemodels. In Proceedings of IEEE International Conference on Acoustic Speech and SignalProcessing (ICASSP), Orlando, Florida.
Litvin, A. and Karl, W. C. (2003). Levelset based segmentation using data driven shapeprior on feature histograms. In Proceedings of 2003 IEEE Workshop on Statistical SignalProcessing, Minneapolis, Minnesota.
Litvin, A. and Karl, W. C. (2004a). Using shape distributions as priors in a curve evolutionframework. In Proceeding of IS&T/SPIE 16th Annual Symposium Electronic ImagingScience and Technology, San Jose, California.
Litvin, A. and Karl, W. C. (2004b). Using shape distributions as priors in a curve evolution framework. In Proceedings of 2004 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, Canada.
Litvin, A. and Karl, W. C. (2004c). Using shape distributions as priors in a curve evolution framework. Technical Report ECE-2004-03, Boston University, Boston, USA.
Litvin, A. and Karl, W. C. (2005a). Coupled shape distribution-based segmentation of multiple objects. In Proceedings of Information Processing in Medical Imaging (IPMI 2005), Glenwood Springs, Colorado.
Litvin, A. and Karl, W. C. (2005b). Coupled shape distribution-based segmentation of multiple objects. Technical Report ECE-2005-01, Boston University, Boston, USA.
Litvin, A., Karl, W. C., and Shah, J. (2006). Shape and appearance modeling with feature distributions for image segmentation. In International Symposium on Biomedical Imaging (ISBI), Arlington, Virginia.
Litvin, A., Konrad, J., and Karl, W. C. (2003). Probabilistic video stabilization using Kalman filtering and mosaicing. In Proceedings of IS&T/SPIE 15th Annual Symposium Electronic Imaging Science and Technology, Santa Clara, California.
Litvin, A., Oliver, W. L., and Amory-Mazaudier, C. (2000a). Hot O and nighttime ionospheric temperatures. Geophysical Research Letters, 27(17):2821–2824.
Litvin, A., Oliver, W. L., Picone, J. M., and Buonsanto, M. J. (2000b). The upper atmosphere during June 5-11, 1991. Journal of Geophysical Research, 105(A6):12789–12796.
Lynch, J., Zaim, S., Zhao, J., Stork, A., Peterfy, C. G., and Genant, H. (2000). Cartilage segmentation of 3D MRI scans of the osteoarthritic knee combining user knowledge and active contours. Proceedings of SPIE Conference on Medical Imaging, 3979:925–935.
Martin, J., Pentland, A., and Sclaroff, S. (1998). Characterization of neuropathological shape deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(2).
Matsakis, P., Keller, J. M., Sjahputera, O., and Marjamaa, J. (2004). The use of force histograms for affine-invariant relative position description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):1–18.
Maurel, P. and Sapiro, G. (2003). Dynamic shapes average. Proceedings of the 2nd IEEE Workshop on Variational, Geometric and Level Set Methods in Computer Vision.
McInerney, T. and Terzopoulos, D. (1995). Topologically adaptable snakes. In International Conference on Computer Vision, Boston, Massachusetts.
Mezghani, N., Mitiche, A., and Cheriet, M. (2004). On-line character recognition using histograms of features and an associative memory. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Minguez, J. and Montano, L. (2005). Abstracting vehicle shape and kinematics constraints from obstacle avoidance methods. Autonomous Robots, in press.
Mio, W., Badlyans, D., and Xiuwen, L. (2005). A computational approach to Fisher information geometry with applications to image analysis. In Proceedings of International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, St. Augustine, Florida.
Mumford, D. and Shah, J. (1985). Boundary detection by minimizing functionals. In Proceedings of Computer Vision and Pattern Recognition, pages 22–26, San Francisco.
Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, E., Petkovic, D., Yanker, P., and Faloutsos, C. (1993). Querying images by content, using color, texture, and shape. In SPIE Conference on Storage and Retrieval for Image and Video Databases, volume 1908, pages 173–187.
Osada, R., Funkhouser, T., Chazelle, B., and Dobkin, D. (2001). Matching 3D models with shape distributions. International Conference on Shape Modeling and Applications, pages 154–166.
Osada, R., Funkhouser, T., Chazelle, B., and Dobkin, D. (2002). Shape distributions. ACM Transactions on Graphics, 21(4):807–832.
Page, D., Koschan, A., Sukumar, S., Roui-Abidi, B., and Abidi, M. (2003). Shape analysis algorithm based on information theory. In Proceedings of the International Conference on Image Processing, Barcelona, Spain.
Paragios, N. and Deriche, R. (2000). Coupled geodesic active regions for image segmentation: A level set approach. In European Conference on Computer Vision, Dublin, Ireland.
Pizer, S. M., Fletcher, P. T., Fridman, Y., Fritsch, D. S., Gash, A. G., Glotzer, J. M., Joshi, S., Thall, A., Tracton, G., Yushkevich, P., and Chaney, E. L. (2003). Deformable m-reps for 3D medical image segmentation. International Journal of Computer Vision, Special UNC-MIDAG issue, 55(2):85–106.
Pizer, S. M., Fritsch, D. S., Yushkevich, P. A., Johnson, V. E., and Chaney, E. L. (1996). Segmentation, registration, and measurement of shape variation via image object shape. IEEE Transactions on Medical Imaging, 18(10):851–865.
Poonawala, A., P. M. R. G. (2003). On the uncertainty analysis of shape reconstruction from areas of silhouettes. In Fifth International Conference on Advances in Pattern Recognition, Calcutta, India.
Puzicha, J., Hofmann, T., and Buhmann, J. M. (1999). Histogram clustering for unsupervised segmentation and image retrieval. Pattern Recognition Letters, 20:899–909.
Rote, G. (1991). Computing the minimum Hausdorff distance between two point sets on a line under translation. Information Processing Letters, 38:123–127.
Rousson, M. and Paragios, N. (2002). Shape priors for level-set representations. European Conference on Computer Vision (ECCV).
Rubner, Y., Tomasi, C., and Guibas, L. J. (1999). The earth mover's distance as a metric for image retrieval. Technical Report STAN-CS-TN-98-86, Computer Science Department, Stanford University, Stanford, CA.
Rudin, W. (1966). Real and Complex Analysis. McGraw-Hill.
Ruiz-Correa, S., Shapiro, L. G., Berson, G., Cunningham, M. L., and Sze, R. W. (2006). Symbolic signatures for deformable shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1):75–90.
Sapiro, G. and Caselles, V. (1995). Histogram modification via partial differential equations. International Conference on Image Processing Proceedings (ICIP).
Sapiro, G. and Caselles, V. (1997). Histogram modification via partial differential equations. Journal of Differential Equations, 135:238–268.
Sclaroff, S. and Liu, L. (2001). Deformable shape detection and description via model-based region grouping. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 23(5):475–489.
Sethian, J. (1999). Level set methods and fast marching methods. Cambridge University Press.
Shah, J. (1996). A common framework for curve evolution, segmentation and anisotropic diffusion. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), page 136.
Shah, J. (2002). Elastica with hinges. Journal of Visual Communication and Image Representation, 13:36.
Shah, J. (2005). Gray skeletons and segmentation of shapes. Computer Vision and Image Understanding, 99(1):96–109.
Shen, H., Shi, Y., and Peng, Z. (2005). Applying prior knowledge in the segmentation of 3D complex anatomical structures. In International Conference on Computer Vision, Beijing, China.
Shen, H. and Wong, A. (1983). Generalized texture representation and metric. Computer Vision, Graphics, and Image Processing, 23(2):187–206.
Shi, Y. (2005). Object-Based Dynamic Imaging with Level Set Methods. PhD thesis, Boston University, Boston, US.
Shi, Y. and Karl, W. (2005). A fast level set method without solving PDEs. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), Philadelphia, PA.
Siddiqi, K., Lauziere, Y., Tannenbaum, A., and Zucker, S. (1997). Area and length minimizing flows for shape segmentation. CVC TR-97-001/CS TR-1146.
Skiena, S. S. (1997). The Algorithm Design Manual. Springer-Verlag, New York.
Staib, L. and Duncan, J. (1992). Boundary finding with parametrically deformable models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(11):1061–1075.
Staib, L. and Duncan, J. (1996). Model-based deformable surface finding for medical images. IEEE Transactions on Medical Imaging, 15(5):720–731.
Sukmarg, O. and Rao, K. (2000). B-spline curve representation of segmented object in MPEG compressed domain. In International Symposium on Wireless Personal Multimedia Communications, Bangkok, Thailand.
Sumengen, B., Manjunath, B., and Kenney, C. (2002). Image segmentation using curve evolution and flow fields. In Proceedings of IEEE International Conference on Image Processing (ICIP), Rochester, NY.
Sundar, H., Silver, D., Gagvani, N., and Dickinson, S. (2003). Skeleton based shape matching and retrieval. In Proceedings of the International Conference on Shape Modeling and Applications (SMI), page 130.
Tagare, H. D. (1997). Deformable 2-D template matching using orthogonal curves. IEEE Transactions on Medical Imaging, 16(1).
Tagare, H. D. (1999). Shape-based nonrigid correspondence with application to heart motion analysis. IEEE Transactions on Medical Imaging, 18(7):570–579.
Tari, S. and Shah, J. (2000). Nested local symmetry sets. Computer Vision and Image Understanding, 79(2):267–280.
Tasdizen, T. and Whitaker, R. (2004). Higher-order nonlinear priors for surface reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7):878–890.
Thayananthan, A., Stenger, B., Torr, P. H. S., and Cipolla, R. (2003). Shape context and chamfer matching in cluttered scenes. In Proceedings of Conference on Computer Vision and Pattern Recognition, volume I, pages 127–133, Madison, USA.
Tsai, A., Wells, W., Tempany, C., Grimson, E., and Willsky, A. (2003). Coupled multi-shape model and mutual information for medical image segmentation. In Information Processing in Medical Imaging, pages 185–197, Ambleside, UK.
Tsai, A., Wells, W., Tempany, C., Grimson, E., and Willsky, A. (2004). Mutual information in coupled multi-shape model for medical image segmentation. Medical Image Analysis, 8(4):429–445.
Tsai, A., Yezzi, A., Wells, W., Tempany, C., Tucker, D., Fan, A., Grimson, W., and Willsky, A. (2001a). Model-based curve evolution technique for image segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Tsai, A., Yezzi, A., and Willsky, A. (2001b). Curve evolution implementation of the Mumford-Shah functional for image segmentation, denoising, interpolation, and magnification. IEEE Transactions on Image Processing, 10(8):1169–1186.
Unal, G., Krim, H., and Yezzi, A. (2002). Stochastic differential equations and geometric flows. IEEE Transactions on Image Processing, 11(12):1405–1417.
Wang, Y. and Staib, L. (2000). Boundary finding with prior shape and smoothness models. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(7):738–743.
Yang, J., Staib, L. H., and Duncan, J. S. (2003). Neighbor-constrained segmentation with 3D deformable models. In Information Processing in Medical Imaging, pages 198–209, Ambleside, UK.
Yezzi, A., Tsai, A., and Willsky, A. (1999). A statistical approach to curve evolution for image segmentation. Technical Report, LIDS, Massachusetts Institute of Technology.
Yushkevich, P. A., Joshi, S. C., Pizer, S. M., Csernansky, J. G., and Wang, L. E. (2003). Feature selection for shape-based classification of biological objects. In Proceedings of Information Processing in Medical Imaging (IPMI), Ambleside, UK.
Zhang, H. and Malik, J. (2003). Learning a discriminative classifier using shape context distances. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, page 242.
Zhu, S. (1999). Embedding Gestalt laws in Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11).
Zhu, S. and Yuille, A. (1996). FORMS: A flexible object recognition and modeling system. International Journal of Computer Vision, 20(3):187–212.
CURRICULUM VITAE
Andrey Litvin
e-mail: [email protected]
EDUCATION
Boston University, Boston, Massachusetts, 2000-2006
PhD in E.E., May 2006, GPA 3.9/4.0
Advisor: Professor W. Clem Karl
Boston University, Boston, Massachusetts, 1999-2000
MS in E.E., May 2000, GPA 3.96/4.0
Advisor: Professor William Oliver
Saint-Petersburg State University, Saint-Petersburg, Russia, 1992-1996
BS in Physics, May 1996
EMPLOYMENT
Siemens Corporate Research, Princeton, New Jersey, May 2005 - November 2005
Temporary Technical Employee
Boston University, Boston, Massachusetts, 1999-2006
Research Assistant
National Polytechnic Institute of Grenoble (INPG), France, 1996-1997
Intern
PUBLICATIONS
1. A. Litvin, W.C. Karl, J. Shah, Shape and appearance modeling with feature distributions for image segmentation, International Symposium on Biomedical Imaging (ISBI), Arlington, Virginia, 2006
2. A. Litvin and W.C. Karl, Coupled shape distribution based segmentation of multiple objects, Information Processing in Medical Imaging (IPMI 2005), 2005
3. A. Litvin and W.C. Karl, Using shape distributions as priors in a curve evolution framework, Proceedings of 2004 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2004
4. A. Litvin and W.C. Karl, Levelset based segmentation using data driven shape prior on feature histograms, Proceedings of 2003 IEEE Workshop on Statistical Signal Processing
5. A. Litvin, J. Konrad and W.C. Karl, Probabilistic video stabilization using Kalman filtering and mosaicing, Proceedings of IS&T/SPIE 15th Annual Symposium - Electronic Imaging Science and Technology, Santa Clara, California, 2003
6. A. Litvin and W.C. Karl, Image segmentation based on prior probabilistic shape models, Proceedings of 2002 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2002
7. D. Alcayde, P.-L. Blelly, W. Kofman, A. Litvin, W.L. Oliver, Effects of hot oxygen in the ionosphere: TRANSCAR simulations, Annales Geophysicae, Vol. 19, pp. 257-261, 2001
8. A. Litvin, W. L. Oliver, J. M. Picone and M. J. Buonsanto, The upper atmosphere during June 5-11, 1991, Journal of Geophysical Research, vol. 105, No. A6, pp. 12789-12796, 2000
9. A. Litvin, W. L. Oliver and C. Amory-Mazaudier, Hot O and nighttime ionospheric temperatures, Geophysical Research Letters, vol. 27, No. 17, pp. 2821-2824, 2000
10. A. Litvin, W. Kofman and B. Cabrit, Ion composition measurements and modeling at altitudes from 140 to 350 km using EISCAT measurements, Annales Geophysicae, vol. 16, pp. 1159-1168, 1998