Source: iss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdf

STATISTICAL SHAPE AND APPEARANCE

MODELS FOR SEGMENTATION AND

CLASSIFICATION

ANDREY LITVIN

Dissertation submitted in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

BOSTON UNIVERSITY


BOSTON UNIVERSITY

COLLEGE OF ENGINEERING

Dissertation

STATISTICAL SHAPE AND APPEARANCE MODELS FOR

SEGMENTATION AND CLASSIFICATION

by

ANDREY LITVIN

B.S., Saint-Petersburg State University (Russia), 1996
M.S., Boston University, 2000

Submitted in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

2006


Approved by

First Reader

William C. Karl, Ph.D.
Professor of Electrical and Computer Engineering and Professor of Biomedical Engineering

Second Reader

Janusz Konrad, Ph.D.
Associate Professor of Electrical and Computer Engineering

Third Reader

Stan Sclaroff, Ph.D.
Associate Professor of Computer Science

Fourth Reader

Jayant Shah, Ph.D.
Professor of Mathematics


Acknowledgments

First, I would like to express my profound gratitude to my advisor Professor Clem Karl,

whose expertise, encouragement, patience, and help in organizing and directing my own

ideas were of immeasurable importance at all stages of my work. I am particularly thankful

for my advisor’s support in publishing our results, presenting them at conferences, and

writing reports. His mentoring helped me focus my work, develop key ideas and ultimately

made this dissertation possible. I am thankful to other committee members who provided

valuable advice at different stages of my PhD work. I would like to thank Professor Stan

Sclaroff for important suggestions in my prospectus preparation as well as for valuable

discussions. I also would like to thank Professor Janusz Konrad for support throughout

my PhD study. Professor Konrad’s advice regarding my research was very helpful on

countless occasions. I am particularly thankful to Professor Konrad for advising me

in a separate research project that did not become a part of this dissertation. I would

like to express particular gratitude to Professor Jayant Shah for the many discussions we had

at early stages of my PhD study. His expertise motivated me in several directions that I

have taken, especially in the shape classification focus area of this thesis. I am also thankful

to Professor David Castanon for valuable advice throughout my PhD study. I am thankful

to all Boston University faculty members for support and high quality teaching. I am

particularly thankful to all ISS group faculty members, whose classes provided me with

important foundations and interesting ideas for my research. I recognize that this work

would not have been possible without the financial support provided by my advisor, Professor

Clem Karl, and the Department of Electrical and Computer Engineering.

I would like to thank other colleagues from our lab who were part of my life throughout

my time at Boston University and made my experience enjoyable in many aspects. Yong-

gang Shi has been a great motivation for my own research. Our many discussions helped

me to formulate my ideas. Our collaboration on travel arrangements made our conference

trips productive and enjoyable. I would like to thank Robert Weisenseel for encouragement

and help in choosing the right strategy in my research, as well as for great help in

maintaining computational resources in our lab. I appreciate John Kaufhold’s suggestions

in medical imaging research trends and industry opportunities. Zhengrong Ying has been

my teammate in class projects and a resourceful person who could offer advice in all as-

pects of life. I am happy to have many other friends in our ISS lab: Shuchin Aeron, Julia

Pavlovich, Karen Jenkins, George Atia. I thankfully remember Mirko Ristivojevic’s and

Nikola Bozinovic’s companionship during our research visit to I3S lab at Sophia-Antipolis,

France in June 2004. Without them, that experience would not have been productive. I would

like to thank Professor Michel Barlaud from I3S lab for his assistance during our visit. I

would like to thank many other students from our lab who helped to enrich my experience,

among them Zhuangli Liang, Serdar Ince, Mujdat Cetin, Lingmin Meng and others.

I would like to thank Shen Hong and Shuping Qing at Siemens Corporate Research,

who provided me with guidance and shared valuable experience during my internship. I

am grateful to my other Siemens colleagues and friends, including Mikael Rousson, Jian

Li, Jie Shao, and Vassilis Athitsos for valuable ideas and support.

I would like to thank Dr. David Kennedy at the Center for Morphometric Analysis of

Harvard Medical School and Massachusetts General Hospital for providing the brain

MRI data used in this dissertation. The knee MRI data used in this work were provided

by Paul Debevec at ICT Graphics lab.

I would also like to thank my family for the support they provided me through my

entire life and, in particular, I must acknowledge my wife Dan Xie, without whose love,

support, and help, I would not have finished this thesis.


STATISTICAL SHAPE AND APPEARANCE MODELS FOR

SEGMENTATION AND CLASSIFICATION

(Order No. )

ANDREY LITVIN

Boston University, College of Engineering, 2006

Major Professor: William C. Karl, Ph.D.
Professor of Electrical and Computer Engineering,
Professor of Biomedical Engineering

ABSTRACT

In this dissertation we develop and apply models of shape and models of image intensities

(appearance models) in object-based image processing tasks. We make contributions in

three areas of interest: constructing novel flexible models of shape and of image intensities,

using these models to extract object boundaries from images, and analyzing differences

between groups of shapes from given, extracted object boundaries.

In the shape and appearance model construction and application areas of focus we

are motivated by the task of extracting object boundaries from images with an evolving

closed-curve technique known as curve evolution. We develop and apply novel models of shape

and models of appearance for incorporation in such curve-evolution-based object boundary

extraction. In our first major contribution, we start with the statistical shape model based

on the maximum entropy principle, designed to capture the perceptual shape similarity of

training shape samples. In sampling experiments, this statistical shape model has been

shown to generate new shape samples with prominent visual features of the original training

shapes used to construct the model. For the first time, we develop methods to incorporate

this maximum entropy model into object boundary extraction tasks. We show that indeed

incorporation of such a prior can have a dramatic effect in object boundary extraction


problems, favoring solutions similar to the training shapes.

In our next major contribution, we develop a new model of shape based on the notion of

shape distributions. Shape distributions have been introduced as cumulative distribution

functions of parameters continuously defined on contours. Shape distributions have been

used before for shape classification tasks, but our work is their first use for object bound-

ary extraction. The resulting shape models show an excellent ability to preserve prominent

visual object structures during boundary extraction in challenging segmentation problems

involving high noise, object obstruction, and weak or even missing intensity edges. Fur-

ther, these models exhibit robustness to limited training data. These models eliminate the

need for shape alignment at the model construction and estimation steps, often a difficult

and critical task. We further extend these models to capture information on the relative

configurations of multiple contours, which helps to extract multiple boundaries more effi-

ciently. We also extend the shape distribution concept to model image intensities. This

allows us to achieve superior results on images where the desired object boundaries do not

coincide with visible edges and where image regions cannot be identified based on their

homogeneity.

In another major contribution, we focus on the identification and analysis of the dif-

ferences in extracted Corpus Callosum shapes of the brain. Historically, such analysis has

been based solely on area or volume measures. In contrast, we use medial axis based rep-

resentations of the shape to capture a far richer understanding of the underlying shape.

We develop statistically-based feature ranking metrics in order to reduce the dimension

of the original feature space, construct shape classifiers, and visualize inter-class shape

differences.


Preface

The work presented in this dissertation was carried out during my time as a graduate

student at Boston University. This period was from the Spring semester of 1999 through

the Spring semester of 2006. The publications directly resulting from this research, and on

which I am listed as the first author, can be found in the references and are cited below.

(Litvin and Karl, 2002)

(Litvin and Karl, 2003)

(Litvin and Karl, 2004a)

(Litvin and Karl, 2004b)

(Litvin and Karl, 2005a)

(Litvin et al., 2006)

Additionally, several papers on unrelated research were published during my stay at

Boston University on which I am listed as an author.

(Litvin et al., 2000a)

(Litvin et al., 2000b)

(Alcayde et al., 2001)

(Litvin et al., 2003)


Contents

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Major contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Background 8

2.1 Object based image processing . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 Curve evolution framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2.1 Energy minimization based curve evolution . . . . . . . . . . . . . . 10

2.2.2 Probabilistic formulation for curve evolution . . . . . . . . . . . . . 12

2.3 Shape parameterizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.3.1 Parametric non-curve object representations . . . . . . . . . . . . . . 14

2.3.2 Explicit curve-based parameterization . . . . . . . . . . . . . . . . . 15

2.3.3 Implicit curve parameterization by level sets . . . . . . . . . . . . . . 15

2.4 Shape modeling approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.4.1 Constructing distance measures on shapes . . . . . . . . . . . . . . . 18

2.4.2 Constructing shape variability model . . . . . . . . . . . . . . . . . . 19

2.5 Shape distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.6 Shape model prior work summary and motivation . . . . . . . . . . . . . . . 29

2.7 Appearance models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.7.1 Role of distribution in appearance modeling and histogram equalization 33

2.8 Distribution difference measures . . . . . . . . . . . . . . . . . . . . . . . . 34

2.9 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38


3 Maximum entropy shape model as a curve evolution prior 42

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.2 Constructing a pdf on the space of shapes . . . . . . . . . . . . . . . . . . . 43

3.3 Application to curve-evolution based segmentation . . . . . . . . . . . . . . 49

3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Shape-distribution-based prior shape model 56

4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.2 Our Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.2.1 A prior energy based on shape distributions . . . . . . . . . . . . . . 59

4.3 Minimizing flow computation . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.3.1 Exact solution using variational framework . . . . . . . . . . . . . . 62

4.3.2 Numerical solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.4 Feature function choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.5 Intensity histogram equalization connection . . . . . . . . . . . . . . . . . . 70

4.6 Extension to 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.6.1 Formulation 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4.6.2 Surface flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.6.3 Implementation issues . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.7 Extensions and additional issues . . . . . . . . . . . . . . . . . . . . . . . . 75

4.7.1 Weighted distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.7.2 Geodesic distance between distributions . . . . . . . . . . . . . . . . 75

4.7.3 Shape distribution uniqueness issues . . . . . . . . . . . . . . . . . . 77

4.7.4 Computational complexity . . . . . . . . . . . . . . . . . . . . . . . . 78

4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5 Applications of shape distribution based shape priors 80

5.1 Shape focusing by shape term guided evolution . . . . . . . . . . . . . . . . 80

5.2 Image segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87


5.3 Image segmentation with occlusion . . . . . . . . . . . . . . . . . . . . . . . 98

5.4 Average shape computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

6 Joint segmentation of multiple objects using shape distribution based

shape prior 113

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

6.2 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6.2.1 Flow computation for inter-object distance feature function . . . . . 118

6.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

7 Shape and appearance modeling with feature distributions for image seg-

mentation 125

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

7.2 Shape distribution principles . . . . . . . . . . . . . . . . . . . . . . . . . . 127

7.3 Extension to combined intensity and shape priors . . . . . . . . . . . . . . . 127

7.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

7.5 Multivariate distributions extension . . . . . . . . . . . . . . . . . . . . . . 138

7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

8 Shape-Based Classification and Morphological Analysis using Medial Axes

and Feature Selection 141

8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

8.2 Data and skeleton-based feature extraction . . . . . . . . . . . . . . . . . . 143

8.2.1 Fixed topology skeleton . . . . . . . . . . . . . . . . . . . . . . . . . 144

8.2.2 Nested local symmetry sets method . . . . . . . . . . . . . . . . . . 146

8.2.3 Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

8.3 Inter-class shape differences. Detection and visualization. . . . . . . . . . . 151

8.4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164


9 Conclusions and future research 165

9.1 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

A Variational solution for the curve flow minimizing shape distribution

based prior energy 169

A.1 Inter-point distance function . . . . . . . . . . . . . . . . . . . . . . . . . . 170

A.2 Boundary curvature feature function . . . . . . . . . . . . . . . . . . . . . . 174

A.3 Multiscale curvatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

A.3.1 Computation of feature function . . . . . . . . . . . . . . . . . . . . 176

A.3.2 Curve flow computation . . . . . . . . . . . . . . . . . . . . . . . . . 179

A.4 Feature classes with weighting function . . . . . . . . . . . . . . . . . . . . . 184

A.4.1 Inter-point distance feature function . . . . . . . . . . . . . . . . . . 185

A.5 Relative inter-object distances . . . . . . . . . . . . . . . . . . . . . . . . . 190

B Curve flow for intensity based feature function 193

C Multidimensional CDF based shape prior 196

References 199

Curriculum Vitae 208


List of Tables

5.1 Experiment 1: Segmentation errors computed using different error measures.

The first error measure is computed as a symmetric area difference (Hamming

distance in eq. 5.13) between the final segmented region and the true shape.

The second measure is given by our prior energy Eshape in eq. 4.2. . . . . . . 91

5.2 Experiment 2: Segmentation errors computed using different error measures.

The first error measure is computed as a symmetric area difference (Hamming

distance in eq. 5.13) between the final segmented region and the true shape.

The second measure is given by our prior energy Eshape in eq. 4.2. . . . . . . 93

5.3 Experiment 3: Segmentation errors computed using different error measures.

The first error measure is computed as a symmetric area difference (Hamming

distance in eq. 5.13) between the final segmented region and the true shape.

The second measure is given by our prior energy Eshape in eq. 4.2. . . . . . . 95

6.1 Symmetric difference (area based) segmentation error. For each object the

error measure is computed as a symmetric difference between the final seg-

mented region and the true segmented region. The values in the table are

computed as a sum of the error measures for the individual objects. . . . . . 123


List of Figures

2·1 Level set curve embedding. Curve Γ is given by the set of points Γ : Φ = 0;

Φ is called the level set function. . . . . . . . . . . . . . . . . . . . . . . . . 18

2·2 An example of constructing a shape distribution for a curve (left) based on

curvature κ(s) measured along the boundary (second graph). Third and

fourth graphs show the sketches of pdf(κ) and cumulative distribution func-

tion H(κ) of the samples of curvature respectively. Note the invariance of

H(κ) with respect to the choice of the initial point of arc-length parameter-

ization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2·3 L1 difference measure computed on PDFs and CDFs. The difference value is

given by the shaded area. Panels (A) and (C): modes of p1 and p2 are very

close; L1 difference on CDFs produces small value. Panels (B) and (D):

modes of p1 and p2 are further from each other. L1 difference on CDFs gives

larger value, while L1 difference on PDFs does not change. . . . . . . . . . . 36

3·1 MCMC move proposal in (Zhu, 1999). Point i in configuration A is moved

into one of the 8 positions under the constraints on the length of linelets

connecting node i with its neighbors. . . . . . . . . . . . . . . . . . . . . . . 45

3·2 Our new scheme of proposed MCMC move (3 nodes, including the start-

ing node are shown). Black circles represent the initial node configuration.

White circles represent 2 candidate positions for the starting node and the

new positions of the neighbors after a starting node was moved into the new

(bottom) position. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47


3·3 Our scheme of open curve MCMC move proposal. Black circles represent

the initial configuration. White circles represent the final configuration after

the curve is bent at the trial node. . . . . . . . . . . . . . . . . . . . . . . . 48

3·4 Ground-truth image with added noise, constructed from the shape in dataset

1 (panel A) and dataset 2 (panel B). Black solid line: true shape; white

dashed line: initial contour . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3·5 Data set 1. Result of segmentation without using prior shape model (left)

and the best result using the model (right). The true boundary is shown by a

straight line and the circles show the reconstructed boundary. . . . . . . . . 52

3·6 Same as figure 3·5 for data set 2. . . . . . . . . . . . . . . . . . . . . . . . . 52

3·7 Segmentation obtained using penalty function in eq. 3.13. . . . . . . . . . . 54

4·1 An example of constructing a shape distribution for a curve (left) based

on curvature κ(s) measured along the boundary (second graph). Third

and fourth graphs show the sketches of pdf(κ) and cumulative distribution

function H(κ) of curvature respectively. Note the invariance of H(κ) with

respect to the choice of the initial point of arc-length parameterization. . . 57

4·2 Illustration of the descent on the manifold procedure to find the curve

flow β(s)+, eq. 4.14. The surface represents the space of all realizable feature

function flows S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4·3 Feature function #1 values computed on the discretized curve. We show

interpoint distances d13..d15. . . . . . . . . . . . . . . . . . . . . . . . . . 66

4·4 Left: Graphic interpretation of the division of the space Ω into four sub-

spaces ΩS : Ω1, Ω2, Ω3, and Ω4. Corresponding intervals are [0, 0.125],

[0.125, 0.25], [0.25, 0.375], and [0.375,0.5]. Right: a curve with 3 pairs of

points, members of Ω1, Ω1, and Ω4 respectively. . . . . . . . . . . . . . . . . 68

4·5 Feature function #2 in discrete case: interpoint angles α−1,1,2..α−n,1,n are

shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70


4·6 Surface triangulation by extraction of the level set function zero crossings. . . 75

5·1 Evolution of an initial contour under the sole action of the prior flow: initial

(dot-dashed), target (dashed), and resulting (solid) contours. (A) - prior

constructed on the inter-point distances (#1); (B) - prior constructed on

multi-scale curvatures (#2); (C) - Both feature classes #1 and #2 are included. 82

5·2 Target plane shape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5·3 Evolution of the contour under the action of prior flow: initial (dot-dashed),

and final (solid) contours. Target contour is shown in Figure 5·2. . . . . . . 83

5·4 Evolution of the contour using a multi-scale prior defined on a group of

level sets: initial (dot-dashed), and resulting (solid) contours. . . . . . . . . 87

5·5 Segmentation results. A: Our method; B: PCA; C: Curve length penalty

prior; D: Method in (Leventon et al., 2000b); White - final result; Black -

true shape boundary; Dashed line - initial curve. Symmetric area distance

(in pixels) between true boundary and final result is shown on the top of

each panel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5·6 Prior shapes used to construct the prior in our experiment. A: triangular

shapes (experiment 1); B: polygonal shapes (experiment 2). . . . . . . . . . 90

5·7 Segmentation results, polygonal prior. A: Our method; B: PCA; C: Curve

length penalty prior; White - final result; Black - true shape boundary;

Dashed line - initial curve. Symmetric area distance (in pixels) between

true boundary and final result is shown on the top of each panel. . . . . . . 92

5·8 Experiment 3: (A) - training shapes; (B) - noise free image . . . . . . . . . 94

5·9 Experiment 3 segmentation results. A: Our method; B: PCA; C: Curve

length penalty prior; White - final result; Black - true shape boundary;

Dashed line - initial curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95


5·10 Knee cartilage segmentation results. A: Initial (dashed line) and true (solid

line) contours; B: Our method (solid line); C: Leventon’s method (solid line); D:

Curve length penalty prior (solid line). . . . . . . . . . . . . . . . . . . . . . 96

5·11 Occlusion experiment 1. (A) - training shapes; (B) - true object contour

(thick line) superimposed with training shapes (thin lines). This plot illus-

trates that the prominent feature location is different in the true object and

in all training shapes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5·12 Occlusion experiment 1. Noisy image is shown in all four panels. (A) - Initial

contour (dashed line); (B,C,D): Dashed line - occluded region; Black solid

line - true boundary; White solid line - segmentation result. (B) - occlusion

pattern 1, result using our prior. (C) - occlusion pattern 1, result using PCA

prior (C). (D) occlusion pattern 2, result using our prior. . . . . . . . . . . 101

5·13 Experiment 2: Segmentation with occlusion. (A) - prior shapes; (B) - result

using our prior and (C) - PCA prior. Dashed white rectangle - occluded

region; Dashed white circle - initial contour; Black solid line - true boundary;

White solid line - segmentation result. . . . . . . . . . . . . . . . . . . . . . 103

5·14 Training planes shapes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5·15 Experiment 3: segmentation of plane with occlusion. Plane silhouette #2

was used to form the image. Dashed white rectangle - occluded region;

Dashed white smooth contour - initial curve; Black solid line - true boundary;

White solid line - segmentation result. . . . . . . . . . . . . . . . . . . . . . 105

5·16 The average shape of 2 triangles obtained using different shape distance mea-

sures: solid lines - prior shapes; dashed line - corresponding average shape;

filled areas - the family of solutions. (A) - asymmetric distance based mea-

sure; (B) - area based measure. One of the possible solutions is shown by

dashed line; (C) - our distribution difference measure (dash-dotted line -

evolution result; dashed line - scaled result). . . . . . . . . . . . . . . . . . 108


5·17 Initial (dash-dotted contour) and average shapes (solid contour) for 2 groups

of shapes. Prior shapes in each group are shown on the top of each panel. . 110

5·18 Experiment 3: Example shapes from (Klassen et al., 2004). . . . . . . . . . 110

5·19 Experiment 3: Average shapes computed on shapes in Figure 5·18: (A)

Result in (Klassen et al., 2004); (B) Our result. . . . . . . . . . . . . . . . . 111

6·1 Interaction matrix graphical interpretation using directed diagram. Three

objects are sketched in the right panel with assigned object indices. Arrows

in the right panel correspond to non-zero entries in the matrix Z. . . . . . . 117

6·2 Feature function #3 used in this work illustrated for a curve C1 discretized

using 6 nodes. Feature values for curve C1 are defined as the shortest signed

distances from the curve C2 to nodes of the curve C1. . . . . . . . . . . . . 118

6·3 Synthetic 2 shape example: (A) Bi-level noise free image; (B) Segmenta-

tion with curve length prior; (C) - shape distribution prior including only

autonomous feature functions #1 and #2; (D) - shape distribution prior

including directed feature function #3 along with autonomous feature func-

tions. Solid black line shows the true object boundaries; dashed white lines

- initial boundary position; solid lines - final boundary. . . . . . . . . . . . . 120

6·4 Brain MRI segmentation: (a) Multiple structures and interactions used

for feature function #3; (b) Segmentation with independent object curve

length prior. (c) Segmentation using multi-object PCA technique in (Tsai

et al., 2004) (d) Segmentation with new multi-object shape distribution

prior. Solid black line shows the true object boundaries; solid white line -

final segmentation boundary. . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7·1 Image patch based feature values measured along the boundary. Point O

(patch coordinate system origin) is positioned at Γ(s) (current boundary

point). j-axis is aligned with local inward normal. Two instances are shown. 128


7·2 Example 1. Segmentation with shape/intensity distribution prior. True shape - black solid line; initial contour - dashed line; final segmentation contour - solid white line.

7·3 Five points x(k) used to construct feature functions according to Eq. 7.2 in Experiment 1.

7·4 Example 2. Segmentation with shape/intensity distribution prior. True shape - black solid line; initial contour - dashed line; final segmentation contour - solid line.

7·5 Example 3. (A) Segmentation with shape distribution prior and maximum mutual information data term (Kim et al., 2002a); (B) Segmentation with shape/intensity distribution prior and shape distribution prior. True shape - solid black line; initial contour - dashed line; final segmentation contour - solid white line.

7·6 Expert segmentation of the left lenticular nucleus showing variation in intensity within the structure and lack of a consistent gradient along the boundary.

7·7 Example 4. (a) Segmentation with shape/intensity distribution prior; (b) Segmentation with only shape prior and intensity model in (Kim et al., 2002a). True shape - black solid line; initial contour - dashed line; final segmentation contour - solid line.

7·8 LADAR image of a tank.

7·9 Semi-synthetic tank image segmentation with intensity and shape prior: (A) Intensity and shape prior; (B) Shape prior and threshold intensity term in eq. 7.6. True shape - solid black line; initial contour - dashed line; final segmentation contour - solid white line.

8·1 CC shape sketch is shown along with a skeleton found using trial end points. Maximum radius circle centered at xk is shown.


8·2 Extracted fixed topology skeleton. Circles represent the sampled discrete points on the skeleton. Outside border is the segmented Corpus Callosum shape. Dark regions near the border comprise the skeleton badness measure in eq. 8.3.

8·3 Skeletons obtained from male subjects using the fixed topology method.

8·4 Skeletons obtained from male subjects using the nested local symmetry sets method. Principal and secondary skeleton branches are shown.

8·5 Features extracted from the medial axis.

8·6 Male and female Corpus Callosum differences and importance of individual features using p-value based feature ranking.

8·7 Normal and schizophrenia Corpus Callosum differences and importance of individual features using p-value based feature ranking.

8·8 Feature importance visualization using linear classifier weight as the feature ranking score. Top: male/female case; Bottom: normal/schizophrenia case.

8·9 Feature importance visualization using feature selection frequency as the feature ranking score. Normal/schizophrenia case.

8·10 Male/female classification testing errors using the t-test feature selection method. Testing error is shown coded by color as a function of the number of Adaboost iterations (horizontal axis) and the number of features chosen (vertical axis).

8·11 Classification testing error for gender and schizophrenia versus normal is shown for different combinations of feature selection technique, classification method, and number of features retained. “T-test; linear” - T-test method of feature selection, MMSE classifier; “T-test; Ada Boosting” - T-test method of feature selection, Ada Boosting classifier; “Weights; linear” - linear weights feature selection method, MMSE classifier; “Weights; Ada Boosting” - linear weights feature selection method, Ada Boosting classifier.


8·12 Feature selection normalized probability. Male/female classification with p-value feature selection. Log of the normalized probability that a given feature is chosen in the set of N features. Horizontal axis: N - number of selected features; vertical axis - feature index (1 through 37).

A·1 Inter-point distance augmentation due to curve deformation.

A·2 Illustration of feature value computation for feature function #2.

A·3 Four cases of the relative positions of three curve points. The support angle cannot be determined unambiguously.

A·4 Sequential computation of the angles for a particular “base” point s1, starting from r = 1 (assuming the inside of the curve is upwards).

A·5 Local perturbation of the curve at point ~Γ(s1 + s2). Perturbation εβ(s1 + s2) is infinitely small compared to |~Γ(s1, s1 + s2)|.

A·6 Illustration of 2 cases when the sign of the angle increment dα(1) is different for the same curve perturbation εβ(s).


List of Abbreviations

PDF . . . . . . . . . . . . . . Probability Distribution Function

CDF . . . . . . . . . . . . . . Cumulative Distribution Function

MRI . . . . . . . . . . . . . . Magnetic Resonance Imaging

CT . . . . . . . . . . . . . . Computerized Tomography

MAP . . . . . . . . . . . . . . Maximum a posteriori

PCA . . . . . . . . . . . . . . Principal Component Analysis

IID . . . . . . . . . . . . . . Independent Identically Distributed

SNR . . . . . . . . . . . . . . Signal-to-Noise Ratio


Chapter 1

Introduction

1.1 Motivation

Humans’ exceptional ability to interpret a visual scene is largely defined by an object-oriented processing aptitude. Humans are able to extract the knowledge about the shapes of objects, and efficiently distinguish common features based on a very small number of samples. Learned knowledge of shape similarity efficiently generalizes to unseen shape examples given large variabilities and partial observations. On the one hand, this ability allows us to efficiently identify objects in the scene in the presence of severe noise, obscuration, and clutter. Indeed, humans are so good at reasoning about and finding shapes that we even tend to find them where none really exists, as shown by some psycho-visual experiments. On the other hand, we are able to efficiently and robustly discriminate between shapes of different categories. Naturally, the potential of using shape related concepts has been seen in various domains, including image processing, computer vision, and applied mathematics. Effort has been made both to study and mimic human abilities, and to create new algorithms that otherwise perform shape oriented tasks. Gains from robust shape oriented processing are expected in machine vision, medical imagery interpretation and countless other applications.

In this dissertation we contribute to shape based image processing in the following

directions:

1) Modeling: This direction is focused on formulating the knowledge of commonalities within groups of shapes (shape models), and formulating the model of appearance of shapes in images (appearance models). These shape and appearance models are designed to be used in image interpretation through the shape extraction task (Direction 2). We focus on such shape and appearance modeling techniques that are applicable in a curve evolution framework. The goal of our effort is to construct efficient and robust models.

2) Shape extraction: This direction aims at extracting the boundary of objects of interest in images. A curve evolution framework, using both shape and appearance models, attempts to move the boundary curve to identify/segment the object. Our focus is to adapt the models developed in Direction 1 to work in a curve evolution framework. By combining effective shape and appearance models we aim at improving segmentation results in challenging situations. These situations include noisy images, weak edges, and high shape variability.

3) Shape Inference: Given the segmentation boundaries extracted from images, one may be interested in further analysis of these shapes. A common problem is to detect morphological differences in shapes and learn the relationships between these differences and disease progression. Motivated by the Human Brain Project (Koslow and Huerta, 1997), we develop statistical methods for testing for morphological population differences and for subsequently identifying and localizing these morphological differences. This work exploits a skeleton-based representation of the extracted brain shape.

The potential of shape based image processing has led to a considerable body of prior

work on various shape-related aspects. Numerous works explore human perception theories

and their direct application to shape models, theories of shapes and shape spaces, shape

extraction frameworks and algorithms. Applications range from medical image diagnostic

tools and processing aids to machine vision, communication and other areas. We review

the prior work related to our directions in Chapter 2.

In the first two directions of this dissertation we use an inherently object-based framework in describing the boundary as a closed contour. A curve evolution framework is then used to evolve the contour. In a typical curve evolution scheme, the curve is evolved by the combination of data forces (dependent on the image data) and prior forces that regularize or constrain the solution curve. Such a framework is very popular due to its many advantages. Focusing on one boundary allows for computational efficiency of the evolution process. Often, computations can be performed exclusively in the neighborhood of the boundary. Curve evolution permits flexible initialization and allows easy topology handling when combined with a level set framework. Finally, different forces may be used, making the approach flexible.

According to our focus on curve evolution based methods, we are interested in shape modeling approaches that can be incorporated into curve evolution. However, despite the large body of prior work on shapes, at the time this work started, combinations of curve evolution and shape priors had been largely limited to generic priors that incorporate a penalty on curve length, bending energy, or similar aggregate measures. Such priors can provide a regularizing (smoothing) effect to the solution but indiscriminately smooth out salient features of shapes. On the other hand, they are often too generic, treating all shapes in the same way. At present, generic priors still dominate in use within curve evolution based methods. Our approaches to construct a prior were conceived as alternative techniques, aiming at richer, data dependent priors applicable in the curve evolution framework. A few other approaches to construct and use a prior were proposed. Notably, deformable template approaches explicitly define the space of deformations (for example using PCA analysis) with respect to the “average” shape (Staib and Duncan, 1996). These methods can work well when observed configurations are well represented in the training data, but they suffer from poor generalizability to unseen shapes. Our approaches to construct the prior are designed to address the limitations of the generic prior approaches and approaches based on explicit templates. Our approaches have distinctively different properties from existing approaches. First, our shape prior encodes the existence of prominent shape features while being invariant to the location of these features within the shape. This property allows the model to generalize to large variations preserving certain prominent shape features. Second, due to the invariance properties of the shape descriptors used, the registration (alignment) of shapes is not necessary when using our framework. We aim at challenging segmentation problems with little training data available and large shape deformations.

Existing appearance models fall into three major categories. Boundary-based approaches assume that object boundaries coincide with image edges, attempting to locate boundaries in areas of high image gradient. Region-based approaches assume uniformity of certain statistics of the image intensity inside and outside the boundary. For example, these approaches attempt to maximize the uniformity of region statistics or their separation. Template based approaches, such as the Active Appearance Model (AAM), assume a template image is linked to the deformable boundary. The model tries to find the match between the warped template and the image being segmented, finding the desired object boundary as a deformed boundary template. These methods attempt to match the intensities local to the boundary.

Our method to model appearance is built upon our shape modeling approach and extends its properties. It allows the solution of challenging segmentation problems, which create difficulties for current boundary-based and region-based approaches. Our appearance modeling approach attempts to capture the general appearance properties near the object boundary and thus uses a richer description of the intensity properties than is typically used in boundary and region based approaches. Yet our method is constructed on distributions of intensities, and thus it attempts to abstract and generalize these properties, in contrast to template-based methods. In this way it has the potential for greater robustness and flexibility with respect to large variations of boundary appearance along the boundary itself and across training boundaries and images than template-based approaches, which emphasize direct intensity matching between template and data.

In the shape inference part of this work, we focus on the problem of identification of morphological differences in corpus callosum shapes of the brain and the task of automatic classification of shapes based on observing prior shapes with known category memberships. Historically, such analysis is based solely on area or volume measures, but the information contained in these measures is not sufficient to draw conclusions about a shape’s distinctiveness nor is it sufficient to localize and quantify specific differences. We explore rich skeleton-based shape descriptors that allow us to use statistical approaches to identify the shape population differences and build optimal classifiers based on these descriptors.

1.2 Major contributions

In this part, we summarize the major contributions of this dissertation. The first major

contribution of this dissertation is the incorporation of the probability distribution shape

model proposed in (Zhu, 1999) into the curve evolution framework. This shape modeling

approach was motivated by the research on human shape perception and attempted to

capture such shape perception in a probabilistic context. We proposed an approach to

efficiently construct such a model from training data and incorporate the resulting model

into a curve-evolution based energy minimization framework. To our knowledge this is the first time models from the perceptual modeling community have been used as priors for object boundary estimation in images.

In the second major contribution of this dissertation we develop an approach to shape modeling based on the concept of shape distributions. Shape distributions have been previously used for shape classification and show evidence of being able to encode shape under large deformations while preserving visual similarity (Osada et al., 2002). Our work contributes by using shape distributions as a prior in object boundary estimation problems and incorporating this prior in a curve evolution framework to solve challenging segmentation problems. We further extend our shape distribution concept to 3D and to joint multi-object shape modeling. Our new shape modeling methodology allows us to improve the solutions to segmentation problems that present difficulties for current segmentation methods. Particular difficulties include low signal-to-noise ratio, small training data sets, large shape variability, object occlusion, and weak/diffuse boundaries.

In the third major contribution we extend the shape distribution concepts to model the appearance of objects in images through intensity defined distributional descriptors. We arrive at the framework of modeling both shape and appearance under the principle we name the joint shape and appearance distribution model. Our new appearance model, assisted by the shape prior, allows us to approach segmentation problems that pose severe difficulties for current approaches. In particular, we are able to segment images where intended object boundaries do not coincide with edges and where regions cannot be separated based on commonly used region statistics.

In the fourth major contribution of this dissertation we propose tools to study morphometric differences in manually segmented outlines of the corpus callosum shapes. We exploit skeleton based parametric shape descriptions. We present novel statistically-based feature ranking metrics, and use these metrics to reduce the dimension of the original feature space and to construct classifiers on the reduced feature space. Our feature ranking metrics also allow for intuitive inter-class shape difference visualization.

This dissertation is organized as follows:

Chapter 2 reviews the background related to the topics of this dissertation and the methods used.

Chapter 3 introduces the maximum entropy shape modeling approach and implements

it in the curve evolution context for image segmentation.

Chapter 4 presents the shape distribution-based modeling approach. This chapter

describes the model construction and its use in curve evolution based shape inference.

Chapter 5 presents the applications, demonstrating the properties of the shape distribution-based shape model.

Chapter 6 extends our shape distribution based shape modeling approach to modeling

relationships between multiple objects.

Chapter 7 introduces a framework of unified shape and appearance modeling using the shape distribution concept.

Chapter 8 presents our results on the analysis of morphometric differences in the shapes of the Corpus Callosum brain structure.

Chapter 9 summarizes our results and discusses the possibilities of future work that extends the contributions of this dissertation.


Chapter 2

Background

In this chapter we review the technical background of the methods used in this dissertation

and prior work that is related to the major topics of this dissertation. First, we introduce

the basic concepts of object-based image processing. Next, we consider curve evolution

approaches in greater detail since curve evolution will be a major focus of this work.

We discuss different curve parameterizations, with particular attention paid to level set

methods. We review prior work on shape distances and models and on appearance models.

Finally, we review some elements of classification theory, relevant to the morphological

shape analysis direction of this thesis.

2.1 Object based image processing

In this dissertation we consider object-based tasks and concentrate on the use of shape and intensity prior information in these tasks. The concrete applications of interest are image segmentation, shape clustering, shape averaging, image reconstruction, and shape classification. First, we will consider the boundary extraction thrust as the driving application. In this context, the prior information is generally divided into shape and appearance priors. The shape prior describes the geometry of the object boundaries and/or relationships between boundaries of multiple objects. The appearance prior formulates prior knowledge on image intensities with respect to object boundaries. The shape prior imposes a structure on the solution boundary or merely acts as a regularizer. The appearance prior links the solution to the data, specifying the way the data (image) influences the solution.

A first choice made in a given application is that of a low level object (shape) description. Two types of approaches can be distinguished. In the first type of approach, the shape is described parametrically as a geometrical primitive shape (such as an ellipse) or a combination thereof. An advanced version of such an approach is a skeleton-based parameterization utilized in part of this work. In skeleton based parameterizations, the shape is described as a sequence of connected ribbons. Each ribbon is assigned a width parameter that implies the shape boundary points.
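A minimal sketch of this idea (illustrative only, not code from this dissertation; the straight skeleton and the sinusoidal width profile are hypothetical):

```python
import numpy as np

# Minimal sketch of a skeleton (ribbon) parameterization: a straight
# horizontal skeleton plus a per-point width (radius) profile implies the
# two sides of the boundary as offsets along the skeleton normal.
skeleton_x = np.linspace(0.0, 10.0, 50)
radius = 1.0 + 0.5 * np.sin(skeleton_x)   # hypothetical width profile

# For a horizontal skeleton the unit normal is (0, 1), so the implied
# boundary points are the skeleton offset by +/- radius along the normal.
upper = np.stack([skeleton_x,  radius], axis=1)
lower = np.stack([skeleton_x, -radius], axis=1)

local_width = upper[:, 1] - lower[:, 1]   # equals 2 * radius at each point
print(local_width.min(), local_width.max())
```

For a curved skeleton the normal would vary along the skeleton, but the principle is the same: the boundary is implied by the skeleton geometry and the width parameter.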

A second type of approach to describe shapes can be named boundary-based or active

contour approaches, where the shape is explicitly defined by its boundary: a curve in 2D

or surface in 3D. Boundary based shape descriptions are the primary tool used in this

dissertation. In turn, boundary contour descriptions can be realized in two distinct ways.

Parametric boundary descriptions define a boundary as a sequence of nodes, landmarks

or spline segments. The geometric active contour method describes a boundary implicitly

using a higher order function called a level-set function. In this dissertation we utilize both

approaches to describe a contour. Throughout this dissertation, it is assumed that the

underlying contour is closed and non-self-intersecting. Although for some special cases this

assumption can be relaxed, real life objects can be described by one or several non-self-intersecting closed contours. Curve evolution approaches (Caselles et al., 1997) to evolve

such parameterized shapes are the basic tool used here. Curve evolution methods are

intrinsically geometric since the boundary describes the object of interest directly. Focusing

on one boundary allows for computational efficiency of the evolution process (computations

may have to be performed in the neighborhood of or on the boundary). Curve evolution

permits easy topology handling and flexible initialization. Finally, different forces may be

used to move the curve, making the approach highly flexible.

2.2 Curve evolution framework

In this section we give a detailed review of the curve evolution framework. We consider

different underlying formulations and practical implementation approaches.

In a curve evolution framework, the boundary is continuously evolved using the curve force specified at each point on the boundary. Typically, the force includes the shape and the intensity components. The shape component constrains or regularizes the boundary, incorporating the prior knowledge on the shape of the boundary. The intensity component depends on underlying data and the model of the relationship between the curve and the data, which we call the appearance prior.

We distinguish three possibilities to formulate the curve evolution approach and to compute the curve forces. First, a heuristic approach can be used to specify all or some of the forces (Shen et al., 2005). The resulting approach may be effective but lacks the integrity of a general principles formulation. Second, the energy minimization based approach defines an energy functional, which relates curve, image, and prior information. The solution curve is then defined as the minimizer of this energy functional and is computed through the evolution process. The curve force is typically computed as the gradient flow - the direction of curve deformation that corresponds to the fastest decrease of the energy. Third, a curve, an image and prior information can be encoded in an explicit probabilistic framework, for instance using the MAP formulation. While being the most theoretically justified approach, a probabilistic framework is problematic to implement in practice, notably because “true” probability distribution function definition and normalization constant computation is difficult on the infinite dimensional spaces of shapes. In fact, probabilistic approaches used in practice often do not include strict formulations for the probability distributions used. On the other hand, any probabilistic approach can be expressed as an energy minimization approach, but not conversely. Hence, the energy minimization approach presents an adequately general and convenient framework. Below we give more detailed overviews of the energy based and probabilistic formulations of curve evolution.

2.2.1 Energy minimization based curve evolution

In the energy minimization based curve evolution approach, one defines an energy functional that depends on one or multiple curves. This energy functional is typically designed to be minimized at the correct solution (i.e. desired object boundaries). The energy functional can be constructed by analyzing the structure of the problem, desirable solutions, etc.

In the energy based formulation, an energy E(Γ) depending on the hypothesized object boundary Γ is defined. The solution boundary is the minimizer of this energy:

Γ∗ = argmin_Γ E(Γ)    (2.1)

The energy typically consists of two types of terms: intensity term(s) and shape term(s).

E(Γ) = Eint(Γ) + αEshape(Γ) (2.2)

Sometimes these are referred to as “external” and “internal” terms. The intensity term Eint ensures the fidelity of the solution to the image data. Eint reflects the sensor model of the expected appearance of the data corresponding to a given scene (appearance model); for example, for image segmentation problems, this term can be the negative log-likelihood of the image intensities. The shape term Eshape reflects the prior information about shape. The parameter α is a regularization coefficient that weighs the strength of the prior. For certain problems, the data term is absent in the formulation. An example of such a situation is the curve morphing task, in which a curve must be evolved to match the prior curve(s). Although the prior curve(s) can be interpreted as data, in order to avoid confusion, we will call the energy term that depends on the prior curves a shape term, Eshape.

call the energy term that depends on the prior curves - a shape term Eshape.

Given the energy functional E(Γ) we find the minimizing curve flow −∇E(s). It can be interpreted as a force acting on the curve in the normal direction, F(s) = −∇E(s). The curve is evolved according to the following differential equation:

dΓ(s)/dt = −∇E(s) ~N(s)    (2.3)

where ~N(s) is the normal direction to the curve Γ at s. In a discrete implementation

Γt+1(s) = Γt(s) − k ∇E(s) ~N(s)    (2.4)


where t indicates the evolution time of the minimization and k is a speed coefficient chosen

small enough to guarantee numerical stability.
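As an illustrative sketch of the update in eq. 2.4 for a polygonal contour (not the implementation used in this dissertation; the central-difference normal estimate and the constant gradient in the demo are simplifying assumptions):

```python
import numpy as np

def evolve_step(gamma, grad_E, k=0.1):
    """One discrete curve-evolution step in the spirit of eq. 2.4.

    gamma  : (N, 2) array of contour node coordinates, ordered CCW.
    grad_E : (N,) array, the energy gradient sampled at each node.
    k      : small speed coefficient for numerical stability.
    """
    # Unit tangents via central differences on the closed contour.
    tangent = np.roll(gamma, -1, axis=0) - np.roll(gamma, 1, axis=0)
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    # Outward unit normal: rotate the tangent by -90 degrees (CCW contour).
    normal = np.stack([tangent[:, 1], -tangent[:, 0]], axis=1)
    # Gamma_{t+1}(s) = Gamma_t(s) - k * grad_E(s) * N(s)
    return gamma - k * grad_E[:, None] * normal

# Demo: a unit circle with a constant positive gradient moves inward.
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
shrunk = evolve_step(circle, grad_E=np.ones(64), k=0.05)
print(np.linalg.norm(shrunk, axis=1).mean())  # mean radius drops below 1
```

With a positive gradient everywhere, the step moves every node inward along the outward normal, so the circle contracts; a sign change would expand it instead.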

The gradient curve flow minimizing eq. 2.2 can be found using variational approaches

(Charpiat et al., 2003) or shape gradient approaches (Jehan-Besson, 2003). In this work

we utilize a variational approach. The possibility to apply shape gradient tools is a topic

of further research.

The following steps are involved in the variational approach to computing the gradient

curve flow (Charpiat et al., 2003) which minimizes the energy E(Γ):

1. Compute the Gateaux semi-derivative of the energy functional with respect to perturbation β. Let s be the arc-length of the curve Γ; then β(s) defines the normal displacement of the curve at s. The Gateaux semi-derivative is defined as a directional derivative

G(E, β) = lim_{ε→0} [E(Γ + εβ) − E(Γ)] / ε    (2.5)

The space of perturbations β(s) defines a linear and continuous subspace of a Hilbert space L2(Γ). The Hilbert product is given by < β1, β2 > = ∫_Γ β1(s) β2(s) ds.

2. If the Gateaux semi-derivative of a linear functional E exists, we can apply the Riesz representation theorem (Rudin, 1966), and the Gateaux semi-derivative can be represented as

G(E, β) = < ∇E, β >    (2.6)

where ∇E is the gradient (flow) minimizing the functional E. Thus, one needs to cast the Gateaux semi-derivative in the form of eq. 2.6.
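The two steps above can be checked numerically on a toy energy. The sketch below (an illustration under simplifying assumptions, not this dissertation's code) approximates the Gateaux semi-derivative of the curve length energy by the difference quotient of eq. 2.5 and compares it with the value < ∇E, β > predicted by eq. 2.6, using the known fact that the gradient of the length energy is the curvature, which equals 1 everywhere on the unit circle, so that < ∇E, β > = ∫ 1 ds = 2π for a uniform perturbation β = 1.

```python
import numpy as np

def curve_length(gamma):
    """Perimeter of a closed polygonal contour given as an (N, 2) array."""
    return np.linalg.norm(np.roll(gamma, -1, axis=0) - gamma, axis=1).sum()

def gateaux(E, gamma, normal, beta, eps=1e-6):
    """Numerical Gateaux semi-derivative (eq. 2.5): displace the curve by
    eps*beta along its normal and take the one-sided difference quotient."""
    return (E(gamma + eps * beta[:, None] * normal) - E(gamma)) / eps

theta = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
gamma = np.stack([np.cos(theta), np.sin(theta)], axis=1)
normal = gamma.copy()            # outward unit normal of the unit circle
beta = np.ones(len(gamma))       # uniform outward perturbation

# Eq. 2.6 predicts <grad E, beta> = integral of kappa * beta ds = 2*pi here.
g = gateaux(curve_length, gamma, normal, beta)
print(g, 2.0 * np.pi)            # the two values agree closely
```

The agreement improves as the contour discretization is refined, since the polygon perimeter converges to the continuous curve length.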

2.2.2 Probabilistic formulation for curve evolution

Sometimes, the problem of finding an optimal object boundary in an image can be cast in

the probabilistic form. Given the image I, we want to find a maximum a posteriori (MAP)


contour Γ∗ given by

Γ* = argmax_Γ p(Γ|I)   (2.7)

Using Bayes’ rule

p(Γ|I) ∝ p(I|Γ) p(Γ)   (2.8)

where p(I|Γ) is the image likelihood and p(Γ) is a shape prior. The image likelihood is

a pdf of an image given the object contour, which is constructed using a model of image

appearance. The shape prior is the pdf on the space of possible shapes.

In fact, the energy-based and probabilistic formulations are closely related. Taking the

negative log of eq. 2.8 we obtain

−log p(Γ|I) = −log p(I|Γ) − log p(Γ) + const   (2.9)

The maximization in eq. 2.7 is equivalent to minimization of −log p(Γ|I), and is thus equivalent to an energy minimization problem with E(Γ) = −log p(I|Γ) − log p(Γ). In this formulation the image likelihood can be interpreted as p(I|Γ) = e^{−Eint(Γ)} and the prior shape pdf can be interpreted as p(Γ) = e^{−Eshape(Γ)}.

The advantage of the probabilistic approach is that, given proper normalization of the probability distributions, the regularization constant α in the energy formulation is determined. The difficulty is that the proper normalization is often hard to find (for instance, it requires integration over the space of all curves). Hence, in this work we use the energy formulation.

2.3 Shape parameterizations

Given the boundary extraction problem formulation, whether energy-minimization based or probabilistic, the next question is how to parameterize and deform the object. In this section we review different options to parameterize the shape, in particular the level set approach. Parametric non-curve object representations define the shape through a set of parameters that are not boundary point coordinates. Such approaches are not directly


combined with the curve evolution framework. Explicit and implicit curve parameteriza-

tions are naturally used in the curve evolution framework.

2.3.1 Parametric non-curve object representations

Starting with descriptions of the lowest dimensionality, let us briefly review shape parameterizations as geometrical primitives. Such approaches are incompatible with the curve evolution framework but can be used in energy-based and probabilistic boundary extraction formulations. One of the simplest approaches is to assume an elliptical shape (Poonawala, 2003); six parameters are sufficient in this case. Of course, such a parameterization is only useful for a narrow range of problems, such as describing biological cells. Rectangular shapes have been used to describe man-made objects, see

(Minguez and Montano, 2005). Combinations of primitive shapes can be used to describe

more complex objects (Zhu and Yuille, 1996). Another significant strategy is to parame-

terize the object using medial primitives, as in the MREP approach of (Pizer et al., 1996;

Pizer et al., 2003). In the MREP approach, the object is described by a graph of medial

atoms and implied boundary points. Closely related skeleton or medial axis approaches

model the object by a medial axis (skeleton) and implied boundary (Dimitrov et al., 2000;

Galand et al., 1999; Shah, 2005; Tari and Shah, 2000). The unpruned skeleton is a result

of the Blum transform of the object boundary (Blum, 1967). The MREP representation

can be considered as a coarsely sampled version of the skeleton representation. Skeleton

based approaches capture shape information in a meaningful and visually intuitive way, which warrants their use in some applications. In this dissertation we use a

skeletal shape representation for the analysis of brain morphology in Chapter 8. The shape

descriptors presented above are ground-level descriptors in the sense that they are not derived from or defined on the object boundaries; rather, these descriptions usually imply the boundary.


2.3.2 Explicit curve-based parameterization

We now consider curve based descriptors. A variety of explicit curve representations

have been proposed. Curves can be represented by landmarks (Bookstein, 1991), sam-

pled boundary points (Cootes et al., 1993), basis function coefficients (Kunttu et al., 2004;

Staib and Duncan, 1992) or splines (Sukmarg and Rao, 2000) among others. The main

advantage of the explicit parameterization is low computational complexity, while the dis-

advantages are the need for periodic re-parameterization (re-sampling in case of uniform

sampling of the boundary) and the need to cope with topology changes during evolution

(self intersection, split and merge, etc), see (McInerney and Terzopoulos, 1995). Direct

curve parameterization by arc-length is the basis for the curve evolution framework used

here. In this dissertation we use arc-length parameterization and its numerical implemen-

tation through uniform curve sampling to compute functions defined along the curve.

Higher-level descriptors can be defined on the shape boundary representations. One family of such descriptors, which we name distribution-based descriptors, plays a crucial role in this dissertation. Descriptors of this kind are reviewed in detail in Section 2.5.

2.3.3 Implicit curve parameterization by level sets

The implicit curve parameterization known as the level set framework (Sethian, 1999) has

gained significant popularity due to its many advantages. Level-set representations allow

easy incorporation of curve forces defined directly on the level set function, implicit handling of topology changes, straightforward implementation, and easy extension to higher dimensions. Under

the level set framework, the curve (2D) or surface (3D) is defined as a zero level set of the

function Φ(x) (see figure 2·1 for a 2D curve illustration). Although different possibilities

exist, this function is typically taken to be the signed distance function of the curve (or

surface), defined to be negative inside and positive outside of the object:

Φ(x) = d(x, Γ) if x is outside Γ, and Φ(x) = −d(x, Γ) if x is inside Γ   (2.10)


where d(x, Γ) is the Euclidean distance between the point x and the curve Γ.
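As a concrete illustration of eq. 2.10, the sketch below (Python/NumPy; the function name `circle_sdf` is ours, for a circular object where the signed distance has a closed form) evaluates Φ on a pixel grid:

```python
import numpy as np

def circle_sdf(shape, center, radius):
    """Signed distance function of a circle, per eq. 2.10:
    negative inside the object, positive outside."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.hypot(xs - center[0], ys - center[1]) - radius

phi = circle_sdf((64, 64), center=(32, 32), radius=10)
# phi[32, 32] == -10.0 (center, inside); phi[32, 42] == 0.0 (on the interface)
```

For general shapes the signed distance has no closed form and is computed numerically, e.g. by a distance transform of the object mask.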

Evolution of the curve is performed by evolving the embedding level-set function using

the update flow Φt(x). In the discrete implementation, the level-set function is iteratively

updated according to

Φ(x)′ = Φ(x) + δΦt(x) (2.11)

where Φ(x)′ is the updated level-set function at each iteration and δ is the update step taken to

be sufficiently small to guarantee numerical stability. There exist two ways to compute the

update flow Φt(x). First, certain definitions of curve flows can be expressed as equivalent

flows Φt(x) explicitly defined on the level-set function. This situation arises when the curve

properties (such as curvature) are computed on the level-set function itself. This is the

most natural way to evolve the level-set function; however, since the level-set function loses the property of being a distance function in the course of evolution, periodic re-initialization is needed to ensure numerical stability, which is computationally costly.

Second, if the curve evolution flow Γt does not have an equivalent level-set function flow,

the evolution of the level-set function Φ is governed by the following differential equation:

Φt = −Γt|∇Φ| (2.12)

More accurately, eq. 2.12 only specifies the evolution of the level-set Φ = 0 corresponding

to the interface. Several options exist to define the evolution of the level-set function away

from the interface (Sethian, 1999):

• Force extension approach. Under this approach, the level set function update at any

point in the space is chosen to preserve the property of Φ being a signed distance

function of the contour defined by its zero level set. This is the most accurate but

also the most computationally expensive approach. Its most important advantage is that

there is no need for periodic level set function re-initialization.

• Interpolation based approach. Values of the level set function are computed by

interpolating the updates on the interface. Again, one needs to perform periodic


re-initialization of the level-set function.

• Modified differential equation approach. The level-set function can be evolved under

a modified differential equation that tends to preserve the distance function property

approximately. Under this approach the need for re-initialization is reduced but not

eliminated completely.
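A minimal numerical sketch of one explicit Euler step of eq. 2.12 (Python/NumPy; the spatially constant normal speed and the central-difference gradient are our simplifying assumptions — practical implementations use upwind schemes for stability):

```python
import numpy as np

def level_set_step(phi, v, delta):
    """One explicit Euler step of Phi_t = -v |grad Phi| (eq. 2.12),
    with v a constant normal speed and central-difference gradients."""
    gy, gx = np.gradient(phi)
    return phi + delta * (-v * np.hypot(gx, gy))

# For a signed distance function |grad Phi| = 1, so the zero level set
# (a circle of radius 10 here) expands at unit speed when v > 0.
ys, xs = np.mgrid[0:64, 0:64]
phi = np.hypot(xs - 32, ys - 32) - 10.0
phi_new = level_set_step(phi, v=1.0, delta=0.5)
```

After one step with δ = 0.5 the interface has moved half a pixel outward, since the level-set function simply decreases by v·δ wherever |∇Φ| = 1.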

A combination of flows required for a particular application may need a hybrid approach

to evolve the level set function. Approximations can be designed that remove the need for

periodic re-initialization and PDE-based evolution at the cost of reducing the accuracy of

the evolution (Shi and Karl, 2005).

In the basic framework, the level set function is evolved over the whole domain (image or volume), which leads to significant computational overhead. Narrow band approaches reduce

the computational complexity of the level set evolution by constraining the level set update

to a narrow band around the interface. It is important to note that narrow band approaches

do not reduce (and can increase) the need for periodic re-initialization.

Re-initialization of the level set function consists of three steps: zero interface extrac-

tion, computation of the values of the level set function at grid points next to the interface,

and update of the level set function values away from the interface using a fast-marching

(Sethian, 1999) or fast sweep approach. Typically, the re-initialization has to be performed

every few iterations depending on the magnitude of the update steps.
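The brute-force sketch below (Python/NumPy; our own simplification) illustrates what re-initialization computes: the sign of Φ is kept while its magnitude is reset to the distance to the current interface. In practice the quadratic-cost distance computation of the last step is replaced by fast marching or fast sweeping:

```python
import numpy as np

def reinitialize(phi):
    """Reset phi to (approximately) a signed distance function with the
    same zero level set, by brute-force distance to interface pixels."""
    sign = np.sign(phi)
    # pixels adjacent to a sign change approximate the interface
    near = np.zeros(phi.shape, dtype=bool)
    flip_y = sign[:-1, :] != sign[1:, :]
    flip_x = sign[:, :-1] != sign[:, 1:]
    near[:-1, :] |= flip_y
    near[1:, :] |= flip_y
    near[:, :-1] |= flip_x
    near[:, 1:] |= flip_x
    iy, ix = np.nonzero(near)
    yy, xx = np.mgrid[0:phi.shape[0], 0:phi.shape[1]]
    # O(N * |interface|) distance to the nearest interface pixel
    d = np.sqrt((yy[..., None] - iy) ** 2 + (xx[..., None] - ix) ** 2).min(axis=-1)
    return sign * d

# A distorted level set function (phi**3 keeps the sign and the zero set)
ys, xs = np.mgrid[0:64, 0:64]
phi = (np.hypot(xs - 32.0, ys - 32.0) - 10.2) ** 3
phi_sdf = reinitialize(phi)
```

Although `phi` above has badly distorted gradient magnitudes, the re-initialized function again has unit-slope behavior and the same zero level set.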

2.4 Shape modeling approaches

Using prior shape information in boundary extraction problems is the major direction of

this dissertation. In the following sections we overview some existing approaches, discuss their advantages and limitations, and delineate the deficiencies of current approaches. Shape models are used to construct the shape term Eshape in the boundary extraction energy formulation.


Figure 2·1: Level set curve embedding. The curve Γ is given by the set of points {x : Φ(x) = 0}; Φ is called the level set function.

2.4.1 Constructing distance measures on shapes

Let us first consider the particular situation when the solution is expected to be close (in terms of its shape) to a single known shape instance. In such a case one can define a penalty for deviating from this given shape and use this penalty in the energy formulation as Eshape. This penalty can be considered a measure of distance between the solution and the true shape, but it is not a model in a strict sense. On the other hand, shape distances can also be used for shape clustering and classification.

Not surprisingly, shape distances play an important role in object-based image processing.

They allow evaluating the goodness of a solution and imposing dynamic constraints. They also

can be used to constrain the relative positions of multiple objects. We first review some of

the widely used choices of shape distances, in particular, those used in this dissertation.

The distance between two shapes can be constructed on parametric shape representa-

tions, such as the angle function representation (Klassen et al., 2004; Tagare, 1999). One

possibility is to compare parametric representations directly using a definition of the dis-

tance on the space of parametric representations. Shapes under comparison are assumed

pre-registered. In (Klassen et al., 2004), shape registration is intrinsic in the distance def-

inition. Elastic matching has been used in (Basri et al., 1995) with particular attention


paid to the similarity of shapes with deformable parts. A distance measure computed on

skeleton representations has been used in (Sundar et al., 2003). Other methods construct

shape distances on features extracted from the shapes. For instance, (Berretti et al., 2000;

Belongie et al., 2002; Li and Simske, 2002) among many others.

In curve evolution methods, a shape distance is often defined on the contours themselves

or on the embedded distance functions. An example of a generic curve distance measure is

the Chamfer distance (Borgefors, 1984; Thayananthan et al., 2003) that can be defined as

d(Γ1, Γ2) = ∫_{Γ1} min_{y∈Γ2} ||x − y|| ds   (2.13)

where the integration is carried out along Γ1 accumulating the Euclidean distance between

the current point on Γ1 and the curve Γ2. Another often used shape difference measure

based on the total area between shapes is the Hamming distance (Skiena, 1997)

d(Γ1, Γ2) = ∫_{A: sign(D(Γ1)) ≠ sign(D(Γ2))} dS   (2.14)

where D(Γ1) and D(Γ2) are signed distance transforms for shapes Γ1 and Γ2 respectively.

Another measure is the Hausdorff distance (Rote, 1991; Charpiat et al., 2003), defined as the largest distance from a point on either curve to the closest point on the other:

d(Γ1, Γ2) = max{ max_{x∈Γ1} min_{y∈Γ2} ||x − y||, max_{y∈Γ2} min_{x∈Γ1} ||x − y|| }   (2.15)
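For curves discretized as point sets, these distances reduce to sums and maxima over pairwise point distances. A sketch (Python/NumPy; the symmetric form of the Hausdorff distance is used, and the helper names are ours):

```python
import numpy as np

def pairwise(a, b):
    """Matrix of Euclidean distances between point sets a (m,2) and b (n,2)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def chamfer(a, b):
    # discrete eq. 2.13: accumulate, along a, the distance to the nearest point of b
    return pairwise(a, b).min(axis=1).sum()

def hausdorff(a, b):
    # symmetric Hausdorff distance: worst-case nearest-point distance
    d = pairwise(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
shifted = square + [0.5, 0.0]
# hausdorff(square, shifted) -> 0.5; chamfer(square, shifted) -> 2.0
```

Note that the Chamfer distance, unlike the Hausdorff distance, is not symmetric in its arguments unless symmetrized explicitly.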

2.4.2 Constructing shape variability model

If no prior shapes are observed or if more than one prior shape is given, one needs a different approach to define or constrain the space of shape variation, which is expressed as the term

Eshape in the energy based curve evolution framework. Here we review existing approaches

by dividing them into the following categories.

1. Methods using a generic prior

In generic regularization methods, a prior of regularizing type is assumed, such as


“a curve must be short”. Practically, certain properties of the shape such as the

perimeter or the area are penalized in order to regularize the estimated boundary

curve. This group of methods amounts to generic regularization or geometric “low

pass” filtering to limit the effects of noise in the image. Such methods do not construct

a shape model in an explicit way (Mumford and Shah, 1985; Caselles et al., 1997;

Tsai et al., 2001b; Kim et al., 2002b; Siddiqi et al., 1997). Such “generic” penalties

are stationary along the curve, in that every point on the boundary experiences the

same effect. Such priors can remove object protrusions and smooth salient object

detail when the boundary location is not well supported by the observed data, since

they seek objects with short boundaries or small area. The important advantage of

methods using a generic prior is that they are usually easy to implement in a curve evolution context.

2. Extensions of methods using a generic prior

Methods similar in nature but improving upon generic priors are numerous and develop in several directions. One group of methods constructs non-linear priors on curvature or other image/shape descriptors. This group includes anisotropic diffusion-based approaches; examples can be found in (Tasdizen and Whitaker, 2004) and references therein. A common goal of these approaches is to preserve corners or edges of objects while enforcing smoothness elsewhere. In a curve evolution context,

geometric flows which drive an evolving curve toward a polygon were developed in

(Unal et al., 2002). These flows potentially could be used as a prior force in the

curve evolution framework. Unfortunately, the geometric flows in (Unal et al., 2002)

favor polygonal shapes with predefined (chosen by the operator) edge orientation di-

rections. Such a prior is highly dependent on extrinsic properties (such as object

orientation), and does not appear adaptable to other, non-polygonal, shape classes.

In (Shah, 2002), a functional extending the curve length penalty was constructed and applied to curve smoothing problems. In this functional, the curve length penalty


was applied only in the regions with a low likelihood of being a corner. The likelihood

of a corner, called the corner strength function, was constructed jointly while evolving

the curve using modifications of the Mumford-Shah functional methodology (Shah,

1996). This technique yields a piecewise smooth curve, potentially improving the

segmentation that is achieved by using a stationary smoothing prior, such as a curve

length penalty. However, this technique, like the generic curve length penalty, does

not include more detailed information derived from training shapes. Moreover, the auxiliary unknowns (the corner strength function field) make the solution non-unique and sensitive to the regularization parameters.

It is possible to improve the generic prior so that it includes more information about a

class of shapes but is still expressed as a local penalty, stationary with respect to the

shape boundary. One such alternative data-driven prior shape model was proposed in

(Leventon et al., 2000b) as a part of a level set based segmentation algorithm. The

overall distribution of curvature and intensity with respect to a segmenting curve

was found from training data. This spatially stationary model was then used in a

probabilistic MAP formulation to segment an image.

The shape prior in (Leventon et al., 2000b) is computed as follows. It is assumed

that the values of the curvature of the level set function Φ embedding the segmenting curve are independent samples from the prior distribution on level set curvature computed on training shapes, leading to the following formulation for the prior distribution on the level-set function Φ:

p(Φ) = ∏_{y∈R} Σ_{j=1}^{n} Σ_{x∈R} exp( −||K(Φ)(y) − K*_j(x)||² / (2σ²) )   (2.16)

where K(Φ)(y) is the curvature of the level set function at y, R is the image plane, and K*_j(x) is the curvature of the level set function computed on the prior image j at x; n is the number of prior images. The corresponding energy formulation for the


Eshape(Γ) is given by

Eshape(Φ(Γ)) = −Σ_{y∈R} log Σ_{j=1}^{n} Σ_{x∈R} exp( −||K(Φ)(y) − K*_j(x)||² / (2σ²) )   (2.17)

where Eshape is explicitly defined on the level set function Φ computed on the curve Γ, and the remaining symbols are as in eq. 2.16.

Although giving better results than generic curve length penalty priors, this approach

still tends to suppress salient structures. The reason is that the stationary prior

coupled with the MAP criterion attempts to drive the curvature at every point on

the curve to the same value corresponding to the mode of the distribution. We use

this technique in Chapter 5 for comparison with our results.

A variation of Leventon’s model described previously was proposed in (Litvin and

Karl, 2003), where the PDE attempts to deform the curve in such a way that the cur-

vature histogram computed over the current level set function matches the curvature

histogram computed over the training level sets corresponding to the training shapes.

This global histogram matching approach showed interesting properties and yielded results superior to those given by the method in (Leventon et al., 2000b). Unfortunately, the approach in (Litvin and Karl, 2003) has no corresponding energy or probabilistic formulation. Both the approach in (Leventon et al., 2000b) and the approach in (Litvin and Karl, 2003) encode only the level set curvature as a feature and do not adapt to other shape features.

3. Deformable templates

Numerous approaches have been proposed to construct prior models based on al-

lowable deformations of a template shape. One group of approaches are based on

representing and modeling shape as a set of landmarks (see (Dryden and Mardia,

1998) and references therein). In the Point Distribution Model (PDM), proposed in


(Cootes et al., 1995), n labeled points on a boundary are selected to describe each

shape in the training set. The space of allowable shapes is then defined as a box in 2n-dimensional space determined by the spread of the training points, where each point in that space corresponds to one training shape.

A number of approaches use principal component analysis based on parameterized

boundary coordinates or level set functions to obtain a set of shape expansion func-

tions (Rousson and Paragios, 2002; Tsai et al., 2001a; Leventon et al., 2000a; Wang

and Staib, 2000) that describe the subspace of allowable shapes. Sinusoidal basis

functions were used in (Staib and Duncan, 1996). A solution is then sought in this

restricted shape space. In another approach the subspace of allowable shapes is com-

posed of the set of deformations of a predefined shape template (Sclaroff and Liu,

2001). The restricted shape space is then used to constrain the solution, or to com-

pute the likelihood of a particular boundary configuration. Still, other approaches

construct more complex parametric shape representations, such as the MREP ap-

proach in (Pizer et al., 1996), or deformable atlas based approach in (Christensen,

1994). (Fenster and Kender, 2001) builds a prior probability density as a multidi-

mensional Gaussian on a space of parameters calculated from image features along

the boundary.

Unfortunately, these methods can be overly sensitive to the global appearance of

particular shapes in the training data. These methods are effective when the space

of possible curves is well covered by the modeled template deformations as obtained

through training data, but may not generalize well to shapes unseen in the training

set. Another difficulty is the need to pre-register shapes, which can be problematic

in case of large deformations within the training set of shapes.

Let us focus on a particular implementation of a PCA-based shape prior defined on level sets. First, training shapes are aligned using the technique in (Rousson and Paragios, 2002). Second, PCA is carried out on the signed distance transforms


corresponding to the registered training shapes. The PCA space of allowable shape deformations is constructed using the first five eigenshapes. To use the prior in a curve evolution framework, one penalizes the mismatch between the segmenting curve and its projection onto the PCA space. We penalize the area between the current shape and its PCA-space projection using the Hamming distance. The prior shape energy is thus given by

Eshape(Γ) = ∫_{A: sign(D(Γ)) ≠ sign(D(Γ^{+PCA}))} dS   (2.18)

where Γ^{+PCA} is the projection of Γ onto the PCA space. Finding the projection is formulated as minimization of the Hamming distance between the given curve and a curve in the PCA space, carried out by gradient descent with respect to the PCA projection coefficients.

Formally, the projection Γ+PCA is given by

Γ^{+PCA} = argmin_{Γ_PCA ∈ PCA} ∫_{A: sign(D(Γ)) ≠ sign(D(Γ_PCA))} dS   (2.19)

In this dissertation we use this technique for comparison with our results.
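A sketch of this PCA construction in Python/NumPy (alignment of training shapes is assumed already done; `build_pca` and `project` are our illustrative names). Training signed distance maps are stacked as vectors, the leading eigenshapes are found by SVD, and a new shape's signed distance map is projected onto the resulting subspace:

```python
import numpy as np

def build_pca(sdfs, n_modes=5):
    """PCA on vectorized signed distance maps: mean shape + eigenshapes."""
    X = np.stack([s.ravel() for s in sdfs])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]          # orthonormal eigenshape rows

def project(sdf, mean, modes):
    """Orthogonal projection of a signed distance map onto the PCA space."""
    coeffs = modes @ (sdf.ravel() - mean)
    return (mean + coeffs @ modes).reshape(sdf.shape)

# toy training set: signed distance maps of circles of different radii
ys, xs = np.mgrid[0:64, 0:64]
sdf = lambda r: np.hypot(xs - 32.0, ys - 32.0) - r
mean, modes = build_pca([sdf(8), sdf(10), sdf(12)])
recon = project(sdf(9), mean, modes)   # radius-9 circle lies in the span
```

In this toy case circle SDFs are affine in the radius, so the radius-9 shape is recovered exactly; for real training shapes the projection residual is what eq. 2.18 penalizes.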

4. Articulated models

Another approach to the inclusion of prior shape information is based on explicit

modeling and extraction of component parts (Pizer et al., 1996; Zhu and Yuille,

1996). Such models are also known as articulated models. These models have been

shown to represent well visual similarity within certain classes of shapes, such as

human silhouette shapes or human palm shapes. Unfortunately, articulated models

only give an ad hoc solution to a certain type of problems. Different models have to

be constructed for different classes of shapes. Comparison between different objects is

problematic. Such models are also not adapted to curve evolution based approaches.

Recently, the symbolic signatures method has been proposed in (Ruiz-Correa et al.,


2006). The method defines shape parts, parameterizes them by numerical signatures, and encodes the relative positions of parts as 2D images. An SVD-based classifier is

trained on prior shape examples to perform detection and classification tasks. This

methodology can be considered a part-based method. The skeleton of parts is not

predetermined and a measure of distance between pairs of shapes with different skele-

ton topology can be constructed. However, the approach has two major problems.

First, user intervention is needed in the process of model construction. Second, it is

still unclear if the model can be used in curve evolution.

5. PDF construction

Some methods attempt to construct “true” probability distributions on the space

of shapes. In one such technique, motivated by the theories of human perception,

Zhu (Zhu, 1999) developed the maximum entropy model of shape. The model is

probabilistic and flexible and thus seemingly good candidate for the shape prior.

Zhu showed that samples drawn from the constructed probability distribution show

impressive perceptual similarity to the prior shapes while being highly variable in

their shape.

Let us overview the model construction according to (Zhu, 1999). The model is required to have maximum variability in unconstrained directions while capturing important data statistics of a set of “significant” shape features φ^{(α)}(s), measured along the shape boundary, where s is arc length along the boundary and α is a feature index. The shape feature statistics are defined as follows

µ^{(α)}(z) = ∫ δ(z − φ^{(α)}(s)) ds   (2.20)

where z is the feature value.

We denote the pdf of the curve Γ for the given class of shapes by p(Γ), where Γ stands

for the instance of a boundary. Following (Zhu, 1999), to have a pdf of maximum

entropy that is consistent with observed statistics, p(Γ) should satisfy the following


set of equations:

p*(Γ) = argmax_{p(Γ)} ( −∫ p(Γ) log p(Γ) dΓ )

E_{p*(Γ)}[µ^{(α)}(z)] = µ^{(α)}_obs(z)   ∀z, ∀α

∫ p*(Γ) dΓ = 1   (2.21)

where E_{p(Γ)}[·] is the expectation with respect to the probability distribution function p(Γ); integration is carried out over the space of all curves; and µ^{(α)}_obs(z) are the averaged statistics for the training set of M shapes:

µ^{(α)}_obs(z) = (1/M) Σ_{i=1}^{M} µ^{(α)}_i(z)   (2.22)

where µ^{(α)}_i(z) is the statistic of feature α computed on the i-th prior shape.

By solving the system of equations 2.21 using Lagrange multipliers, the following

solution form is obtained:

p(Γ) = (1/Z) exp( −Σ_{α=1}^{k} ∫ λ_α(z) µ^{(α)}(Γ, z) dz )   (2.23)

where λ_α(z) is the Lagrange multiplier function for feature α, µ^{(α)}(Γ, z) is the statistic corresponding to the shape Γ, and Z is the normalization constant. In practice, the curve is discretized by sampling the boundary with a fixed number of nodes. The range of feature values is also discretized. In the discrete case, considering only one feature (α = 1), eq. 2.23 becomes

p(Γ) = (1/Z) exp( −⟨Λ, µ_1(Γ)⟩ )   (2.24)

where Λ is the set of Lagrange multipliers and µ_1(Γ) is the discretized statistic (histogram) for the curve Γ. In order to construct a pdf of the form of eq. 2.24, one should find the set of Lagrange multipliers Λ. Using the fact that log p(Γ) is concave


with respect to Λ, it is possible to find Λ iteratively:

dΛ/dt = E_{p(Γ;Λ)}[µ(Γ)] − µ_obs   (2.25)

where E_{p(Γ;Λ)}[·] is the expectation with respect to the probability distribution function p(Γ; Λ). The limit point of the iteration scheme in eq. 2.25 satisfies the observation constraint (the second equation in the set 2.21).
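On a finite sample space the iteration of eq. 2.25 can be run exactly, without MCMC. The toy sketch below (Python/NumPy; our own construction, not Zhu's implementation) fits Lagrange multipliers so that a maxent distribution over six states matches an observed histogram of a three-valued feature:

```python
import numpy as np

def fit_maxent(feature_bins, mu_obs, steps=2000, rate=0.5):
    """Gradient iteration of eq. 2.25: lam += rate * (E_p[mu] - mu_obs).
    Expectations are exact here; for curves they require MCMC sampling."""
    lam = np.zeros_like(mu_obs)
    for _ in range(steps):
        w = np.exp(-lam[feature_bins])       # unnormalized p, as in eq. 2.24
        p = w / w.sum()
        mu = np.bincount(feature_bins, weights=p, minlength=len(mu_obs))
        lam += rate * (mu - mu_obs)
    return lam

bins = np.array([0, 0, 1, 1, 2, 2])          # feature value of each state
target = np.array([0.5, 0.3, 0.2])           # observed feature histogram
lam = fit_maxent(bins, target)
```

At the fixed point the model histogram equals the observed one, which is exactly the constraint in the set 2.21.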

The key difficulty is the computation of the expected statistics from the current

distribution Ep(Γ;Λ)[µ(Γ)]. Analytical calculation of this quantity is very difficult. As

a result, numerical Monte Carlo Markov Chain (MCMC) methods are used in order

to obtain samples from p(Γ; Λ). The quantity Ep(Γ;Λ)[µ(Γ)] is then obtained as the

average of histograms over samples from the distribution.

(Zhu, 1999) uses the Metropolis-Hastings algorithm to simulate a random walk in

the space of possible configurations of a closed discrete contour. Curve nodes are

positioned on grid points. One move in the space consists of moving one node of a

curve to one of the 8 neighboring positions. The distance between nodes is constrained

within a certain bound. The resulting MCMC simulation is very slow and the number

of moves necessary for obtaining a sample from the distribution is of the order of 109,

making this MCMC simulation too slow for use in practical applications.

In Chapter 3 of this dissertation we investigate reducing the computational cost of the model construction and using this probability distribution in curve evolution based boundary extraction.

2.5 Shape distributions

We now review the shape distribution concept, one of the key ideas used in this dissertation. Shape descriptors based on distributions arise from the idea that a distribution computed over a parameter extracted from the shape effectively encodes the presence of parts or visual features of the shape, while being invariant to the positions of these parts or


Figure 2·2: An example of constructing a shape distribution for a curve based on curvature κ(s) measured along the boundary. Sketched are the pdf(κ) and the cumulative distribution function H(κ) of the curvature samples. Note the invariance of H(κ) with respect to the choice of the initial point of the arc-length parameterization.

features. The resulting shape descriptor may be invariant to large shape deformations that preserve certain prominent features of the shape. Shape distributions are derived representations, in the sense that they are defined on top of underlying contour representations. Moreover, in the general case, shape distributions are many-to-one representations, since different contours can have the same shape distribution.

Let us review the concept of shape distributions in greater detail. In (Osada et al., 2002), a shape distribution is defined as the cumulative distribution function of random samples of a feature function defined on the shape. An illustrative example of the shape distribution idea is shown in Figure 2·2, using boundary curvature as the feature.

Building a shape distribution is done in 2 steps:

1. Computing samples of the feature function (boundary curvature κ(s) in this case) over the parameter space on which the feature function is defined. For example, in the case of boundary curvature, this space is the arc length of the curve.

2. Computing the cumulative distribution function (CDF) H(κ) on the feature function samples.
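The two steps above can be sketched for a polygonal curve; approximating κ(s) by the turning angle at a vertex divided by the local arc length is a discretization choice of our own:

```python
import math

def curvature_samples(poly):
    """Step 1: approximate the curvature at each vertex of a closed polygon
    as the turning angle divided by the average length of the adjacent edges."""
    n = len(poly)
    samples = []
    for i in range(n):
        p0, p1, p2 = poly[i - 1], poly[i], poly[(i + 1) % n]
        a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        turn = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
        ds = 0.5 * (math.dist(p0, p1) + math.dist(p1, p2))
        samples.append(turn / ds)
    return samples

def empirical_cdf(samples, z):
    """Step 2: H(z) = fraction of feature samples with value <= z."""
    return sum(1 for s in samples if s <= z) / len(samples)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
kappa = curvature_samples(square)       # a pi/2 turn at every corner
H = lambda z: empirical_cdf(kappa, z)
```

Note that rotating the list of vertices (i.e., changing the starting point of the parameterization) leaves the sample set, and hence H(κ), unchanged, illustrating the invariance noted in the caption of Figure 2·2.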

Distributions can be computed on training data or defined analytically. The shape distribution for a set of shapes is computed as the average of the cumulative distribution functions corresponding to the individual shapes in the group.


The concept of shape distributions has been successfully applied to shape classification tasks. For example, a method using shape distributions in handwritten digit recognition experiments yielded the best performance (Osada et al., 2002) among published techniques.

Other results using shape distributions in classification experiments include (Thayananthan et al., 2003; Mezghani et al., 2004; Ip et al., 2002; Osada et al., 2001). Protein similarity has been studied using shape distributions in (Canzar and Remy, 2005). These results indicate that shape distributions are robust, invariant, and flexible shape representations with good discriminative properties. In this work we propose to use shape distributions to construct a shape prior for use in statistical inference tasks (see Chapter 4). Note that shape distributions computed from sampled feature functions are not deterministic shape descriptors; instead, we introduce deterministically computed shape distributions in Chapter 4.

Building other types of distribution-based shape representations for recognition purposes has been investigated in several papers. Distributions of radially computed parameterizations, known as the shape context, were applied to shape classification tasks in (Belongie et al., 2002; Zhang and Malik, 2003). A related idea, named force histogram descriptors, has been used in (Matsakis et al., 2004). (Grigorescu and Petkov, 2003) uses distance sets computed on contour points for shape recognition. To the best of our knowledge, these other types of distribution-based shape representations have not been applied as priors for shape inference.

2.6 Shape model prior work summary and motivation

Despite the large body of work on shapes, only a few types of shape modeling approaches have been implemented using curve evolution, namely generic priors and certain deformable-template-based approaches. Generic approaches do not use specific information about prior shapes, while deformable template approaches construct the shape deformation space around the “average” shape, thereby restricting global deformations and impeding generalizability to

unseen shapes. One would like to construct a model that supports larger deformations preserving shape similarity as perceived by humans. Ideally, the model should be constructed

on small training sets and should easily generalize to unseen samples. At the same time, we would like to incorporate the model into a curve evolution framework for solving boundary extraction problems. Handling registration is also a concern, especially in the case of large deformations. Most widely used shape modeling approaches pre-register (align) shapes prior to constructing the deformation model, and registration becomes increasingly difficult as shapes differ more from each other. We would like to handle registration in a transparent and efficient way regardless of the magnitude of the shape deformations.

Some methods are motivated by similar goals. In (Klassen et al., 2004), an attempt was made to construct a shape distance measure based on matching angle-function shape descriptors. Invariance properties were encoded into the model by implicit shape correspondence estimation. A certain degree of generalizability was achieved by allowing additional unpenalized degrees of freedom through boundary stretching. Although this method attempts to improve generalizability, its effectiveness is limited by its being constrained to a particular along-the-curve parameterization. Articulated models (above) may capture large deformations of shape parts but are application specific and are not curve evolution based.

Our belief is that a successful strategy can be implemented by constructing the distribution of certain shape features defined on the shape. The distribution can accumulate information about the presence of certain feature values but will not contain information about the precise “location” of the feature. Distribution-based shape representations have been constructed and successfully applied to shape classification tasks; see Section 2.5. In Chapter 4 we will further explore this idea and construct a shape prior based on distribution-based shape representations. The maximum entropy model in (Zhu, 1999) captures perceptual shape similarity under large deformations in a probabilistic framework; we will explore using this model for image segmentation in Chapter 3.


2.7 Appearance models

In a curve evolution context, an appearance model characterizes the prior knowledge about the image with respect to the object of interest and provides the link between the image data and the resulting object boundary. In an energy-based curve evolution framework, the appearance prior is encoded in the intensity term Eint. Below we consider several types of appearance models.

Boundary-based approaches assume that the true boundary lies along high image gradients. A notable example of this strategy is developed in (Caselles et al., 1997). The gradient field attraction force via orthogonal curves used in (Tagare, 1997) and the edge-flow forces in (Sumengen et al., 2002) are two examples among many. The drawbacks of purely gradient-based methods include a narrow capture area, sensitivity to initial conditions, the need for balloon forces, etc. Their major limitation is the need for strong edges to be present everywhere along the object boundary. In many applications, edges can be diffuse, weak, or even nonexistent due to anatomical features, the imaging technique, etc.

In region-based approaches, certain assumptions are made about the image intensities inside and outside of the evolving curve. A benchmark approach in (Yezzi et al., 1999) assumes that at the true position the boundary maximizes the difference between certain statistics computed on the inside and outside of the segmenting curve. Another benchmark approach in (Chan and Vese, 2001), named active contours without edges, assumes constant average intensities inside and outside of the region of interest. The data term in the segmentation functional is defined as the image likelihood given by this piecewise-constant image model assuming independent identically distributed (IID) Gaussian noise. In this dissertation we use the model in (Chan and Vese, 2001) to test the effect of different shape priors in segmentation problems. Another example of a region-based model is the information-theoretic approach in (Kim et al., 2002a), casting the segmentation problem as maximization of the mutual information between region labels and image intensities. The


evolving curve tends to segment the scene into regions with homogeneous intensities. In

this dissertation we also use the method in (Kim et al., 2002a). Region intensity histogram

priors are constructed in (Herbulot et al., 2004; Puzicha et al., 1999). In (Leventon et al.,

2000b) the prior is directly constructed on training images.

A classical hybrid approach is the piecewise-smoothness model used in (Shah, 1996), where the image is assumed smooth except on the evolving boundary. The boundary itself is linked to the image gradient magnitude. Using the energy functional defined in (Shah, 1996), the boundary and the image can be reconstructed simultaneously.

A common assumption of all region-based statistical methods is the uniformity of a certain statistic inside the region of interest. However, in many problems this assumption may be violated. In this dissertation we focus on problems that pose difficulties for current gradient- and region-based methods and develop a new appearance model strategy tailored to such problems.

Appearance modeling approaches of a different type, which we name template-based approaches, are the counterparts of the shape template-based models. The Active Appearance Model (AAM) proposed in (Cootes et al., 2001) is a classical example. In the AAM, a template image is linked to a deformable boundary template. The model tries to find a match between the warped template and the image being segmented, finding the desired object boundary as a deformed boundary template. The method uses a PCA boundary deformation model and emphasizes an exact match between the segmented boundary/image pair and the boundary/image template. Clearly, this method is strong when there is a sufficient degree of consistency between the training images/boundaries and the segmented boundary/image, and when there is enough training data to describe the possible template deformations and image variations.

In this dissertation we develop an alternative appearance modeling approach. Our approach differs from AAM-type approaches in the types of boundary/image variability accounted for. The Active Appearance Model enforces fidelity between the boundary and the deformable template image model as a whole: template warpings are the only possible deformations, and the positions of characteristic image/boundary features are tied to the template. Our method, in contrast, abstracts and generalizes boundary appearance properties, effectively encoding the appearance “content” for a given boundary/image pair; characteristic image/boundary features have flexible locations. Hence, our method has the potential for greater robustness and flexibility with respect to large variations of boundary appearance along the boundary itself and across training boundaries/images. It can also be more robust with respect to small training sets.

2.7.1 Role of distribution in appearance modeling and histogram equalization

We pointed out that distributional descriptors may be beneficial for describing shape. Not surprisingly, measuring and comparing intensity distributions has also been used in the context of appearance model construction, for instance in (Freedman et al., 2004; Kim et al., 2002a; Herbulot et al., 2004; Puzicha et al., 1999). The classical and most widely used method to model appearance is to specify or compute the histogram of gray level values over the region (Herbulot et al., 2004) and use it as the prior histogram of intensities in the segmented region. Constructing appearance models is considered in Chapter 7 as an extension of our distribution-based shape modeling approach.

We now review another important application of intensity distributions, namely image enhancement through global histogram modification using Partial Differential Equations (PDEs) (Sapiro and Caselles, 1997; Sapiro and Caselles, 1995). We will also use these tools later in a different context. Besides being a research topic, image histogram equalization and modification has become a standard part of off-the-shelf image editing software, such as Adobe Photoshop. We will consider the histogram modification task following (Sapiro

and Caselles, 1997). Given an image I(x, y) defined on R × R, we would like to evolve its gray level values in a unique and spatially uniform way so that the underlying gray level histogram hI(λ) evolves to match a target histogram h∗(λ). Let the target Cumulative Distribution Function (CDF) be defined as H∗(λ) = ∫₀^λ h∗(s) ds and the current image CDF be defined

Page 56: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

34

similarly, HI(λ) = ∫₀^λ hI(s) ds. It can be shown that the gray level intensity flow

∂I(x, y, t)/∂t = H∗[I(x, y, t)] − HI[I(x, y, t)]    (2.26)

always exists, is unique, and has the desired solution as a limit point

lim_{t→∞} hI(t) = h∗    (2.27)

The importance of this method lies in the possibility of evolving a function (the image I in this case) so that the distribution computed on it matches a target distribution. The simplicity and nice properties of the flow are due to the lack of any constraints on the image values at different locations.
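This flow can be mimicked on a discrete list of gray values. The sketch below is our own toy discretization, not the numerical scheme of (Sapiro and Caselles, 1997); note that with the ≤-convention used here for the empirical CDF, the contracting update direction is HI − H∗, and sign conventions vary with how the CDFs are defined. The values converge to the quantiles of the uniform target:

```python
def empirical_cdf(values, v):
    """Fraction of values <= v: a discrete analogue of HI."""
    return sum(1 for u in values if u <= v) / len(values)

def match_uniform(values, dt=0.2, steps=400):
    """Drive the gray values so their empirical CDF approaches the
    uniform target H*(v) = v on [0, 1]."""
    v = list(values)
    for _ in range(steps):
        cdf = [empirical_cdf(v, x) for x in v]
        # step each value by the CDF mismatch evaluated at that value
        v = [x + dt * (c - x) for x, c in zip(v, cdf)]
    return v

def mismatch(values):
    """Worst-case gap between the empirical CDF and the uniform target."""
    return max(abs(empirical_cdf(values, x) - x) for x in values)

gray = [0.05, 0.1, 0.12, 0.15, 0.8, 0.85, 0.9, 0.95]  # clustered histogram
flat = match_uniform(gray)
```

Because the same update is applied at every location holding a given gray value, the ordering of the values is preserved and each one relaxes toward its target quantile, which is exactly the limit point behavior of eq. 2.27.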

We will extend the general idea of matching two distributions using an evolution PDE

to a more general class of features that are computed with respect to a curve and/or image

intensities. Much of the difficulty in matching these distributions is related to the fact

that the different variables are subject to numerous constraints, so that their independent

evolution is not possible.

2.8 Distribution difference measures

We have reviewed two topics of interest that will be further explored throughout this

thesis, namely, distribution-based shape descriptors in Section 2.5 and distribution-based

intensity descriptors in Section 2.7.1. These descriptors have previously been used in comparison schemes through the definition of a distribution difference measure. Hence, such distribution difference measures play an important role in this work.

The natural questions are:

1. What form or parameterization of the distribution should we use (PDF, CDF, or a parametric model)?

2. What measure of difference between distributions should be used?

Page 57: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

35

We first consider the choice of distribution parameterization. The highest-level choice is between parametric and non-parametric methods of describing distributions. We opt for non-parametric methods for two reasons. First, we desire a rich descriptor of distributions; in fact, we would like a descriptor of theoretically unlimited length in order to be able to encode shapes, which are infinite-dimensional objects. Second, we aim at a universal methodology, applicable to any feature computed on any shape; therefore, no predefined parametric distribution shape can be a universally good model.

The second choice is between particular distribution descriptors and the associated distribution difference measures. In many applications, in particular in information theory and statistics, distances are defined on PDFs. Suppose that we have two feature PDFs p1 and p2. Each has a narrow peak, and the two peaks are close to one another but do not overlap (see Figure 2·3, panel (A), top). Obviously, a direct measure of distribution difference constructed on these PDFs will not decrease even when the peaks in p1 and p2 are very close. “Direct” here means comparing the values of the two distributions at the same parameter value. In the context of discretized histograms, such measures are known as bin-to-bin measures (Antani et al., 2002). For instance, consider the Minkowski-form distance (L1 norm), given by

dM(p1, p2) = ∫ |p1 − p2| dλ    (2.28)

We illustrate the resulting measure dM(p1, p2) as the shaded area in Figure 2·3, panel (A), bottom. The measure in eq. 2.28 is small only when the peaks in p1 and p2 match exactly. A small difference in peak position sharply increases the measure up to its maximum value of 2 (for normalized distributions), and a further increase of the distance between the peaks of p1 and p2 (see Figure 2·3, panel (B)) does not increase the measure any further. Other commonly used measures, such as the Kullback-Leibler (K-L) divergence (Kullback, 1968) and the Jeffrey divergence (Cover and Thomas, 1991), exhibit qualitatively similar behavior with respect to cases A and B in Figure 2·3. We argue that in case (A), p1 and p2 are close to each other because each of them describes the dominant value for the feature and these

Page 58: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

36

dominant values are close to one another. Clearly, a bin-to-bin measure on PDFs does not capture our intention of obtaining a small measure when the dominant object feature values are close. In fact, similar considerations have been made in designing histogram difference measures for similarity-based image retrieval (see (Rubner et al., 1999; Ling and Okada, 2006) and references therein).


Figure 2·3: L1 difference measure computed on PDFs and CDFs. The difference value is given by the shaded area. Panels (A) and (C): the modes of p1 and p2 are very close; the L1 difference on the CDFs produces a small value. Panels (B) and (D): the modes of p1 and p2 are farther from each other; the L1 difference on the CDFs gives a larger value, while the L1 difference on the PDFs does not change.

Our illustrative example shows that we need to compare the distributions across different values of the parameter in order to quantify the similarity of the distributions. In a discrete histogram comparison context, such measures are known as cross-bin dissimilarity measures (Chupeau and Forest, 2001). To this end, several options have been studied. A quadratic-form distance between distributions was used for image retrieval in (Niblack et al., 1993); this distance does not establish a correspondence between the masses of the distributions and is not theoretically justified, and in image retrieval experiments it has been shown to overestimate the mutual similarity of flat distributions. The so-called Earth Mover’s Distance (EMD) in (Rubner et al., 1999) is defined as the minimum work needed to transform PDF p1 into p2, where a unit of work is given by moving a unit of mass over a unit distance.

Page 59: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

37

This distance is seemingly a good candidate for a distribution difference measure. However, its value can only be found as the solution to an optimization problem; we prefer an analytical, and simple, expression for the distribution difference measure.

A recently developed measure of the difference between distributions, defined as the length of the geodesic path on a manifold of distributions, has been proposed in (Mio et al., 2005). This is another seemingly logical choice; however, as with the EMD, its value is the solution to a numerical optimization problem.

Consider instead a measure defined on the CDFs P1 and P2 corresponding to p1 and p2, respectively. In the example of Figure 2·3 (A), p1 and p2 were considered “close”; in this case, while the pointwise measure on the PDFs is large, the measure on the CDFs is small (see Figure 2·3 (C, D)). Moreover, as the difference between the dominant feature values increases, a measure defined on the difference of the CDFs increases monotonically and smoothly, which is the behavior we desire. We therefore conclude that a reasonable “direct” measure quantifying the difference between feature distributions can be constructed on the corresponding CDFs. In fact, a “match” measure of the difference between CDFs was proposed in (Shen and Wong, 1983) as the L1 norm of the CDF difference

has been proposed in (Shen and Wong, 1983) as the L1 norm of the CDF difference

dM(P1, P2) = ∫ |P1 − P2| dλ    (2.29)

The measure in eq. 2.29 is a particular case of the EMD distance in (Rubner et al., 1999). It is not differentiable, which creates difficulties for a variational approach; however, differentiable approximations can be used.

Later in this thesis we use the L2 measure of the CDF difference

dM(P1, P2) = ∫ (P1 − P2)² dλ    (2.30)

which has qualitatively similar properties to the measure in eq. 2.29 while suppressing large differences (outliers). The difference measure in eq. 2.30 is differentiable and leads to

Page 60: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

38

straightforward derivation of curve evolution equations. Moreover, the difference measure

in eq. 2.30 has been used for shape distribution matching with great success (Ip et al., 2002;

Osada et al., 2002; Osada et al., 2001; Mezghani et al., 2004), outperforming alternative

shape descriptors.
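The contrast between the bin-to-bin measure in eq. 2.28 and the CDF measures in eqs. 2.29 and 2.30 can be checked numerically on two point-mass histograms (the bin counts and peak positions below are arbitrary illustrative choices):

```python
def l1_pdf(p1, p2):
    """Bin-to-bin L1 distance on probability masses (eq. 2.28, discretized)."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

def cdf(p):
    """Running sum of the masses: the discrete CDF."""
    out, run = [], 0.0
    for m in p:
        run += m
        out.append(run)
    return out

def l1_cdf(p1, p2, d_lambda=1.0):
    """L1 distance on CDFs (eq. 2.29, discretized with bin width d_lambda)."""
    return sum(abs(a - b) for a, b in zip(cdf(p1), cdf(p2))) * d_lambda

def l2_cdf(p1, p2, d_lambda=1.0):
    """Squared-L2 distance on CDFs (eq. 2.30, discretized)."""
    return sum((a - b) ** 2 for a, b in zip(cdf(p1), cdf(p2))) * d_lambda

def peak(pos, nbins=20):
    """Histogram with all its mass in a single bin."""
    return [1.0 if i == pos else 0.0 for i in range(nbins)]

near = (peak(5), peak(7))    # peaks 2 bins apart
far = (peak(5), peak(15))    # peaks 10 bins apart
```

The bin-to-bin distance saturates at its maximum of 2 as soon as the peaks are disjoint, while both CDF measures grow with the separation, which is exactly the behavior illustrated in Figure 2·3.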

2.9 Classification

So far, we have reviewed the background related to the first and second major directions of this work: the construction of prior shape and appearance models, and shape extraction methodology utilizing the prior information. We now focus on the third major direction: inference on shapes, given that the shapes have been extracted from images. We focus on morphological analysis of shape differences based on corpus callosum brain shapes, motivated by the Human Brain Project. In particular, we are interested in two tasks:

1. Localization and quantification of shape differences between groups of subjects.

2. Automatic discrimination between groups of subjects based on shapes.

Since the notion of morphological differences implies the need for localization, we select a parametric shape descriptor methodology using the medial axis representation, which is natural for the elongated shape of the corpus callosum. Sampled medial representations yield feature sets describing the shapes. We review the techniques used to extract medial axis representations and feature sets in Chapter 8. The task of localizing and quantifying shape differences is carried out by statistical analysis of the medial axis representation.

We are also interested in constructing optimal classifiers on these feature sets to tackle the discrimination task (task 2 above). We consider the two-class discrimination problem, where the classifier is built from training data with known class memberships. The classifier is tested on test data not used for training, and its performance is given by the estimated probability of correct classification.

We now review some elements of classification theory used in this dissertation. We first

consider the MMSE (Minimum Mean-Square Error) linear discriminant function classifier.

Page 61: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

39

Suppose the observations come from two categories w1 and w2, with n1 and n2 observations respectively. Each observation is given by a vector xi of length N. We first compose the augmented feature matrix Y as follows

Y = [  1_{n1}    X_{n1}
      −1_{n2}   −X_{n2} ]    (2.31)

where 1_n is a column of n ones and X_{ni} is the matrix composed of the observations in category wi stacked as rows. A vector of margins b is defined as

b = 1_{n1+n2}    (2.32)

The margin value for a particular observation is the penalty for misclassifying that observation. Since in our work all observations have equal importance, all margins are set equal.

We seek the decision boundary (linear MMSE classifier) given by a vector a of length N + 1. The vector a specifies the hyperplane that separates the N-dimensional space of observations so that the training observations from each class lie on their own side of the hyperplane. The distance between an observation and the hyperplane should equal the margin value for that observation in the least-squares sense. The criterion function to be minimized is

E = ||Ya − b||    (2.33)

The solution can be found using the pseudo-inverse

a = (YᵗY)⁻¹Yᵗb    (2.34)

A new observation is assigned to class w1 or w2 according to the half-space it falls into. Formally, the discriminant function is given by

g(x) = a0 + ∑_{i=1}^{N} ai xi    (2.35)

The observation is assigned to class w1 if g(x) > 0 and to class w2 otherwise.
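Eqs. 2.31-2.35 can be sketched end to end; the tiny Gaussian-elimination solver below stands in for a library pseudo-inverse, and the 2-D observations are made up for illustration:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def mmse_classifier(class1, class2):
    """Augmented matrix Y (class-2 rows negated, eq. 2.31), unit margins
    b (eq. 2.32), and a = (Y^t Y)^{-1} Y^t b (eq. 2.34)."""
    Y = [[1.0] + list(x) for x in class1] + \
        [[-1.0] + [-v for v in x] for x in class2]
    b = [1.0] * len(Y)
    m = len(Y[0])
    YtY = [[sum(r[i] * r[j] for r in Y) for j in range(m)] for i in range(m)]
    Ytb = [sum(r[i] * bv for r, bv in zip(Y, b)) for i in range(m)]
    return solve(YtY, Ytb)

def g(a, x):
    """Discriminant g(x) = a0 + sum a_i x_i (eq. 2.35); class w1 iff g > 0."""
    return a[0] + sum(ai * xi for ai, xi in zip(a[1:], x))

w1 = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0)]        # illustrative observations
w2 = [(-2.0, -2.0), (-3.0, -2.5), (-2.5, -3.0)]
a = mmse_classifier(w1, w2)
```

On this separable toy set, all training observations end up on the correct side of the hyperplane defined by a.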

Page 62: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

40

Let us now introduce another decision boundary criterion function. Suppose we want

to find the decision boundary that minimizes the number of misclassified data samples.

Such a misclassification criterion function can be written as

E = − ∑_{i=1}^{n1+n2} sign(b_i Y_i a)    (2.36)

The criterion function in eq. 2.36 cannot be minimized using gradient descent approaches; therefore, in practice, differentiable approximations are used, such as the perceptron criterion function. We use a direct differentiable approximation that approaches the misclassification criterion in eq. 2.36 asymptotically. Let the error function be defined as

E = − ∑_{i=1}^{n1+n2} atan( b_i Y_i a / γ )    (2.37)

where γ is a small number (we use γ = 0.01). In the limit γ → 0, the criterion in eq. 2.37 is equivalent to that in eq. 2.36. A decision boundary minimizing eq. 2.37 can be found by gradient descent.
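A sketch of minimizing eq. 2.37 by gradient descent on toy 2-D data; note that we use a larger γ than the value 0.01 quoted above so that plain gradient descent makes progress from a zero start (for very small γ the atan saturates and the gradients vanish away from the decision boundary):

```python
import math

def atan_criterion(Y, b, a, gamma):
    """E = -sum_i atan(b_i (Y_i . a) / gamma)  (eq. 2.37)."""
    return -sum(math.atan(bi * sum(yj * aj for yj, aj in zip(yi, a)) / gamma)
                for yi, bi in zip(Y, b))

def grad(Y, b, a, gamma):
    """Gradient of eq. 2.37 with respect to a."""
    g = [0.0] * len(a)
    for yi, bi in zip(Y, b):
        s = bi * sum(yj * aj for yj, aj in zip(yi, a))
        w = (bi / gamma) / (1.0 + (s / gamma) ** 2)  # d/ds of atan(s/gamma)
        for j, yj in enumerate(yi):
            g[j] -= w * yj
    return g

def fit(Y, b, gamma=0.5, lr=0.05, steps=500):
    a = [0.0] * len(Y[0])
    for _ in range(steps):
        dg = grad(Y, b, a, gamma)
        a = [aj - lr * gj for aj, gj in zip(a, dg)]
    return a

# augmented rows (class-2 rows negated) and unit margins, as in eqs. 2.31-2.32;
# class 2 here is the mirror image of class 1, so its negated rows are [-1, x]
w1 = [(2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]
Y = [[1.0] + list(x) for x in w1] + [[-1.0] + list(x) for x in w1]
b = [1.0] * len(Y)
a = fit(Y, b)
errors = sum(1 for yi, bi in zip(Y, b)
             if bi * sum(yj * aj for yj, aj in zip(yi, a)) <= 0)
```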

As the second classifier technique, we use the learning-based technique AdaBoost (Freund and Schapire, 1999), which constructs the discriminant function as a non-linear combination of an iteratively constructed sequence of “weak” classifiers. Each subsequent “weak” classifier minimizes a weighted misclassification error criterion, where the weight given to a particular sample depends on whether the sample was previously misclassified. After a sufficient number of steps, the overall decision boundary is guaranteed to correctly classify all training samples. The only requirement on the “weak” classifier is that its error rate be lower than 0.5.

For the weak classifier we use a weighted modification of the differentiable misclassification criterion in eq. 2.37. We define the criterion function

E = − ∑_{i=1}^{n1+n2} D_i atan( b_i Y_i a / γ )    (2.38)

Page 63: BOSTON UNIVERSITY - BUiss.bu.edu/students/litvin/Publications/thesis_litvin_2006.pdfvaluable advice at di erent stages of my PhD work. I would like to thank Professor Stan Sclaro for

41

where Di are the weights applied to the individual data points in the AdaBoost algorithm and the summation is carried out over all data points. The criterion function in eq. 2.38 is guaranteed to achieve a training error lower than 0.5 and hence is suitable for use in the AdaBoost algorithm. AdaBoost can construct a complex decision boundary, which is best suited to separable cases with a large amount of training data. Since we have limited and non-separable training data, we use only a few AdaBoost steps to prevent over-fitting.
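The boosting loop can be sketched as follows; for brevity we use a simple decision stump as the weak classifier in place of the weighted criterion of eq. 2.38, and the 1-D data set is an illustrative choice that no single stump can separate:

```python
import math

def stump_learn(xs, ys, D):
    """Best threshold stump on 1-D data under weights D (weak classifier)."""
    best = None
    for t in [x + 0.5 for x in xs] + [min(xs) - 0.5]:
        for pol in (1, -1):
            preds = [pol if x <= t else -pol for x in xs]
            err = sum(d for d, p, y in zip(D, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(xs, ys, rounds=8):
    n = len(xs)
    D = [1.0 / n] * n                  # sample weights D_i
    H = []                             # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, t, pol = stump_learn(xs, ys, D)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        H.append((alpha, t, pol))
        # up-weight misclassified samples, down-weight correct ones
        D = [d * math.exp(-alpha * y * (pol if x <= t else -pol))
             for d, x, y in zip(D, xs, ys)]
        s = sum(D)
        D = [d / s for d in D]
    return H

def predict(H, x):
    """Sign of the weighted combination of weak classifiers."""
    f = sum(a * (pol if x <= t else -pol) for a, t, pol in H)
    return 1 if f > 0 else -1

xs = [1, 2, 3, 4, 5, 6]                # no single stump separates this
ys = [1, 1, -1, -1, 1, 1]
H = adaboost(xs, ys)
train_errors = sum(1 for x, y in zip(xs, ys) if predict(H, x) != y)
```

Although each stump alone misclassifies at least two of the six points, the boosted combination drives the training error to zero within a few rounds.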


Chapter 3

Maximum entropy shape model as a curve evolution prior

In the previous chapter of this dissertation we overviewed approaches for introducing prior shape knowledge into segmentation problems. In this chapter, we consider the maximum entropy model proposed in (Zhu, 1999). In shape sampling experiments, this model has been shown to yield samples that appear to share the visual characteristics of the shapes used to construct the model. We propose an approximation that significantly accelerates model construction. Next, we propose to use this model in a curve evolution context using numerical curve flow computation, and apply our method to segment images with very low SNR. The work reported in this chapter previously appeared in (Litvin and Karl, 2002).

3.1 Introduction

An attractive approach to boundary extraction problems is to adopt an explicitly Bayesian framework and to express the prior boundary shape information in the form of a probability distribution function (pdf) on the space of curves or deformations. Such Bayesian perspectives based on explicit shape priors and pdfs lead naturally to “optimal” segmentation and classification approaches. However, most such approaches do not consider a distribution defined on the space of shapes, but rather define probability distributions on parameterized deformations (see the deformable template approaches in Chapter 2). Direct comparison of parameterizations and deformation-space modeling makes most of these approaches valid only for relatively small deformations with respect to an “average” shape.


(Zhu, 1999) proposed “natural” and data-driven maximum entropy models of object shape (see the pdf construction approach in Chapter 2). While computationally expensive and burdensome to find, these models have demonstrated the ability to capture intrinsic and complicated characteristics of object structure. The model in (Zhu, 1999) was motivated by the desire to capture perceptual shape similarity; a model with such properties would seem well suited to practical problems such as segmentation. Here we investigate the possibility of using such a shape model for image segmentation. To our knowledge, this is the first attempt at using such models to solve practical problems. In the next section we briefly review the concept of constructing pdfs of shape using the maximum entropy principle (Zhu, 1999) and present our technique for reducing the computational cost. In the third section we apply the approach to the problem of segmentation and show preliminary results.

The basic idea of the method in (Zhu, 1999) is to construct an approximation to the true pdf on the space of shapes such that this pdf captures some features or statistics of a training set but has maximum variability in the unconstrained directions. By increasing the number of representative features used, we can make the approximate pdf closer to the true pdf for a given class of shapes.

3.2 Constructing a pdf on the space of shapes

Here we summarize the maximum-entropy-based shape model we use, derived from (Zhu,

1999) (see Chapter 2).

Zhu defines a shape model obtained from a set of training data. The model is required

to have maximum variability in unconstrained directions while capturing some important

data statistics for a set of "significant" shape features φ^(α)(s), which are measured along

the shape boundary, where s is the arc length along the boundary and α is a feature index.

The shape feature statistics are defined in the form of observed histograms:

µ^(α)(z) = ∫ δ(z − φ^(α)(s)) ds    (3.1)


where z is the feature value. The observed statistics µ_obs^(α)(z) that the model has to satisfy

are given by the average over statistics computed on M training shapes

µ_obs^(α)(z) = (1/M) Σ_{i=1}^{M} µ_i^(α)(z)    (3.2)

where µ_i^(α)(z) is the statistic of feature α computed on the i-th training shape.
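As a concrete sketch of eqs. 3.1–3.2 for polygonal training shapes (illustrative Python, not the MATLAB implementation used in this work; the curvature discretization as turn angle over arc length is an assumption):

```python
import numpy as np

def curvature_and_arclen(poly):
    """Discrete curvature at the nodes of a closed polygon, together with
    the arc-length element ds of each linelet (cf. eq. 3.3)."""
    d = np.roll(poly, -1, axis=0) - poly              # linelet (edge) vectors
    theta = np.arctan2(d[:, 1], d[:, 0])              # orientation θ(s)
    dtheta = np.angle(np.exp(1j * np.diff(np.r_[theta, theta[0]])))  # wrapped Δθ
    ds = np.linalg.norm(d, axis=1)
    return dtheta / ds, ds

def feature_histogram(poly, bins):
    """Arc-length-weighted, normalized histogram µ(z) of the curvature
    feature along the boundary (eq. 3.1)."""
    kappa, ds = curvature_and_arclen(poly)
    h, _ = np.histogram(kappa, bins=bins, weights=ds)
    return h / h.sum()

def observed_statistics(shapes, bins):
    """µ_obs: the feature histogram averaged over M training shapes (eq. 3.2)."""
    return np.mean([feature_histogram(p, bins) for p in shapes], axis=0)
```

The weighting by ds makes the histogram an approximation of the arc-length integral in eq. 3.1 rather than a plain node count.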

Let us define the boundary orientation function θ(s) to be the tangential direction at

the boundary point s. The only feature we use in this work is curvature, defined as the

derivative of the boundary orientation function with respect to the arc length:

φ^(1)(s) = dθ(s)/ds    (3.3)

We denote the pdf of the curve Γ for the given class of shapes by p(Γ), where Γ stands

for an instance of a boundary. In order to satisfy the maximum entropy condition and

have the correct observed statistics, the pdf takes the form:

p(Γ) = (1/Z) exp( − Σ_{α=1}^{k} ∫ λ^(α)(z) µ^(α)(Γ, z) dz )    (3.4)

where λ^(α)(z) is the Lagrange multiplier function, µ^(α)(Γ, z) is the statistic corresponding to

the shape Γ, and Z is the normalization constant. Discretizing the range of feature values

and considering only one feature, eq. 3.4 becomes

p(Γ) = (1/Z) exp(−⟨Λ, µ(Γ)⟩)    (3.5)

where Λ is the set of Lagrange multipliers and µ(Γ) is the discretized statistic (histogram)

for the curve Γ. The set Λ can be obtained as a stationary point of the differential

equation

dΛ/dt = E_{p(Γ;Λ)}[µ(Γ)] − µ_obs    (3.6)

where E_{p(Γ;Λ)}[·] is the expectation with respect to the probability distribution function

p(Γ; Λ) computed on the space of all possible curves.
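The stationary point of eq. 3.6 can be sought by a discretized gradient iteration; a minimal sketch (illustrative Python; the expectation is estimated from histograms of MCMC samples, which we take as given here, and the step size is an assumed value):

```python
import numpy as np

def update_multipliers(lam, mu_obs, sample_histograms, step=0.5):
    """One Euler step of eq. 3.6: dΛ/dt = E_{p(Γ;Λ)}[µ(Γ)] − µ_obs.
    `sample_histograms` are feature histograms of curves drawn from
    p(Γ; Λ) by the MCMC simulation; their mean estimates the expectation.
    At the stationary point, the expected histogram matches µ_obs."""
    mu_expected = np.mean(sample_histograms, axis=0)
    return lam + step * (mu_expected - mu_obs)
```

Note the sign: increasing λ in a bin lowers the probability of curves with mass in that bin, so the iteration pushes the expected histogram toward µ_obs.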


Figure 3·1: MCMC move proposal in (Zhu, 1999). Point i in configuration A is moved into one of the 8 positions under the constraints on the length of linelets connecting node i with its neighbors.

The key difficulty is the computation of the expected statistics E_{p(Γ;Λ)}[µ(Γ)] under the current distribution. This quantity is obtained as the average of histograms

over samples from the distribution where samples are drawn using the Metropolis-Hastings

algorithm to simulate a random walk in the space of possible configurations of a closed

discrete contour. Curve nodes are positioned on grid points. One move in the space con-

sists of moving one node of a curve to one of the 8 neighboring positions. The distance

between nodes is constrained within a certain bound, see Figure 3·1. The resulting MCMC

simulation is very slow: the number of moves needed to obtain a sample from the distribution is on the order of 10^9, making it too slow for practical applications, where model construction times measured in hours or days are unacceptable.

As a result, we have developed a more efficient MCMC implementation. In

addition we have developed an even more efficient approximation that we discuss after

introducing our exact MCMC implementation. Unlike in (Zhu, 1999), in our algorithm the

curve nodes no longer lie on a grid but take any position on the plane, though we restrict

the linelets connecting nodes to now have a fixed length. A proposed move of our MCMC


simulation now changes the coordinates of all nodes in such a way that the distances

between all adjacent nodes remain fixed. This new implementation has

two important advantages over the original MCMC implementation in (Zhu, 1999). First,

by fixing the linelet length we eliminate the need to continuously approximate uniform

sampling along the curve. Second, and most important, is the considerable acceleration of

the MCMC simulation attained using our method. In the original MCMC method a move

of one node of the chain into a new configuration must be accompanied by successful

moves of neighboring nodes in a certain direction. Therefore, the probability of a valid

change of configuration is very low or alternatively a change in configuration requires a

large number of MCMC moves. In our new method all nodes move during each step;

therefore, the curve moves in the configuration space more quickly.

We now discuss our MCMC method. As in the original method, the first node is chosen

at random. Two new trial positions of the chosen node are proposed at a distance +δ and

−δ in the direction of the local normal to the curve, δ being a small number. One of these

two proposed positions is then chosen at random. Next, two neighboring nodes are moved

in the direction that bisects the angle formed by the old position of the starting node, the

neighboring node, and the new position of the starting node. Subsequent pairs of neighbors

are moved according to the same rule until the last node is reached. The new position of

the last node is then uniquely identified. An illustration of the proposed scheme is given

in Figure 3·2. It is easy to see that the proposed move is reversible, which is a requirement

for a valid MCMC simulation step. The new curve remains uniformly sampled. We denote

the initial position of the entire curve by A and the new proposed position by B. Let us

define the probability that configuration A is changed into B as K(A → B). Since we have

only two possible, equally likely configuration changes, K(A → B) = 0.5. The probability of
the reverse change, K(B → A), is also 0.5, since there are always two candidate

positions for the node. The proposed move is accepted with probability:

P_a(A → B) = min( [K(B → A) p(Γ_B)] / [K(A → B) p(Γ_A)], 1 ) = min( p(Γ_B)/p(Γ_A), 1 )    (3.7)



Figure 3·2: Our new scheme of proposed MCMC move (3 nodes, including the starting node, are shown). Black circles represent the initial node configuration. White circles represent the 2 candidate positions for the starting node and the new positions of the neighbors after the starting node was moved into the new (bottom) position.

where p(ΓA) and p(ΓB) are the values of the probability function p(Γ) evaluated on con-

figurations A and B respectively.

Overall, this move strategy is repeated until the chain converges, which means the

convergence of the expected statistics E_{p(Γ)}[µ(Γ)]. In practice, we monitor the estimated feature histograms as the MCMC simulation proceeds and the number of moves increases, until no further significant changes are observed. We found that after 20,000 moves no visible changes occurred in the histograms estimated from drawn samples. The number of MCMC steps thus decreased by a factor of ≈ 10^5 with respect to the original MCMC simulation scheme in (Zhu, 1999). However, each of our MCMC steps is computationally N times more expensive, where N is the number of curve nodes, so the overall computation time improvement factor is effectively ≈ 10^3.

We can gain even greater time savings through some approximation. This approxi-

mation is based on the intuitive idea that as the number of nodes in the curve increases,

distant nodes become more and more de-correlated and the curve locally behaves like an

open curve. In the case of an open curve the MCMC simulation proceeds as follows. Sup-

pose the initial configuration is denoted as A. First, a trial node is chosen at random. Then



Figure 3·3: Our scheme of open curve MCMC move proposal. Black circles represent the initial configuration. White circles represent the final configuration after the curve is bent at the trial node.

we choose a random angle value in the range [−0.1, 0.1]. This angle is used to bend the

curve at the chosen node, yielding configuration B. The configuration change is reversible and the probabilities K(A → B) and K(B → A) are equal. Therefore, the change is

accepted with probability Pa = min[p(B)/p(A), 1]. An illustration of the proposed scheme

is given in Figure 3·3. The chain converges after approximately 20,000 MCMC steps but

at a computational cost about two orders of magnitude smaller compared to a closed curve

MCMC simulation. The overall computational expense improvement factor (with respect

to the original MCMC method in (Zhu, 1999)) is about 10^5. Let us denote the expected statistics computed on the samples drawn from the distribution as µ_sym = E_{p(Γ)}[µ(Γ)]. We have run MCMC simulations with an open curve and with a closed curve for different numbers of nodes and compared the simulated sets of features µ_sym. We find that as the number of nodes increases, the two methods give increasingly similar values for the estimated parameters µ_sym. The exact effect of our MCMC approximation on the constructed model

and the segmentation results remains to be investigated. At this point, we assume that the

open curve MCMC approximation in model construction preserves the qualitative proper-

ties of the model to encode the perceptual shape characteristics. Therefore, we use open

curve MCMC simulation instead of closed curve MCMC simulation with N = 80 to build

our shape priors. Time needed to compute the model parameters is measured in minutes

using our MATLAB implementation.
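The open-curve proposal and the acceptance rule of eq. 3.7 can be sketched as follows (illustrative Python, not the MATLAB implementation; the chain is an (N, 2) array of node coordinates, and p(Γ) is supplied as a log-density callable):

```python
import numpy as np

rng = np.random.default_rng(0)

def bend_move(nodes, max_angle=0.1):
    """Open-curve proposal: pick a trial node at random and rigidly rotate
    the tail of the chain about it by a random angle in [-0.1, 0.1].
    Linelet lengths (hence uniform sampling) are preserved exactly."""
    i = rng.integers(1, len(nodes) - 1)
    phi = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[c, -s], [s, c]])
    out = nodes.copy()
    out[i + 1:] = (out[i + 1:] - out[i]) @ R.T + out[i]
    return out

def metropolis_step(nodes, log_p):
    """Accept with probability min(p(B)/p(A), 1), eq. 3.7; the proposal is
    symmetric, so K(A→B) = K(B→A) cancels from the ratio."""
    proposal = bend_move(nodes)
    if np.log(rng.uniform()) < log_p(proposal) - log_p(nodes):
        return proposal
    return nodes
```

Because the move is a rigid rotation of one sub-chain, reversibility holds by symmetry, matching the argument in the text.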


3.3 Application to curve-evolution based segmentation

In this section we apply the shape prior identified using the methods we have described

in Section 3.2 to segment a shape based on observation of a noisy image. We pose the

segmentation problem in a probabilistic MAP formulation, maximizing the posterior density

p(Γ|I) for the shape given the data. Using Bayes rule and converting to the equivalent

energy formulation by taking the negative log, we obtain

Γ* = argmax_Γ p(Γ|I) = argmax_Γ p(I|Γ) p(Γ) = argmin_Γ { −(1/2)(u − v)² − α log p(Γ) }    (3.8)

We express the image likelihood −log p(I|Γ) as the intensity component of the energy, E_int = −(1/2)(u − v)². Assuming a bi-level image, this energy attempts to maximize the difference between u and v, the average intensities inside and outside of the curve Γ respectively, thereby moving the curve towards the object boundary. The shape prior energy is E_shape = −log p(Γ). The regularization parameter α governs the importance of the prior term.

We use the curve evolution framework to minimize the energy in eq. 3.8. Therefore, we

find the curve flow dΓ/dt

corresponding to the steepest decrease of the energy (the gradient

flow). The gradient flow corresponding to the first term is given by

(dΓ/dt)_int = (u − v)(2I − u − v) N⃗    (3.9)

where N⃗ is the local normal and I is the average image intensity in a neighborhood of radius

R of a node. This averaging prevents the curve from wrapping on itself due to local noise

fluctuations. The smoothing radius R is set to half of the distance between neighboring

curve nodes.
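The discrete node update implied by eq. 3.9 can be sketched as follows (illustrative Python, not the original implementation; we assume the curve interior is available as a binary pixel mask, and we approximate the radius-R neighborhood by a square window):

```python
import numpy as np

def unit_normals(nodes):
    """Unit normals at the nodes of a closed polygon (central differences)."""
    t = np.roll(nodes, -1, axis=0) - np.roll(nodes, 1, axis=0)
    n = np.stack([t[:, 1], -t[:, 0]], axis=1)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def intensity_flow(nodes, image, mask, R=5):
    """Node velocities (u - v)(2*Ibar - u - v) * N for eq. 3.9, where u, v
    are the mean intensities inside/outside the curve (given by `mask`) and
    Ibar is the image averaged in a window around each node (the smoothing
    that prevents wrapping on local noise)."""
    u, v = image[mask].mean(), image[~mask].mean()
    speeds = np.empty(len(nodes))
    for k, (x, y) in enumerate(nodes):
        r, c = int(round(y)), int(round(x))
        win = image[max(r - R, 0):r + R + 1, max(c - R, 0):c + R + 1]
        speeds[k] = (u - v) * (2.0 * win.mean() - u - v)
    return speeds[:, None] * unit_normals(nodes)
```

In practice the mask would be recomputed from the current curve at each iteration; here it is passed in to keep the sketch short.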

The gradient flow corresponding to the second term in eq. 3.8 is computed for the

discretized curve by constructing a differentiable approximation of p(Γ) in eq. 3.5 based

directly on feature values instead of histograms. First, we define the continuous valued


Lagrange multiplier function λ_c as a piecewise-linear approximation of the set of Lagrange multipliers Λ corresponding to the model p(Γ). Now the inner product in eq. 3.5 can be approximated as a sum of values of λ_c evaluated at the set of discrete feature values. Since the feature values are the curvatures θ_n measured at the curve nodes, the final summation is over the curve nodes. The resulting differentiable approximation is given by

p(Γ) = exp(−⟨Λ, µ(Γ)⟩) ≈ exp( −(1/L) Σ_{n=1}^{L} λ_c(θ_n) )    (3.10)

where L is the number of curve nodes. Finally, the shape prior energy is given by

E_shape = (1/L) Σ_{n=1}^{L} λ_c(θ_n)    (3.11)

which is the sum of Lagrange multiplier function values corresponding to the curvatures

measured on the curve nodes. The minimizing flow (dΓ/dt)_shape corresponding to the shape prior term in eq. 3.11 is now found by computing numerical partial derivatives of eq. 3.11 with respect to perturbations of the curve nodes.
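The numerical flow computation can be sketched as follows (illustrative Python; λ_c is represented by its values on histogram-bin centers and evaluated by linear interpolation, and the perturbation size is an assumed value):

```python
import numpy as np

def node_curvatures(nodes):
    """Turn angle over arc length at each node of a closed polygon."""
    d = np.roll(nodes, -1, axis=0) - nodes
    theta = np.arctan2(d[:, 1], d[:, 0])
    dtheta = np.angle(np.exp(1j * (theta - np.roll(theta, 1))))
    ds = 0.5 * (np.linalg.norm(d, axis=1)
                + np.linalg.norm(np.roll(d, 1, axis=0), axis=1))
    return dtheta / ds

def shape_energy(nodes, lam_values, bin_centers):
    """E_shape of eq. 3.11: mean of the piecewise-linear multiplier
    function λ_c evaluated at the node curvatures."""
    return np.interp(node_curvatures(nodes), bin_centers, lam_values).mean()

def shape_flow(nodes, lam_values, bin_centers, eps=1e-4):
    """(dΓ/dt)_shape via central-difference partial derivatives of eq. 3.11
    with respect to each node coordinate."""
    flow = np.zeros_like(nodes)
    for i in range(nodes.shape[0]):
        for j in range(2):
            p, m = nodes.copy(), nodes.copy()
            p[i, j] += eps
            m[i, j] -= eps
            flow[i, j] = -(shape_energy(p, lam_values, bin_centers)
                           - shape_energy(m, lam_values, bin_centers)) / (2 * eps)
    return flow
```

Stepping the nodes a small distance along this flow decreases the shape energy, which is the behavior the curve evolution relies on.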

The overall curve flow minimizing the energy in eq. 3.8 is given by the sum of the two

components:

dΓ/dt = (dΓ/dt)_int + (dΓ/dt)_shape    (3.12)

The difficulty in implementing the resulting curve flow is the highly non-convex nature of

the probability distribution function p(Γ). This property can be explained by the presence

of potential barriers in the prior energy given by eq. 3.11. In practice, the set of Lagrange

multipliers Λ can contain very high values; this happens, for instance, when some values of the observed histograms µ_obs are small. Such high values of the Lagrange multiplier function effectively forbid the corresponding values of curvature (introducing a large term in the sum in eq. 3.11). In the process of curve evolution, configuration changes that pass through intermediate states producing such "forbidden" curvature values cannot be realized. As a result, the evolution may stall, blocked by a local increase of the energy due to just one high value


of the Lagrange multiplier function. We use two approaches to overcome this difficulty.

First, we use a sufficiently coarse histogram and Lagrange multiplier discretization: we discretize the histogram using 30 bins. A coarse histogram leads to a smooth Lagrange multiplier function. Second, we manually threshold the Lagrange multipliers Λ to eliminate

high values. Of course, this modification represents an uncontrollable change of the prior,

but we found it helpful to reduce the local minima problem.
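The thresholding step can be as simple as clipping; a minimal sketch (the cutoff value here is an illustrative assumption, not the value used in our experiments):

```python
import numpy as np

def regularize_multipliers(lam, max_abs=20.0):
    """Clip the Lagrange multipliers to remove the potential barriers
    caused by poorly populated histogram bins. The cutoff `max_abs`
    is an assumed illustrative value."""
    return np.clip(lam, -max_abs, max_abs)
```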

Another difficulty is the sensitivity of the flow with respect to curve perturbations.

The prior energy is a function of the curvature values, which are known to be sensitive to noise. The computation of the flow involves differentiating the energy, which

magnifies the noise sensitivity. In our implementation, this noise sensitivity is partially

eased by the smoothing effect of periodic curve resampling performed after every few steps

of evolving the curve.

(A) (B)

Figure 3·4: Ground truth image with noise added, constructed on the shape from dataset 1 (panel A) and dataset 2 (panel B). Black solid line: true shape; white dashed line: initial contour.

We perform two experiments on two different classes of shapes. For each experiment

we generate the shape prior using a set of training shapes obtained by slight perturbations

(non-isotropic scaling) of the basic shape characterizing the class. One shape from the


[Two panels, axes 0–250. Left: R = 11, σ_n = 3, α = 0. Right: R = 11, σ_n = 3, α = 0.67.]

Figure 3·5: Data set 1. Result of segmentation without using a prior shape model (left) and the best result using the model (right). The true boundary is shown by a solid line; circles show the reconstructed boundary.

[Two panels, axes 0–250. Left: R = 11, σ_n = 3, α = 0. Right: R = 11, σ_n = 3, α = 1.5.]

Figure 3·6: Same as figure 3·5 for data set 2.


training set is then used to generate noisy observation data. The regions inside and outside

of the shape have values 1 and 0 respectively and independent Gaussian noise with a

standard deviation of σ_n = 3 is added to the resulting clean image to create the noisy

data. The resulting segmentation problem is difficult to solve well without a prior model

as we will show next.

The first training data set contains triangular shapes with sharp corners and the second

set contains shapes with smoother corners and nearly circular boundaries. In Figure 3·4,

panels (A) and (B) we show the noisy image constructed on a shape from dataset 1 and 2

respectively. In Figures 3·5 and 3·6 we show the results of segmenting a given shape from

each of the two data sets. On the left panel we show the results obtained with no prior (α = 0). On the right panel we show the corresponding result when a shape prior was used (α ≠ 0). The best result was chosen from the results obtained for various α. As we can see, without a shape prior the recovered boundary is noisy. Conversely, using prior shape information smooths

the boundary while preserving the high curvature areas.

An alternative, and currently popular, approach to regularize segmentations is to use

a generic boundary length penalty term:

E_prior = ∮_Γ ds    (3.13)

To show the advantage of a trained prior curve model over this more generic approach, Figure 3·7 presents the segmentation of one of our shapes using the length penalty for optimal values of the prior weight α. We see that using a simple

length penalty can give a smooth boundary at the expense of severe corner smoothing.

3.4 Discussion

In this chapter we showed the results of applying a data derived pdf on the space of shapes

to the problem of segmenting noisy images. We used the pdf on shapes developed in (Zhu,

1999), which captures the perceptual similarity of shapes. The model in (Zhu, 1999) has


[Two panels, axes 0–250. Left: R = 11, σ_n = 3, α = 0.25. Right: R = 11, σ_n = 3, α = 1.0.]

Figure 3·7: Segmentation obtained using penalty function in eq. 3.13.

been previously used exclusively in shape sampling experiments. The computational cost

of constructing the model is prohibitive in the original formulation.

We developed an improved MCMC method for the simulation of shapes sampled

from the given distribution. We then applied this method to several segmentation examples.

In our example experiments we observed significantly better segmentation results when a prior shape model was used, compared to results obtained with no prior term or with a generic length penalty term. Using the proposed technique we were able to recover a boundary that retained sharp corners while smoothing out excessive variability and

fluctuation on the boundary.

In the process of constructing the model, we approximated the closed curve MCMC

chain by an open curve MCMC chain simulation. This approximation must be studied in

more detail in order to better understand its effect on the segmentation solution. Another

possibility is to learn the functional mapping between open and closed MCMC simulation

results (for instance using neural networks) in order to obtain better closed curve MCMC

chain approximation.

We used only one definition of the feature function, namely curvature. Therefore, the

descriptive power of our model is limited. Using other definitions of feature functions


proposed in (Zhu, 1999) significantly increases the computational cost of constructing the

model. For instance, the “co-linearity” feature constructor in (Zhu, 1999) requires an addi-

tional layer of MCMC sampling that increases the cost by orders of magnitude (we did not

implement this feature function). Even if constructing the model efficiently were possible using such advanced feature functions, using the resulting distributions in curve evolution may pose additional difficulties (theoretical and computational) due to the non-deterministic nature of these feature functions. We conclude that practical use of the models developed in (Zhu, 1999) still presents difficulties. Potentially, progress can be made using the deterministic definitions of feature functions that we develop in the next chapter in a different context.


Chapter 4

Shape-distribution-based prior shape model

This chapter is devoted to the core topic of this dissertation, the construction and use of

shape-distribution-based shape models. In the following sections we discuss the motivations

behind using shape distributions, and introduce the framework of constructing the shape

prior based on shape distributions. The work reported in this chapter was published in

(Litvin and Karl, 2004a; Litvin and Karl, 2004b; Litvin and Karl, 2004c).

4.1 Motivation

In Section 2.4 we presented a review of prior shape modeling approaches. We believe that

a significant gap exists in the range of available techniques. On one end of the spectrum,

“generic” shape priors penalize some property of the shape, such as curvature or shape

area. These methods are too simple to be effective, since they merely smooth the curve.

In challenging applications more guidance is needed to overcome noise, occlusion, etc. In

particular, the solution should be influenced by the prior specific to the expected class of

objects. On the other hand, template based methods are usually specific to a particular

shape parameterization and do not generalize well. The goal of this part of the dissertation

is to develop a shape modeling approach that would possess better generalization properties

and be robust in the face of a small number of training examples. We aim at a model that

can be used in boundary inference using a curve evolution framework. We also need to

construct the model keeping in mind the need to register (align) shapes. Registering highly

variable shapes is a difficult task on its own. We desire a deterministic and computationally

efficient model.

Figure 4·1: An example of constructing a shape distribution for a curve (left) based on curvature κ(s) measured along the boundary (second graph). The third and fourth graphs show sketches of pdf(κ) and the cumulative distribution function H(κ) of curvature, respectively. Note the invariance of H(κ) with respect to the choice of the initial point of the arc-length parameterization.

Motivated by limitations in existing shape modeling approaches and by the representational richness of the distribution-based shape descriptors (see Section 2.6 for prior work

review), we propose to use shape distribution to construct a shape prior for use in bound-

ary inference tasks. In (Osada et al., 2002), shape distributions are defined as sets of

cumulative distribution functions (CDFs) of feature values (one distribution per collection

of feature values of the same kind) sampled along the shape boundary or across the shape

area. As shown by recent shape classification experiments (see Section 2.5), such shape

distributions can capture the intuitive similarity of shapes in a flexible way while being

robust to a small sample size and invariant to an object transformation.

An illustrative example of the shape distribution idea is shown in Figure 4·1, using

boundary curvature as the feature. Building a shape distribution is done in 2 steps:

1. Computing the curvature function on the shape boundary κ(s)

2. Computing the cumulative distribution function H(κ) on samples from the curvature

function
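The two steps can be sketched for a polygonal boundary as follows (illustrative Python; the turn-angle curvature discretization is an assumption, and the λ-grid is supplied by the caller):

```python
import numpy as np

def boundary_curvature(poly):
    """Turn angle over arc length on a closed polygon, plus linelet lengths."""
    d = np.roll(poly, -1, axis=0) - poly
    theta = np.arctan2(d[:, 1], d[:, 0])
    dtheta = np.angle(np.exp(1j * np.diff(np.r_[theta, theta[0]])))
    ds = np.linalg.norm(d, axis=1)
    return dtheta / ds, ds

def shape_distribution(poly, grid):
    """Deterministic shape distribution: H(λ) = fraction of arc length on
    which the curvature is below λ (step 1 followed by step 2)."""
    kappa, ds = boundary_curvature(poly)
    total = ds.sum()
    return np.array([ds[kappa < lam].sum() for lam in grid]) / total
```

Because H is built from the whole arc length, it does not depend on the starting point of the parameterization, which is the invariance noted in the caption of Figure 4·1.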

Our prior is based on an energy which penalizes the difference between a set of feature

distributions of a given curve and those of a prior reference set. Such prior shape distri-

butions can capture the existence of certain visual features of a shape regardless of the

location of these features.


4.2 Our Formulation

First, let us summarize the important properties of shape distribution based shape repre-

sentations useful for constructing the shape prior (see Section 2.5):

• Invariance. Representations can be constructed to be invariant to rigid motion,

rotation, scaling, and mirror imaging by the proper choice of feature functions. In

fact, all definitions of shape distributions used in this work satisfy this important

property. The invariance property also eliminates the need to register shapes.

• Robustness. Small perturbations of shapes lead to small changes of the computed

distributions.

• Metric. The measure of shape distance defined using shape distributions possesses
the properties of a metric. Symmetry of the measure is one of the important consequences.

• Generality. The shape distribution methodology allows for the design of different

feature functions that are easily tailored to a particular problem.

• Generalizability. It has been shown in shape classification experiments that shape-distribution-based shape representations generalize well to shapes not seen in the training set, even when the sample size is small. Our experiments also confirm this property.

A key difference of our formulation of shape distributions, as opposed

to the original formulation in (Osada et al., 2002), is our deterministic framework. In

(Osada et al., 2002), distributions were computed on random samples from feature func-

tions, while we construct the distributions over the entire feature functions. Therefore,

the original shape distribution-based descriptors incorporate statistical uncertainty, while

our shape distribution represents a deterministic descriptor. Without this deterministic

approach, we would not be able to incorporate shape distributions as a prior in a curve evolution framework. All definitions made in this section are in the continuous domain unless

specified otherwise. Our computations are of course performed in the discrete domain by

discretizing curves, distributions, and curve evolution equations.


Multiple feature functions can be defined to characterize a shape or groups of shapes.

So far, we reviewed the feature function defined as boundary curvature. Separate fea-

ture functions capturing different characteristics of shapes can be combined in a single

framework, creating a more versatile prior.

4.2.1 A prior energy based on shape distributions

We now introduce our formulation of the shape prior in the continuous domain. Let Φ(ω)

be a continuously defined feature (e.g. curvature along the length of a curve) on the domain

Ω, where ω is an element of the space Ω. Let λ be a variable spanning the range of values Λ

of the feature. Let H(λ) be the CDF of Φ computed on the entire domain Ω:

H(λ) = ( ∫_Ω h(Φ(ω) < λ) dω ) / ( ∫_Ω dω )    (4.1)

where h(condition) is the indicator function, which is 1 when the “condition” is satisfied

and 0 otherwise. Unlike in the original shape distribution formulation in (Osada et al.,

2002), our shape distributions are defined as deterministic descriptors.

We define the shape prior energy Eshape(Γ) for the boundary curve Γ in eq. 2.2 based

on shape distributions as:

E_shape(Γ) = Σ_{i=1}^{M} w_i ∫_Λ [ H_i*(λ) − H_i(Γ, λ) ]² dλ    (4.2)

where M is the number of different distributions (i.e. feature functions) being used to

represent the object, H_i(Γ, λ) is the distribution function of the i-th feature function for the curve Γ, and the non-negative scalar weights w_i balance the relative contributions of the different feature distributions. Prior knowledge of object behavior is captured in the set of target distributions H_i*(λ). We choose the L2 measure defined on CDFs due to its attractive

properties (see Section 2.8). This measure represents a cross-bin distribution difference

measure that grows monotonically for increasingly dissimilar distributions. In addition, this

measure is differentiable with respect to curve deformations, which is a property necessary


to derive the energy minimizing curve flows. Note that our shape energy definition is a

deterministic measure due to our deterministic definition of the shape distribution based

shape descriptors.

We propose three strategies to define the target distributions H_i* in eq. 4.2:

1. The target distributions can correspond to a single prior shape. The resulting prior

penalizes the distance between the current curve Γ and the prior curve in terms of

their shape distributions.

2. The target distributions can be computed as averages derived from a group of N
training shapes:

H_k*(λ) = (1/N) Σ_{i=1}^{N} H_k^i(λ)    (4.3)

where H_k^i(λ) is the distribution of type k corresponding to the prior shape i, and N
is the number of training shapes.

3. The target distributions can be specified by prior knowledge (e.g. the analytic form

for a primitive, such as a square)

It is important to stress that the prior on the group of shapes can be constructed without

the need to register shapes due to the invariance properties of shape distribution represen-

tations.

In cases where the target distributions correspond to a single shape, eq. 4.2 can be

expressed as the distance between the two curves. For two curves Γ1 and Γ2, the measure

of dissimilarity is then

d(Γ1, Γ2) = ∑_{i=1}^{M} wi ∫_Λ [Hi(Γ1, λ) − Hi(Γ2, λ)]² dλ    (4.4)

This definition of the distance is differentiable and is a metric. It is exactly the measure used in (Osada et al., 2002) in the shape classification framework when measuring the distance between two shapes.
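When feature-value samples are available for two curves, this distance can be computed directly from their empirical CDFs. A minimal sketch in Python (our own illustration, not the dissertation's code; the function names and the uniform λ grid are assumptions):

```python
import bisect

def empirical_cdf(samples, lam):
    """Fraction of feature samples with value <= lam."""
    return bisect.bisect_right(sorted(samples), lam) / len(samples)

def shape_distance(features1, features2, grid, weights=None):
    """L2 distance between per-feature empirical CDFs, cf. eq. 4.4.

    features1, features2: one list of feature samples per feature function.
    grid: uniformly spaced lambda values covering the feature range."""
    weights = weights or [1.0] * len(features1)
    dlam = grid[1] - grid[0]
    dist = 0.0
    for w, a, b in zip(weights, features1, features2):
        dist += w * dlam * sum(
            (empirical_cdf(a, lam) - empirical_cdf(b, lam)) ** 2 for lam in grid)
    return dist
```

By construction this quantity is symmetric and vanishes exactly when the two sample sets coincide, matching the metric properties claimed above.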


4.3 Minimizing flow computation

Recall that the second major direction of this dissertation is the use of the constructed shape models to extract object boundaries from images. We utilize the energy-based curve evolution framework, where the curve is found as the minimizer of the energy functional E(Γ):

Γ∗ = argmin_Γ E(Γ)    (4.5)

The energy typically consists of two types of terms: intensity term(s) and shape term(s).

E(Γ) = Eint(Γ) + αEshape(Γ) (4.6)

In Section 4.2 we defined the shape prior energy term Eshape(Γ). In order to use the energy in eq. 4.2 as a prior in a curve evolution context we must be able to compute the gradient curve flow (or just curve flow) (dΓ/dt)shape that deforms the curve in the direction of the steepest decrease of the energy Eshape(Γ). In the curve evolution framework, the overall curve flow is constructed as

dΓ/dt = (dΓ/dt)int + α (dΓ/dt)shape    (4.7)

where (dΓ/dt)int is the minimizing curve flow corresponding to the intensity energy term Eint(Γ).

For simplicity, we consider a shape prior energy term defined on just a single feature function. Since eq. 4.2 is additive in the different feature functions, the minimizing flows for individual feature functions can be added with the corresponding weights to obtain the overall minimizing flow. The energy for a single feature function is given by:

E(Γ) = ∫ [H∗(λ) − H(Γ, λ)]² dλ    (4.8)

where H∗(λ) is the target CDF and H(Γ, λ) is the CDF computed on the evolving curve.

Because the energy depends on the whole curve in a non-additive way, the minimizing flow at any location on the curve also depends on the whole curve, and not just on the local curve properties, making this computation much more challenging than the flow computation for


the popular shape prior energy definitions. The minimizing flow and its computation will,

of course, depend on the specifics of the feature function chosen.

We propose two techniques to compute the curve flow. The first is a variational approach that produces exact analytical solutions for the feature functions used in this work. The second is a numerical, gradient-descent-based solution, in which we apply unconstrained gradient descent to find the curve deformation that reduces the energy. This method can be used to quickly test new feature functions.

4.3.1 Exact solution using variational framework

Here we introduce an efficient approach to analytically compute the minimizing flows for

certain feature functions by using a variational framework. The resulting flows guarantee

reduction of the energy functionals in eq. 4.8.

Let us briefly outline the strategy for finding the curve flow minimizing an energy functional (see (Charpiat et al., 2003) for background and Appendix A for detailed derivations):

1. Define a small perturbation of the curve in the direction of the normal, as a function of arc-length s, denoted β(s).

2. Find the Gateaux semi-derivative of the energy in eq. 4.8 with respect to the per-

turbation β. Using the definition of the Gateaux semi-derivative, the linearity of

integration, and the chain rule we obtain

G(E, β) = 2 ∫ [H∗(λ) − H(Γ, λ)] G[H(Γ, λ), β] dλ    (4.9)

3. If the Gateaux semi-derivative of a linear functional f exists, then according to the Riesz representation theorem, it can be represented as

G(f, β) = ⟨∇f, β⟩    (4.10)

where ∇f is the gradient flow minimizing the functional f. We must compute G(H(Γ, λ), β), and then find the corresponding ∇H(Γ, λ) using the representation in eq. 4.10.


4. The Gateaux semi-derivative of the functional E can now be represented as

G(E, β) = 2 ∫ [H∗(λ) − H(Γ, λ)] ⟨∇H, β⟩ dλ = ⟨ 2 ∫ [H∗(λ) − H(Γ, λ)] ∇H dλ, β ⟩    (4.11)

The overall flow minimizing eq. 4.8 is then given by

∇E = 2 ∫ [H∗(λ) − H(Γ, λ)] ∇H(Γ, λ) dλ    (4.12)

4.3.2 Numerical solution

In (Litvin and Karl, 2004b), we propose a numerical scheme to estimate the flow. Let the single feature function at time t be Φt(Γ), continuously defined in the space Ω. In the first step, we compute the gradient descent flow in the space Ω minimizing eq. 4.2.

dΦ(λ) = ∂Φ/∂t = −∇ΦE = [H∗(Φt) − Ht(Φt)] H′λ(Φt)    (4.13)

where H′λ(Φt) is the derivative of the distribution with respect to λ, evaluated at the current feature values Φt.

This flow is defined on feature function values and specifies how to change the feature

function values to reduce E. The stationary point of this flow corresponds to the case

when H∗(Φ(t)) = Ht(Φ(t)), i.e. the distribution function for the given curve matches the

target prior distribution function. The discrete approximation of eq. 4.13 is straightforward

and can be efficiently computed.
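As a rough illustration (our own sketch, not the dissertation's implementation), the discrete version of eq. 4.13 can be written as follows, with the empirical CDF Ht estimated by sorting the current feature values and the density H′λ by a central finite difference; the names and step size are our own choices. At the stationary point described above, where the current CDF matches the target, the flow vanishes.

```python
import bisect

def empirical_cdf_sorted(sorted_vals, x):
    """Empirical CDF value at x for pre-sorted samples."""
    return bisect.bisect_right(sorted_vals, x) / len(sorted_vals)

def feature_flow(phi, target_cdf, dlam=1e-3):
    """Discrete approximation of eq. 4.13 for current feature values phi:
    dPhi = [H*(phi) - H_t(phi)] * H'_lambda(phi), with H_t the empirical
    CDF of phi and H'_lambda estimated by a central finite difference."""
    s = sorted(phi)
    flow = []
    for v in phi:
        h_cur = empirical_cdf_sorted(s, v)
        density = (empirical_cdf_sorted(s, v + dlam)
                   - empirical_cdf_sorted(s, v - dlam)) / (2.0 * dlam)
        flow.append((target_cdf(v) - h_cur) * density)
    return flow
```

With the target CDF equal to the current empirical CDF, every entry of the flow is exactly zero, consistent with the stationarity condition H∗(Φt) = Ht(Φt).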

The flow in eq. 4.13 is defined on the curve features separately, without consideration of consistency between those features. For example, for the boundary curvature feature function, an arbitrary evolution of the curvature values along the curve will, in general, no longer correspond to a connected curve. In fact, the curvature function computed on a closed curve must satisfy a condition discussed in (Klassen et al., 2004). Let the flow on the feature function computed according to eq. 4.13 be denoted by dΦ∗. The evolution of the feature function is constrained by the fact that the feature function must continue to correspond to a valid contour Γ, which induces an infinite-dimensional manifold of valid feature functions. In general, the flow


dΦ∗ is not restricted to this manifold. In order to find the actual curve flow, we adopt

a projection solution, finding the curve deformation that results in a feature space flow

closest to the given unconstrained flow dΦ∗.

An approach similar in spirit is used in (Klassen et al., 2004) to evolve shapes. In Klassen's approach, a prescribed direction of shape parameterization evolution is given on the space tangent to the manifold of valid shapes. This prescribed direction takes the object outside of the manifold of shapes. The solution is then sought as a shape on the manifold that is the result of evolution in the direction closest to the prescribed direction in the tangent space.

Let us consider a given contour at time T. Let the space of all feature function flows be Φ, and let β(s) be a small displacement of the contour in the direction normal to the curve at s. Let S be a hypersurface in Φ given by the feature function flows corresponding to all possible curve deformations. We want to orthogonally project the unconstrained feature flow dΦ∗ onto the hypersurface S; Figure 4·2 illustrates this technique. Let us denote the resulting projection dΦ+ and the corresponding curve deformation β(s)+. Note that the point dΦ = 0 ∈ S. We perform gradient descent on the manifold S starting from β(s) = 0 to find the projection dΦ+. We seek the curve deformation solution as follows

β(s)+ = argmin_{β(s)} ||dΦ(β(s)) − dΦ∗||    (4.14)

where ||x|| is the L2 norm.

We use the following strategy to find the solution β(s)+. Starting from zero deformation

β(s)t=0 = 0, we consecutively apply small random perturbations dβ(s)t to the curve. Each

new random perturbation adds to the perturbation at the previous time step:

β(s)t = β(s)t−1 + dβ(s)t (4.15)

We record the resulting cumulative perturbation as an intermediate result if the cost ||dΦ(β(s)) − dΦ∗|| given the new perturbation is lower than the previous value, that is,


Figure 4·2: Illustration of the descent on the manifold procedure to find the curve flow β(s)+, eq. 4.14. The surface represents the space of all realizable feature function flows S.

if ||dΦ(β(s)t) − dΦ∗|| < ||dΦ(β(s)t−1) − dΦ∗||    (4.16)

The procedure is stopped when the average decrease of the energy over a fixed number of steps is lower than a specified threshold value.
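The accept-if-better random-perturbation descent described above can be sketched generically as follows (our own illustration: the map F stands in for the curve-deformation-to-feature-flow computation, and all names, the step budget, and the perturbation size are assumptions):

```python
import random

def project_flow(F, dphi_star, n, steps=3000, sigma=0.02, seed=0):
    """Accept-if-better random-perturbation descent for eq. 4.14: find a
    deformation beta whose induced feature-function flow F(beta) is as
    close as possible (L2) to the unconstrained target flow dphi_star."""
    rng = random.Random(seed)

    def cost(b):
        return sum((x - y) ** 2 for x, y in zip(F(b), dphi_star)) ** 0.5

    beta = [0.0] * n          # start from the zero deformation (dPhi = 0)
    best = cost(beta)
    for _ in range(steps):
        trial = [b + rng.gauss(0.0, sigma) for b in beta]  # eq. 4.15
        c = cost(trial)
        if c < best:          # keep the cumulative perturbation, eq. 4.16
            beta, best = trial, c
    return beta, best
```

Because only improving perturbations are kept, the cost is non-increasing; as discussed below, this yields a gradient-related flow rather than the true gradient flow.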

The procedure is symbolically illustrated in Figure 4·2. The surface represents the

realizable feature function flow manifold S. The trajectory shows the value of the feature

function flow during the minimization procedure.

As we pointed out previously, feature function values are computed by uniformly discretizing the curve and periodically re-sampling it. The computational burden of computing the curve evolution under the numerically computed flow is substantial due to the two nested levels of iterations (curve evolution and curve flow computation).

Even though the flow dΦ+ found using our procedure is gradient related to the flow dΦ∗ (⟨dΦ+, dΦ∗⟩ is positive), we have no guarantee that dΦ+ is not asymptotically orthogonal to dΦ∗. In fact, the complex geometry of S can trap the solution of the minimization problem in eq. 4.14 in a local minimum, which can potentially result in ||dΦ+|| ≪ ||dΦ∗|| and/or


⟨dΦ+, dΦ∗⟩ ≈ 0. However, in our experiments the flow obtained using this numerical scheme did minimize the cost in eq. 4.2. Since the approach does not produce the true gradient curve flow but only a gradient-related flow, the combination of the resulting flow dβ+ with (dΓ/dt)int in eq. 4.7 does not necessarily minimize the original energy functional in eq. 4.6. In our experiments, the flow computed using this approach produced segmentation results reasonably close to those given by the true gradient flow for Eshape. Given the above theoretical issues with this numerical scheme and its high computational cost, the exact analytical solution for the curve flow is highly preferable and is the one used in the experiments reported in this dissertation.

A different formulation of the distribution difference measure, based on geodesics in the space of PDFs, is introduced in Section 4.7.2, where we also pave the way to compute the corresponding minimizing flow.

4.4 Feature function choice

Figure 4·3: Feature function #1 values computed on the discretized curve. We show the inter-point distances d13 .. d15.

We now consider the choice of the feature functions. In the example in Section 4.1, we considered the curvature measured along the boundary as the feature function. In Appendix A.2, we derive the minimizing curve flow corresponding to this choice of feature function and show that the resulting flow is likely to be numerically unstable and sensitive to noise. Our experiments also confirmed this fact. Instead, we design feature function

definitions having better properties by using non-adjacent points on the curve to compute


feature function values.

We use two specific feature functions in our experiments in this chapter, which we define

below:

• Feature function #1. Inter-point distances.

We first define the space Ω as a subspace of R², where each dimension encodes the position on the curve parameterized by arc-length. Without loss of generality we normalize the arc-length to the interval [0, 1]. Two positions s1 and s2 on the curve Γ define the point ω = {s1, s2}. The subspace ΩS of the space Ω is then defined as

ΩS : min(|s1 − s2|, 1 − |s1 − s2|)/L ∈ S,    S = [a, b], a, b ∈ [0, 0.5]    (4.17)

where L is the total length of the boundary; S is the interval specified by a and b.

Basically, ΩS specifies the set of pairs of curve points within a given range of separation along the curve. The upper bound 0.5 of the interval corresponds to two points dividing the boundary into two parts of equal length. Let us consider a point ω ∈ ΩS. The feature function value corresponding to this point is given by the normalized Euclidean distance between the corresponding boundary points:

Φ(s1, s2) = d(Γ(s1), Γ(s2)) / mean(d(ν) | ν ∈ ΩS)    (4.18)

where d(ν = {s1, s2}) = d(Γ(s1), Γ(s2)) is the Euclidean distance between the boundary points specified by s1 and s2. By defining multiple intervals S we obtain multiple distributions that we use simultaneously in our prior in eq. 4.2. In the experiments presented in this chapter we used either a single interval [0, 0.5] or four different, non-overlapping intervals of equal length 0.125. These correspond to points of increasing separation along the boundary curve. The corresponding division of the space Ω into four sub-spaces is illustrated in Figure 4·4.

In the discrete formulation (used to compute our distributions), the feature value

set F consists of the normalized distances between nodes of the discrete curve, see


Figure 4·4: Left: Graphic interpretation of the division of the space Ω into four subspaces ΩS: Ω1, Ω2, Ω3, and Ω4. The corresponding intervals are [0, 0.125], [0.125, 0.25], [0.25, 0.375], and [0.375, 0.5]. Right: a curve with 3 pairs of points, members of Ω1, Ω1, and Ω4 respectively.

Figure 4·3.

F = { dij / mean(dkl | ∀(k, l) ∈ S) | (i, j) ∈ S }    (4.19)

The set S defines the subset of internodal distances along the curve used in the feature. For (a, b) ⊂ [0, 0.5], S(a,b) defines the subset of node pairs {(i, j) | min(|i − j|, N − |i − j|) · ds/L ∈ [a, b]}, where a and b are the lower and upper bounds of the interval respectively, N is the number of boundary nodes, ds is the distance between neighboring boundary nodes, and L is the total boundary length. Note that the feature function so defined is invariant to shape translation, rotation and scale.

For feature function #1, the analytically computed energy minimizing flow is given

by (see Appendix A):

∇E(Γ)(s) = 2 ∫_{t∈S} ( ~n(s) · ~Γ(s, t) / |~Γ(s, t)| ) [ H∗(|~Γ(s, t)|) − H(Γ, |~Γ(s, t)|) ] dt    (4.20)

where Γ is the parameterized curve as a function of arc length, {X(s), Y (s)} with s ∈ [0, 1]; ~Γ(s, t) is the vector with coordinates {X(t) − X(s), Y (t) − Y (s)}; and ~n(s) is the outward normal at {X(s), Y (s)}. The flow at each s is an integral over the curve, indicating the non-local dependence of the flow. The expression under the integral can be interpreted as a force acting on a particular pair of locations on the curve, projected on the normal direction at s.


• Feature function #2. Multi-scale curvature.

As in feature function #1, we define the space Ω as a subspace of R², where each dimension encodes a point on an arc-length parameterized curve Γ. The subspace ΩS of the space Ω is defined similarly, according to eq. 4.17, and is parameterized by an interval S = [a, b]. We now assume that ω = {s0, s+}. Let us define l(ω) = min(|s+ − s0|, 1 − |s0 − s+|) as the shortest distance along the curve between the points s+ and s0. We then define the point s− obtained by moving from s0 along the curve by l(ω) in the opposite direction; hence, ω = {s0, s−} ∈ ΩS. We now define the feature function value Φ(ω) as the angle of support between the points s+, s−, and s0:

Φ(s0, s+) = ∠(s− s0 s+)    (4.21)

In a discrete formulation, the feature value set F consists of the collection of angles

between nodes of the discrete curve, see Figure 4·5

F = { ∠i−j,i,i+j | ∀(i, j) ∈ S }    (4.22)

where ∠(ijk) is the angle between nodes i, j, and k. Again, the set S defines the subset of internodal angles used in the feature, and again we use S in a multi-scale way, as described in the definition of feature function #1. Invariance with respect to translation, rotation, scale and mirror imaging holds for this feature function by construction.

For feature function #2, the analytically computed energy minimizing flow is given


by (see Appendix A)

∇E(Γ)(s) = ∫_{t∈S} { [cos(β) cos(~n(s), ~Γ+) + cos(γ) cos(~n(s), ~Γ−)] / sin α,  if α ≠ π;  sin(~n(s), ~Γ−),  otherwise }
× [ a · r(s,t) / (bc) − f(s−t) √(1 − (~n(s − t) · ~Γ−/|~Γ−|)²) / |~Γ−| − f(s+t) √(1 − (~n(s + t) · ~Γ+/|~Γ+|)²) / |~Γ+| ]
× [ H∗(α(s, t)) − H(Γ, α(s, t)) ] dt    (4.23)

where r(s,t) and f(s+t) take values −1 or 1 and indicate the sign of the change of the angle α(s, t) = ∠(~Γ−, ~Γ+) with respect to an along-the-normal perturbation of the points Γ(s) and Γ(s + t) respectively; ~Γ+ = ~Γ(s, s + t); ~Γ− = ~Γ(s, s − t); a = |~Γ+ − ~Γ−|; b = |~Γ−|; c = |~Γ+|; β = ∠(−~Γ+, ~Γ− − ~Γ+); γ = ∠(−~Γ−, ~Γ+ − ~Γ−). See Appendix A for details on the notation and feature function computation.

Figure 4·5: Feature function #2 in the discrete case: the inter-point angles α−1,1,2 .. α−n,1,n are shown.
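As an illustration of feature function #1 in the discrete case (our own sketch, not code from the dissertation), the following computes the normalized inter-node distances for a polygonal curve with a single interval S covering all node pairs. Since the normalization divides out scale and Euclidean distances ignore pose, the resulting feature set is unchanged under similarity transforms:

```python
import math

def interpoint_distance_features(nodes):
    """Feature function #1 on a discrete closed curve: all inter-node
    Euclidean distances normalized by their mean, cf. eq. 4.19."""
    n = len(nodes)
    d = [math.dist(nodes[i], nodes[j])
         for i in range(n) for j in range(i + 1, n)]
    mean = sum(d) / len(d)
    return sorted(x / mean for x in d)
```

Rotating, scaling, and translating the node set leaves this sorted feature list unchanged, which is the invariance property used to avoid shape registration.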

4.5 Intensity histogram equalization connection

Recall the histogram modification task (Section 2.7.1). Given the image I(x, y) : R × R → R, we would like to evolve its gray level values in a unique and spatially uniform way to match its


gray value PDF hI(λ) to an arbitrary target PDF h∗(λ). Let the target CDF be defined as H∗(λ) = ∫_0^λ h∗(s) ds and let the current image CDF HI(λ) be defined similarly. It can be shown that the flow

∂I(x, y, t)/∂t = H∗[I(x, y, t)] − HI [I(x, y, t)]    (4.24)

always exists, is unique and has the desirable solution as the limit point.
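The limit point of such a flow is the classical histogram-specification mapping, which pushes each value through the current empirical CDF and then through the inverse of the target CDF. A minimal sketch of that mapping (our own illustration; it assumes distinct values and a caller-supplied inverse target CDF):

```python
import bisect

def specify_histogram(values, target_cdf_inv):
    """Classical histogram specification: map each value through the
    empirical CDF of the data, then through the inverse target CDF.
    This is the matched state the modification flow is designed to reach."""
    s = sorted(values)
    n = len(values)
    return [target_cdf_inv(bisect.bisect_right(s, v) / n) for v in values]
```

With a uniform target on [0, 1] the inverse target CDF is the identity, and the mapping reduces to replacing each value by its normalized rank.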

Suppose now we want to match the distribution H(Φ(Γ), λ) of the feature function Φ(Γ) to the target distribution H∗(λ) using the energy criterion in eq. 4.2. Our minimizing feature function flow in eq. 4.13 is similar to the histogram modification flow in eq. 4.24; the difference is the multiplicative term H′λ(Φt), which is absent in the histogram modification flow PDE. Since the term H′λ(Φt) is always positive, the histogram modification flow in eq. 4.24 is gradient related to the flow in eq. 4.13 and is therefore a minimizing flow of eq. 4.2. To our knowledge, an energy minimization interpretation of the histogram modification flow in eq. 4.24 has not previously been presented. We thus establish that eq. 4.2 can also be minimized by the histogram modification flow.

Our shape distribution based shape prior is connected to histogram equalization in that our method extends the general idea of matching distributions using a PDE to a more general class of features computed on the curve and/or image intensities. We saw that a considerably more complicated approach is needed to match these general distributions due to the many constraints on the evolving variables. In our case, these constraints are given by the connection of the feature function values to a given curve.

4.6 Extension to 3D

In recent years, 3D image processing techniques have gained more interest, fueled by increasing processing power, data acquisition improvements, and application needs. For instance, CT and MRI are able to produce volumetric data with comparable resolutions in three dimensions. One can achieve better results in interpreting such data by exploring the volume as a whole rather than its separate 2D slices. Exploring the volume as a whole can be achieved in different ways. One direction of research focuses on processing 2D slices while


exploiting the data consistency in the third dimension. In the shape modeling context, this approach is known as "2D-to-3D". It is especially effective and natural when a particular dimension has a different nature from the other two, or when data resolution differences prevent using an isotropic 3D approach.

A different, and more interesting, approach is to extend a given 2D approach in a "natural" way. In fact, many shape modeling techniques allow for such an extension. The most interesting example is the 3D version of level set based shape models. In the level set approach, the boundary is represented implicitly as the zero level set of a higher dimensional function; therefore, the level-set boundary representation can be carried out in any dimension. Direct, explicit boundary representation is also possible in 3D and is known as a deformable mesh. On the other hand, approaches relying on a curve-based parameterization, such as (Klassen et al., 2004), may not have a straightforward 3D extension.

As for any shape modeling approach, it is desirable to have a natural 3D extension of the method. In this section we present the foundation for such an extension of our shape distribution based shape model. We first discuss the theoretical aspects using a continuous formulation and then discuss implementation issues.

4.6.1 Formulation 3D

We define the 3D shape as a smooth closed surface that we denote S. The definition of the

prior is similar to the 2D case. We first define a feature function Φ(ω) that can be defined

on the surface or on the volume. The CDF of Φ(ω) is given by

H(λ) = ∫_Ω 1[Φ(ω) < λ] dω / ∫_Ω dω    (4.25)

and the prior shape energy is given by

Eshape(S) = ∑_{i=1}^{M} wi ∫_Λ [H∗i(λ) − Hi(S, λ)]² dλ    (4.26)


where H∗i(λ) are the prior distributions computed on the training surface(s) and M is the number of different feature functions.

The challenge arises in the definition of the feature functions and their computation. The extension to 3D of feature function #1 (inter-point distances) can be defined as follows. Let Sp and Sq be two points on the surface S. We define the element ω as a combination of two surface points, that is, ω = {Sp, Sq}, where Sq, Sp ∈ S. The space Ω is therefore defined as the set of all possible combinations {Sq, Sp}. The value of the feature function is the normalized distance between Sp and Sq:

Φ(Sp, Sq) = d(Sp, Sq) / mean(d(Ss, Sw) | ∀Ss, Sw ∈ S)    (4.27)

The extension to 3D of feature function #2 (multi-scale curvature) cannot be performed in a straightforward way. Extending the idea used in the 2D case, one has to define three surface points S1, S2, S3 and measure the angle between the vectors ~V12 and ~V13. In the 2D case, in order to reduce the dimensionality of the space Ω, we constrained the along-the-contour distances between pairs of points by requiring them to be equal. However, the along-the-surface distance between two points on the surface cannot be defined in a unique way. In fact, one could use an optimization based distance definition, but the computational burden of the feature and flow computation is deemed unreasonable.

4.6.2 Surface flow

For feature function #1 in the 3D case, the surface flow expression is similar to its 2D counterpart. The minimizing flow is given by

∇E(S)(Sa) = 2 ∫_{Sb∈S} ( ~n(Sa) · ~Γ(Sa, Sb) / |~Γ(Sa, Sb)| ) [ H∗(|~Γ(Sa, Sb)|) − H(S, |~Γ(Sa, Sb)|) ] dSb    (4.28)

where Sx is a point on the surface S with coordinates {X(Sx), Y (Sx), Z(Sx)}; ~Γ(Sa, Sb) is the vector with coordinates {X(Sb) − X(Sa), Y (Sb) − Y (Sa), Z(Sb) − Z(Sa)}; and ~n(Sx) is the outward normal to the surface at Sx. The flow at each location Sa is an integral over


the surface. The expression under the integral can be interpreted as a force acting on a particular pair of locations on the surface, projected on the normal direction at Sa.

4.6.3 Implementation issues

Discrete implementation of the feature distribution and flow computation requires sampling of the curve (2D) or the surface (3D). In 2D, we accomplished this by uniform sampling of the curve, which is straightforward. In 3D, uniform sampling of the surface is not uniquely defined. However, one may use non-uniform sampling in computing the feature functions and the associated flows. Here we propose to avoid surface resampling by using the level set surface embedding, which provides a non-uniform sampling. In fact, the level set function zero crossings with elementary grid voxels provide a triangulated approximation of the original surface. Suppose we want to compute the integral ∫_S K(s) ds of a quantity K(s) defined on the surface.

1. Using a fast level-crossing extraction algorithm, we obtain the zero grid crossings {Cri | i = 1..N} and triangular patches {Pai | i = 1..M}. Let us denote the corners of the patch Pai as Cr1(Pai), Cr2(Pai), and Cr3(Pai). The union of all patches approximates the original surface:

∪_{i=1}^{M} Pai ≈ S    (4.29)

2. For each patch Pai, we compute its area A(Pai), normal n(Pai), and center of mass C(Pai).

3. The surface integral can then be approximated as

I = ∫_S K(s) ds ≈ ∑_{i=1}^{M} K(C(Pai)) A(Pai)    (4.30)

where K(C(Pai)) is the quantity K evaluated at the patch center of mass.

An example of the surface triangulated using level set function zero crossings as surface

points is shown in Figure 4·6.


Figure 4·6: Surface triangulation by the level set function zero crossings extraction.
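Steps 1–3 above reduce to a centroid quadrature over triangular patches. A minimal sketch (our own illustration; the patch representation and names are assumptions), which happens to be exact for integrands that are linear over each patch:

```python
def tri_area_centroid(p, q, r):
    """Area A(Pa) and center of mass C(Pa) of one triangular patch
    with 3D vertices p, q, r."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # half the magnitude of the cross product of two edge vectors
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    area = 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5
    centroid = tuple((p[i] + q[i] + r[i]) / 3.0 for i in range(3))
    return area, centroid

def surface_integral(triangles, K):
    """Approximate the surface integral of K, cf. eq. 4.30:
    sum over patches of K(centroid) * area."""
    return sum(area * K(c)
               for area, c in (tri_area_centroid(*t) for t in triangles))
```

For example, over a unit square in the plane z = 0 split into two triangles, K = 1 recovers the total area and K = x recovers the exact first moment.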

4.7 Extensions and additional issues

4.7.1 Weighted distributions

In certain applications it may be advantageous to impose different weights on different ranges of feature function values. For instance, it may be more important to preserve sharp corners than straight portions of the boundary. We propose an extension of our shape prior energy to achieve this effect. Let us introduce a differentiable weighting function w(λ) and consider a typical term of the general shape distribution prior energy. We write the modified energy as

E(Γ) = ∫ w(λ) [H∗(λ) − H(Γ, λ)]² dλ    (4.31)

The weight emphasizes fidelity to certain features more than others. The derivation of the curve flow minimizing this energy definition is presented in Appendix A.4.

4.7.2 Geodesic distance between distributions

In Section 2.8 we discuss and justify the choice of distribution difference measure used

throughout this work. Here we would like to suggest an alternative approach based on the


recent work by (Mio et al., 2005). In (Mio et al., 2005), the authors present an approach to computing the distance between distributions based on a shortest-geodesic computation. The geodesic is the trajectory in the space of PDFs that transforms one PDF into another along the shortest path. The results show that the geodesic path computed between two distributions evolves the PDF in a "shape-preserving" fashion, meaning that as the distribution evolves, its shape is approximately preserved. In terms of the examples considered in Section 2.8, the distance between unimodal distributions with closely located modes will be small, because the path that moves a mode is short. The same property holds for the EMD measure, as discussed in Section 2.8. Unfortunately, as for the EMD measure, an analytic form for the geodesic distance measure is not available: the measure computation involves numerical optimization. Hence, analytical curve flow computation is not feasible.

Yet, one property of the method in (Mio et al., 2005) can make it useful for our purpose. The geodesic computation in (Mio et al., 2005) involves the computation of the initial tangent direction in the space of PDFs. Suppose we have initial and target PDFs p0(x) and p∗(x), respectively, and corresponding CDFs H0(x) and H∗(x). The tangent direction f is computed in the space of log-likelihoods of the distribution p(x). This means the elementary increment dp of the PDF is given by

dp = p0 (e^f − 1)    (4.32)

It follows that the corresponding elementary increment dH0(x) of the CDF is given by

dH0(x) = ∫_{−∞}^{x} p0 (e^f − 1) ds    (4.33)

dH0(x) gives the direction in which the CDF must be evolved to reach the target CDF along the geodesic trajectory. We may now find the curve flow that moves the curve in such a way that the CDF evolution direction is as close as possible to the prescribed direction dH0(x). To this end, let us construct an energy function based on the L2 norm in the space of CDF


increments. We desire to find the gradient curve flow that minimizes this energy:

E(Γ) = ∫ [dH0 − dH(Γ)]² dλ = ∫ [dH0 − (H(Γ) − H0)]² dλ = ∫ [H0 + dH0 − H(Γ)]² dλ    (4.34)

Equation 4.34 is exactly one term in eq. 4.2 where the target distribution is H∗ = H0 + dH0. Therefore, we can find the gradient curve flow minimizing eq. 4.34 by one of the two procedures described in Section 4.3. As described previously, the curve evolved using such a gradient curve flow may not achieve the target distribution H∗ in the general case. The reasons are the non-linearity of the manifold of realizable CDFs, discretization errors, and possible local minima of the energy. Hence, the flow will move the PDF away from the computed geodesic, and the geodesic direction must be re-estimated after each curve evolution step. Moreover, since the actual PDF evolution will not follow the geodesic exactly, we cannot guarantee that the along-the-geodesic distance measure will be minimized by the computed curve flow. Computationally, this method is more expensive than using our formulation in eq. 4.2, since an additional optimization problem to find the geodesic direction f must be solved at each iteration. The advantage of this method is the more intuitive formulation of the distribution difference measure.
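As a small numerical sanity check of eqs. 4.32-4.33 (our own illustration, not part of (Mio et al., 2005)): if f is shifted by a constant so that the perturbed density p0·e^f still integrates to one, the resulting CDF increment must vanish at the right end of the support, as required of a valid tangent direction.

```python
import math

def cdf_increment(p0, f, dx):
    """dp = p0 * (e^f - 1) (eq. 4.32) and dH0 = running integral of dp
    (eq. 4.33) on a uniform grid; f is shifted by a constant so that
    the perturbed density p0 * e^f integrates to one."""
    z = sum(p * math.exp(fi) for p, fi in zip(p0, f)) * dx
    shift = math.log(z)
    dp = [p * (math.exp(fi - shift) - 1.0) for p, fi in zip(p0, f)]
    dh, acc = [], 0.0
    for d in dp:
        acc += d * dx
        dh.append(acc)
    return dp, dh
```

With a uniform p0 on [0, 1] and a linearly increasing f, the increment dp shifts probability mass to the right while dh returns to (numerically) zero at the end of the grid.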

4.7.3 Shape distribution uniqueness issues

In this section we return to the discussion of the properties of shape distribution based shape representations, namely invertibility and uniqueness. Many known parameterizations of shape are invertible, meaning that a given shape representation, such as a skeleton representation, corresponds to one and only one shape. On the other hand, a given shape distribution representation does not correspond to a single shape in the general case, a consequence of the fact that the distribution-based representation is constructed to be invariant with respect to the location of features.

Let us consider feature function #1 (inter-point distances) defined in this chapter. In (Lemke et al., 2002), theoretical work has been done on reconstructing discrete point sets from the set of inter-point distances measured between all pairs of points in the set. In fact, this problem was studied previously in X-ray crystallography and restriction

site mapping of DNA. (Lemke et al., 2002) defines sets of points that follow “turnpike” and “beltway” configurations: the latter requires the points to lie on a circular loop, while the former requires them to lie on a line. A distance set is then defined as the combination of all inter-point

distances between the points in the set. Distances are measured as Euclidean distances

for the “turnpike” configuration and as arc-length distances for the “beltway” configuration. The problem of reconstructing point coordinates from a distance set boils down to reconstructing 1D coordinates of points (since the points are constrained to lie on a curve). Lower and upper bounds on the number of possible solutions were shown to be sub-exponential with respect to the size of the set of points. Unfortunately, this problem setup is quite different

from our problem. In our problem, the points are restricted to be uniformly spaced along

the curve, while the curve has no predefined shape. However, we hypothesize that the

number of solutions (possible generating point sets for a given distance set) in our problem

can also be large.
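This non-uniqueness is easy to demonstrate on a small “turnpike” instance. The following sketch (ours, not from (Lemke et al., 2002)) checks a classic homometric pair: two different six-point sets on a line that generate exactly the same multiset of fifteen inter-point distances.

```python
from itertools import combinations

def distance_multiset(points):
    """All pairwise distances between 1D point coordinates, as a sorted list."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))

# A classic homometric pair: two different "turnpike" point sets
# that generate exactly the same multiset of inter-point distances.
set_a = [0, 1, 4, 10, 12, 17]
set_b = [0, 1, 8, 11, 13, 17]

assert set_a != set_b
assert distance_multiset(set_a) == distance_multiset(set_b)
```

Any feature distribution built only on inter-point distances necessarily assigns these two point sets the same representation.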

We suggest that this solution non-uniqueness is the underlying property of the shape distribution representation that is responsible for the generalization ability of our prior. We also suggest that the configurations solving the point set reconstruction problem (the problem of reconstructing point sets from measured inter-point distances) share some important visual shape properties. It is even more difficult to approach the point set reconstruction problem from the distribution similarity point of view. To the best of our knowledge, no research has been done in this direction. More theoretical work is needed to fully understand the

issues of shape distribution representation uniqueness.

4.7.4 Computational complexity

In this section we present some results on the computational complexity involved in using our shape prior. We only consider the computations needed to construct the curve flow corresponding to our prior during one curve evolution iteration and ignore the overhead computations required to evolve the curve, reinitialize the level-set function, etc., since these


computations are independent of the prior used. We also do not consider the computations

required to construct (train) our model, since model construction only requires feature

distribution computation for each of the training shapes.

Let us consider a single feature function and a single curve evolution iteration. The overall computational expense is linear in the number of feature functions used. The number of FLOPs needed to compute the feature function and the corresponding curve flow is O(N²), where N is the number of nodes discretizing the curve. Assuming that the actual number of FLOPs needed is 20N² and N = 200, on a 1.8 GHz processor we estimate that the required computation time is 0.02 seconds. Our MATLAB implementation on a 1.8 GHz processor, with significant overhead computations included, requires 0.6 sec and 0.8 sec for the flows corresponding to the inter-point distance and multi-scale curvature feature functions respectively. Therefore, substantial optimization of our implementation is still possible.
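As a rough illustration of this O(N²) scaling (a sketch of the dominant computation only, not our MATLAB implementation), one can time the all-pairs distance computation for N = 200 curve nodes; the wall-clock time is of course hardware dependent.

```python
import time
import numpy as np

def interpoint_distances(nodes):
    """All pairwise Euclidean distances between curve nodes: O(N^2) work."""
    diff = nodes[:, None, :] - nodes[None, :, :]       # shape (N, N, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))

N = 200
t = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
nodes = np.stack([np.cos(t), np.sin(t)], axis=1)       # N nodes on a unit circle

start = time.perf_counter()
distances = interpoint_distances(nodes)
elapsed = time.perf_counter() - start                  # hardware dependent

assert distances.shape == (N, N)
assert np.allclose(distances, distances.T)             # symmetric by construction
```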

4.8 Summary

In this chapter we introduced a novel method of constructing and using a shape prior. Our method relies on modeling distributions of certain significant features of shapes and creating a measure of similarity between these distributions. A key to the useful implementation of the obtained measure lies in our ability to incorporate it into a curve evolution framework as a prior energy term. To the best of our knowledge, this is the first time a shape prior based on shape distributions has been formulated and coupled with a curve evolution framework. In the second part of this chapter we presented the 3D extension of our approach and discussed the implementation issues that arise.


Chapter 5

Applications of shape distribution based shape

priors

In this chapter we consider applications of our shape modeling approach. First, we test

the behavior of our framework when the curve is guided solely by the shape prior force

based on shape distributions. We test the focusing ability of the prior to recover the single

curve used to construct the prior shape distributions. Next, we use our prior for image

segmentation in a curve evolution context. We perform several experiments; in particular,

we perform segmentation of images with occlusion. Finally, we explore another application of our prior: finding the intrinsic mean of several shapes. The work reported in this chapter

previously appeared in (Litvin and Karl, 2004a; Litvin and Karl, 2004b; Litvin and Karl,

2004c).

5.1 Shape focusing by shape term guided evolution

In this section we examine the evolution of a curve driven solely by our shape prior energy

term. We have removed the data term to examine and understand the effect of the prior-dependent flow term on the curve evolution. Roughly, this should produce the curve most favored by the prior (modulo initialization effects and local minima). By using different feature functions in our prior we can gain some insight into how this most favored curve is affected by the nature of the feature functions used. These insights are useful in designing

feature functions for a particular application.

Now the curve is evolved solely under the action of the shape term; that is, the energy


to be minimized is

E(Γ) = ∑_{i=1}^{M} w_i ∫_Λ [H∗i(λ) − Hi(Γ, λ)]² dλ    (5.1)

where H∗i (λ) is computed on the target shape. This energy minimizes the distance in

eq. 4.4 between the evolving and target shapes, thus matching shape distributions of the

evolving curve and the target curve.
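A discretized version of one term of this energy can be sketched as follows (an illustration under our own discretization choices, not the thesis implementation): build the empirical CDF of a feature sample on a grid of λ values and integrate the squared CDF difference.

```python
import numpy as np

def feature_cdf(features, grid):
    """Empirical CDF H(lambda) of a feature sample, evaluated on a grid."""
    return np.searchsorted(np.sort(features), grid, side="right") / len(features)

def shape_energy(features_evolving, features_target, grid, w=1.0):
    """One term of eq. 5.1: w * integral of [H* - H]^2 d(lambda), Riemann sum."""
    h = feature_cdf(features_evolving, grid)
    h_star = feature_cdf(features_target, grid)
    return w * ((h_star - h) ** 2).sum() * (grid[1] - grid[0])

grid = np.linspace(0.0, 1.0, 256)
rng = np.random.default_rng(0)
target = rng.uniform(0.2, 0.8, 500)

# Identical samples give zero energy; a shifted sample gives a positive one.
assert shape_energy(target, target, grid) == 0.0
assert shape_energy(target + 0.1, target, grid) > 0.0
```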

Experiment 1

In our first experiment we choose a rather simple target shape and evolve the curve

using different choices of feature functions to see how different feature functions capture

different aspects of a shape. Our method is flexible in that different elements of a shape

can be encoded through different feature function choices. We consider 3 different choices

of features in constructing the model: (1) feature function #1 (inter-point distances) alone

(w2 = 0); (2) feature function #2 (multi-scale curvatures) alone (w1 = 0); (3) feature

functions #1 and #2 combined with w1 = w2. The evolution is started from the curve

shown by the dash-dotted line in Figure 5·1. The target shape (on which the target

distributions are computed) is shown by the dashed line. For each of the 3 experiments,

the curve is evolved until it stops. The result is shown by the solid line.

All 3 experiments yield shapes quite similar to the target shape, but small differences

give insight into what properties of the shape are captured by particular feature functions.

Note that our distribution-based representation is scale/rotation/translation invariant, so

differences in scale and position of the result are not considered important. First, we

consider the flow based on feature function #1. This feature function encodes distances

between pairs of points on the curve. The resulting curve (panel A in Figure 5·1) has

a slightly bent, elongated shape. In fact, boundary curvature is not captured directly by

feature function #1; therefore, it is expected that differences in global bending deformation

are not effectively corrected by the flow based on this feature function. However, the

distance between opposing boundaries captured by feature function #1 is well preserved

in the resulting shape. We now consider the flow based on feature function #2. This


feature function encodes curvatures measured at different scales. The flow based on feature

function #2 (panel B in Figure 5·1) yields a shape highly similar to the target shape but

slightly non-symmetric and cone-shaped. This is explained by the fact that feature function

#2 is designed to capture boundary curvature rather than relative boundary position. In

fact, the resulting shape has correct curvatures (straight lines and circular regions) but

relative boundary positions do not match those of the target (the result is a cone rather

than a tube). Finally, both flows combined (panel C in Figure 5·1) yield a nearly perfect

shape. The flows for both feature functions combined complement each other and work to

correct for their separate deficiencies. By including more terms into the prior energy, more

information about shape can be captured.
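The idea behind feature function #2 can be sketched as follows (our illustrative construction; the exact multi-scale definition is given in Chapter 4): smooth the closed curve with periodic Gaussian kernels of increasing width and collect the curvature samples at each scale.

```python
import numpy as np

def smooth_closed(curve, sigma):
    """Smooth a closed curve (N x 2 array) with a periodic Gaussian kernel."""
    n = len(curve)
    k = np.arange(n, dtype=float)
    k = np.minimum(k, n - k)                        # circular index distance
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()
    fk = np.fft.fft(kernel)
    return np.real(np.fft.ifft(np.fft.fft(curve, axis=0) * fk[:, None], axis=0))

def curvature(curve):
    """Signed curvature from centered circular finite differences."""
    d1 = (np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)) / 2.0
    d2 = np.roll(curve, -1, axis=0) - 2.0 * curve + np.roll(curve, 1, axis=0)
    num = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
    den = (d1[:, 0] ** 2 + d1[:, 1] ** 2) ** 1.5
    return num / np.maximum(den, 1e-12)

# Multi-scale curvature features: one curvature sample set per smoothing scale.
n = 400
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
circle = 2.0 * np.stack([np.cos(t), np.sin(t)], axis=1)  # radius-2 circle

features = {sigma: curvature(smooth_closed(circle, sigma)) for sigma in (1, 2, 4)}

# A circle of radius 2 has curvature 1/2 at every point, at every (small) scale.
for f in features.values():
    assert np.allclose(f, 0.5, atol=1e-2)
```

The histograms of these per-scale curvature samples then play the role of the per-scale feature distributions.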


Figure 5·1: Evolution of an initial contour under the sole action of theprior flow: initial (dot-dashed), target (dashed), and resulting (solid) con-tours. (A) - prior constructed on the inter-point distances (#1); (B) - priorconstructed on multi-scale curvatures (#2); (C) - Both feature classes #1and #2 are included.

Experiment 2

In our previous experiment, the curve converges close to the target, indicating the approximate convexity of the energy functional on the space of shapes. A prominent global minimum allows the curve evolution to focus on the target shape. In our second experiment, we

show that in some cases shape complexity can make it difficult for our prior to focus

on the target. In fact, shape complexity increases the entropy of our shape distribution

descriptors (evidence given in (Page et al., 2003) and confirmed by our experiments),


Figure 5·2: Target plane shape.

Figure 5·3: Evolution of the contour under the action of the prior flow: initial (dot-dashed) and final (solid) contours. The target contour is shown in Figure 5·2.

making more curves have similar distributions. It is expected that the energy functional

on the space of shapes may have more significant local minima that can trap the evolving

curve. In this experiment we test the significance of such local minima by using a complex

target shape. We further suggest a modification of our prior that allows us to improve its

focusing properties on complex shapes.

Our new, more complex target shape is shown in Figure 5·2. We use feature functions

#1 (inter-point distances) and #2 (multi-scale curvatures) in eq. 5.1. For each feature

function type (#1 and #2) we use 4 intervals S, leading to a total of 8 terms in eq. 5.1.

The result given by evolving the curve using the target distribution computed on the


curve in Figure 5·2 is given in Figure 5·3. The initial curve is the average over 16 prior

aircraft shapes and is used throughout experiments with aircraft shapes. The prior aircraft

shapes were not pre-aligned; hence, their average shape computed by averaging signed

distance transforms is not symmetric. This essentially random initial contour simulates

unsupervised, random initialization in real-life situations. By not using a symmetric initial curve, we eliminate the possibility of obtaining better focusing due to a well-chosen initialization. Our result (solid curve) shows that the evolution stops at a local minimum of

the energy functional, coming close to the global minimum (target shape itself). Notice

that the resulting shape is not symmetric. In fact, the shape energy does not encourage

symmetry. The evolving curve starts from the non-symmetric initialization and remains

non-symmetric until it stops at a local minimum, which happens to be non-symmetric.

The structural elements of the resulting shape are well developed (see wings, tail). The

relative positions of the elements are maintained. The difference between the target and

the resulting shape is in the fine details of the tail, the wings, and the fuselage. Therefore,

our prior still leads to a curve evolution that focuses on the target but can be trapped in local minima in the vicinity of the global minimum.

Clearly, we would still like to have a focused prior on more complex shapes. We now propose a way to improve the focusing ability of our prior. We can capture the prior

shape at multiple scales to smooth out the energy. The motivation is that our prior has

better focusing effectiveness on less complex shapes; therefore, the prior at coarser scales should be allowed to resolve larger shape details, while the prior at finer scales will focus

on fine details.

Now let us introduce a concrete implementation of this strategy. The questions are how to obtain the coarse shape representations and how to use a prior built on these coarse representations in our curve evolution approach. Addressing the second question, we choose

to augment the shape prior energy E(Γ) by additional terms corresponding to different

scales, which penalize the difference between the representation of Γ at a given scale (in terms of shape distributions) and the target shape representation at that scale. Now the


curve evolution flow should minimize the new, augmented energy.

In order to build the multi-scale shape representations, instead of sampling the curve,

we take advantage of the level set framework. Suppose a signed distance transform is computed from the curve. In fact, the level sets corresponding to increasing

absolute non-zero values of the level set function represent smoothed versions of the contour (which corresponds to the zero level set). Level sets corresponding to large positive distances from the contour approach circles, while the level sets inside the contour collapse to a point.

Therefore, we can approximate a shape's representations at multiple scales by its level sets. Furthermore, the level set function is a natural way to relate the representations at multiple scales to one another. By fixing the level set representation to be a signed distance transform, we automatically link our representations at multiple scales to a valid contour. If we evolve these representations in a distance-function-preserving fashion, we obtain a valid curve and its corresponding coarse representations at any step of the evolution.
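This construction is easy to verify numerically. In the following sketch (a brute-force illustration with hypothetical grid sizes, not the thesis implementation), the level sets {D = l} of the signed distance transform of a square region behave as progressively dilated, smoothed versions of the original contour.

```python
import numpy as np

def signed_distance(mask):
    """Brute-force signed distance to the boundary of a binary region:
    negative inside the region, positive outside (a common convention)."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Boundary pixels: region pixels with at least one background 4-neighbour.
    pad = np.pad(mask, 1, constant_values=False)
    nb = (~pad[:-2, 1:-1] | ~pad[2:, 1:-1] | ~pad[1:-1, :-2] | ~pad[1:-1, 2:])
    by, bx = np.nonzero(mask & nb)
    d = np.sqrt((ys[..., None] - by) ** 2 + (xs[..., None] - bx) ** 2).min(-1)
    return np.where(mask, -d, d)

square = np.zeros((64, 64), dtype=bool)
square[20:44, 20:44] = True
D = signed_distance(square)

# The zero level set reproduces the original region exactly ...
assert (D <= 0).sum() == square.sum()

# ... and the regions {D <= l} for growing l > 0 are progressively dilated
# (corner-rounded) versions of the contour: their areas grow monotonically.
areas = [(D <= l).sum() for l in (0, 2, 4, 8)]
assert all(a < b for a, b in zip(areas, areas[1:]))
```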

Now let us assume that the signed distance transform of the target curve Γ∗ is D(Γ∗).

Suppose we extract the level sets corresponding to the set of values L = {l1, ..., lN} of the distance function; the resulting N contours are Γ∗l1 = {D(Γ∗) = l1}, ..., Γ∗lN = {D(Γ∗) = lN}. We now build the target distributions on each of these level sets. We

denote the target distribution for feature function i, extracted from the contour Γ∗lj, as H∗lj_i. Let the current curve be denoted Γ and its distance transform D(Γ). The new

augmented prior energy is given by the sum of terms corresponding to the prior at different

scales. The term for each scale is constructed as the shape distribution difference measure

between the target distribution computed on the corresponding target level set and the

distribution computed on the corresponding level set of D(Γ). The new energy is

E(Γ) = ∑_{j=1}^{N} ∑_{i=1}^{M} w_i ∫ [H∗lj_i(λ) − Hi({D(Γ) = lj}, λ)]² dλ    (5.2)

Let Γlj = {D(Γ) = lj} be the level set (at value lj) corresponding to the curve Γ. Now let us abstract from the existence of the signed distance transform embedding Γ and assume


that Γlj are separate evolving curves. Minimization of eq. 5.2 can now be rewritten as a

constrained minimization problem

Γ+ = argmin_{Γ : Γlj = {D(Γ) = lj} ∀j} ∑_{j=1}^{N} ∑_{i=1}^{M} w_i ∫ [H∗lj_i(λ) − Hi(Γlj, λ)]² dλ    (5.3)

Unfortunately, the constraints in eq. 5.3 cannot be written in closed analytical form; therefore, a closed form for the curve flow minimizing the energy in eq. 5.3 is difficult to find. We propose an approximate solution to this minimization problem. Let us

relax the constraints in eq. 5.3 and assume that each curve Γlj evolves independently. In

such a case, the energy decouples and the flow for each curve Γlj is given by

dΓlj/dt = −∇E(Γlj) = −∇( ∑_{i=1}^{M} w_i ∫ [H∗lj_i(λ) − Hi(Γlj, λ)]² dλ )    (5.4)

where the flow ∇E(Γlj ) is the gradient curve flow for Γlj minimizing the corresponding

energy term. Let us now move back to the level set domain. Suppose we start from the

signed distance transform D(Γ) and evolve the level set function infinitesimally in such a way that eq. 5.4 holds for every lj ∈ L. We can do so by evolving the level set function at

each point according to the curve flow corresponding to the closest level set. We can write

such an evolution as

Dt(x) = −(dΓ_D(x)/dt) · ∇D(x)    (5.5)

where the flows dΓ_D(x)/dt are given by eq. 5.4, with lj = round(D(x)). After an evolution step

according to eq. 5.5, the level set function is no longer a signed distance transform. We may now perform a correction step to make D a signed distance transform again. A PDE-based approach to perform such a correction was introduced in (Sethian, 1999) for dynamic re-initialization, where it is shown that the level set function update flow needed to restore the signed distance transform is ∇D. We can now perform the

update and correction of the level set function using the following PDE

Dt(x) = −(dΓ_D(x)/dt) · ∇D(x) + β∇D    (5.6)


where β is the strength of the correction term, chosen empirically. With a large enough β,

the flow in eq. 5.6 approximately preserves the signed distance transform property of D.

In the absence of local minima, the PDE in eq. 5.6 approximately minimizes the energy in

eq. 5.2.

Figure 5·4: Evolution of the contour under the multi-scale prior defined on a group of level sets: initial (dot-dashed) and resulting (solid) contours.

The result of applying this procedure to the current shape focusing example is shown in

Figure 5·4. We used 61 terms in the energy formulation in eq. 5.2, with L = {−30, −29, ..., 30}. The final shape is now closer to the target shape in Figure 5·2. Notice

the improved shape of the tail (narrow neck is better pronounced), wings, and fuselage.

The resulting shape is also nearly symmetric despite the non-symmetric initialization. This confirms our intuition that the multi-scale application of our prior can improve its focusing ability and ease the problem of local energy minima in the case of complex shapes.

5.2 Image segmentation

In the previous section we investigated the ability of our prior to generate a curve flow that converges onto the target shape. We built an energy formulation that did not include the image data. We showed that the prior cost function based on our formulation indeed

captures global shape geometry. The flow computed from the prior results in a curve similar to the target curve, which is evidence of the approximate convexity of our prior shape energy term


on the space of shapes.

We now focus on image segmentation, which is the most important application of our prior. Recall that in our image segmentation framework, the solution curve

Γ∗ is sought as the minimizer of the following segmentation functional

Γ∗ = argmin_Γ E(Γ) = argmin_Γ [Eint(Γ) + α Eshape(Γ)]    (5.7)

where Eint is the intensity data term and Eshape is the shape prior term. Our goal is to

study the properties of our shape prior on the segmentation problem by comparing different

choices of Eshape while using the same data term Eint.

Experiment 1

First, we approach the problem of segmenting a polygonal shape from bi-modal images

corrupted by noise. Our goal is to obtain a segmentation satisfying two criteria:

• Low segmentation error in terms of area distance between the resulting contour and

the true contour. We use the Hamming distance to quantify the segmentation error.

• Subjective visual similarity between the result and the true shape. We define visual similarity as the preservation of significant visual features of the true shape. For the case considered, we would like to preserve sharp corners and straight boundary

segments.
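The first criterion is computed directly from binary segmentation masks. A minimal sketch (with hypothetical mask sizes of our own choosing):

```python
import numpy as np

def hamming_distance(mask_a, mask_b):
    """Symmetric area difference (in pixels) between two binary regions:
    the number of pixels on which the two segmentations disagree."""
    return int(np.logical_xor(mask_a, mask_b).sum())

a = np.zeros((32, 32), dtype=bool); a[8:24, 8:24] = True    # 16x16 square
b = np.zeros((32, 32), dtype=bool); b[8:24, 10:26] = True   # shifted by 2 columns

assert hamming_distance(a, a) == 0
assert hamming_distance(a, b) == 2 * (16 * 2)               # two 16x2 strips differ
```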

We compare the results for the following forms of the prior energy Eshape:

• Our distribution-based prior. The energy Eshape is given by eq. 4.2.

• A generic curve length prior, Eshape(Γ) = ∫_Γ ds.

• Leventon’s “probabilistic” prior on level-set curvatures presented in (Leventon et al.,

2000b). The energy Eshape is given by eq. 2.17.

• A PCA prior introduced in Section 2.4.2. Energy Eshape is defined as the Hamming

distance between the segmenting curve and its projection onto the PCA space. The


prior energy is given by eq. 2.18.

Since our focus is on the shape prior, we adopt a simple region-based data fidelity term

that we combine with different choices of the prior. The data fidelity term is given by the negative log-likelihood of the observed image given the location of the boundary: Eint = − log p(I|Γ).

In our model we assume a bi-level image and that the intensities inside and outside of the object boundary are known. In the case of Gaussian noise, the data fidelity term Eint

is then given by (Chan and Vese, 2001)

Ed = ∫_Ru (I − u)² dA + ∫_Rv (I − v)² dA    (5.8)

where u and v are the known image intensities inside and outside of the segmenting boundary, and where the integration is carried out over the inside and outside regions Ru and Rv respectively.

The curve flow component corresponding to a gradient descent with respect to eq. 5.8 is

given by

dΓ/dt = [(u − v)/2] (I − (u + v)/2) ~N    (5.9)
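For a noise-free bi-level image, the sign structure of this flow can be checked directly. The following sketch (our own discretized illustration, with the constant step-size factor omitted) evaluates the data energy of eq. 5.8 and the pointwise normal speed of eq. 5.9.

```python
import numpy as np

def data_energy(image, mask, u, v):
    """Region-based data term of eq. 5.8 for known intensities u (inside), v (outside)."""
    return float(((image - u) ** 2)[mask].sum() + ((image - v) ** 2)[~mask].sum())

def data_speed(image, u, v):
    """Pointwise normal speed of the data-driven flow in eq. 5.9,
    with the constant step-size factor omitted."""
    return (u - v) * (image - (u + v) / 2.0)

u, v = 1.0, 0.0
img = np.zeros((16, 16))
img[4:12, 4:12] = u                      # noise-free bi-level image of a square

# The true segmentation has zero data energy ...
assert data_energy(img, img > 0.5, u, v) == 0.0

# ... and the speed is positive inside the object and negative outside,
# so the evolving curve is pushed toward the true boundary from either side.
speed = data_speed(img, u, v)
assert np.all(speed[4:12, 4:12] > 0) and np.all(speed[0, :] < 0)
```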

Our shape distribution based prior in eq. 4.2 is constructed using two feature functions:

inter-point distances and multi-scale curvatures, using one distribution corresponding to

the interval S[a,b] = [0, 0.5] for each feature function. The target distributions H∗ for each

feature function are computed as averages of distributions of the 4 prior triangular shapes

shown in Figure 5·6 (A). The regularization parameters in eq. 5.7 for each choice of the prior were chosen to obtain the best result for that prior. In Figure 5·5 we compare the

segmentation results given by different choices of Eshape. Gaussian IID noise (SNR = −17.5 dB) was added to a bi-modal image of a triangle to create the data image. The true

boundary is shown by a solid black line. We show the segmentation obtained using our

distribution-based prior model in frame (A); the PCA method in frame (B); the result of

using the curvature density prior of (Leventon et al., 2000b) in frame (D); and the result

of using the curve length penalty in frame (C).

The goal of this experiment is to illustrate the significant advantage of using our method



Figure 5·5: Segmentation results. A: Our method; B: PCA; C: Curvelength penalty prior; D: Method in (Leventon et al., 2000b); White - finalresult; Black - true shape boundary; Dashed line - initial curve. Symmetricarea distance (in pixels) between true boundary and final result is shownon the top of each panel.


Figure 5·6: Prior shapes used to construct the prior in our experiment. A:triangular shapes (experiment 1); B: polygonal shapes (experiment 2).


over the existing approaches when training shapes visually similar to the segmented object shape are provided to the algorithm. The curve length penalty (C) yields a smoothed curve, obviously not a desirable solution. Leventon's prior (D) yields a curve with

somewhat straighter boundaries. In fact, Leventon’s method uses prior information on the

level-set curvatures of the training shapes. Since level-sets computed on the training shapes

have predominantly zero curvature, the prior tends to straighten the boundaries. Unfortunately, level-set smoothness is enforced uniformly across the image plane, and corners

are not preserved. The PCA method (B) yields a shape with even straighter boundaries and sharper corners. Unfortunately, the space spanned by the principal components

contains smoothed shapes and does not preserve sharp corners. Finally, our method (A)

gives the best result, both in terms of preserving the sharp corners and straight boundary

segments as well as in terms of the shape area distance measure. In Table 5.1, we summarize the segmentation errors as measured by the symmetric distance (in pixels) between the true boundary and the final result. For our method (A), the resulting error is mostly

attributed to the bias in location and angular position of the resulting shape, while the

error for the other methods is due to shape “distortion”. In Table 5.1, we also show the

distribution difference measure which we use in Eshape in eq. 4.2. As expected, our result

yields a much smaller value for this measure because this measure is directly part of

the segmentation functional being minimized.

Table 5.1: Experiment 1: Segmentation errors computed using two different error measures. The first is the symmetric area difference (Hamming distance in eq. 5.13) between the final segmented region and the true shape. The second is given by our prior energy Eshape in eq. 4.2.

                          Our method   PCA method   Curve length   Leventon
Symmetric area distance   274          314          294            296
Distribution distance     0.03         0.14         –              –

Experiment 2



Figure 5·7: Segmentation results, polygonal prior. A: Our method; B:PCA; C: Curve length penalty prior; White - final result; Black - trueshape boundary; Dashed line - initial curve. Symmetric area distance (inpixels) between true boundary and final result is shown on the top of eachpanel.


In our next experiment we test the robustness of our technique to the use of different

sets of training shapes. Ideally, one would like to have a good segmentation whenever the

shapes used for training share significant visual features with the expected shape. We use

the same experimental setup as described previously, but now we use a different set of

training shapes. We now use the polygonal prior shapes shown in Figure 5·6 (B), which are far

less similar to the ground truth shape.

The result for this prior is shown in Figure 5·7, compared with the result obtained using the PCA prior (B) and with the minimum curve length prior (C). Comparing our prior with the minimum curve length prior, one can see that despite a slightly larger area distance, our result is still visually superior to that obtained with the curve length prior, since it preserves sharp corners and straight boundary segments. The resulting

errors are summarized in Table 5.2. The visual superiority of our result is reflected in the smaller distribution difference measure obtained using our prior. The comparison with the

result using the PCA-based prior is most illustrative. The different polygonal shapes used here have prominent features (corners) located at different, arbitrary positions. Alignment of

the different polygonal shapes does not bring their prominent features into correspondence.

As a result, the constructed PCA space only contains the four training polygonal shapes as

special cases and does not contain triangular shapes. The solution given by the PCA-based method is therefore far from the true triangular shape.

Table 5.2: Experiment 2: Segmentation errors computed using two different error measures. The first is the symmetric area difference (Hamming distance in eq. 5.13) between the final segmented region and the true shape. The second is given by our prior energy Eshape in eq. 4.2.

                          Our method   PCA method   Curve length
Symmetric area distance   309          416          294
Distribution distance     0.03         0.08         –

Experiment 3


The segmentation of noisy real images with limited prior data is an interesting ap-

plication that can be addressed by our prior. Suppose we want to segment a 3D object

from its 2D projection given that two views of the object to be segmented are known to

the algorithm. These views can be given by previously segmented contours in a tracking

experiment. In Figure 5·8, we show the noise free image to be segmented in panel (B) and

2 training contours of the object to be segmented in panel (A). It is important to note that

none of the training contours is close to the object shape to be segmented.


Figure 5·8: Experiment 3: (A) - training shapes; (B) - noise free image

We employ the same segmentation technique as used in previous experiments. In Fig-

ure 5·9 we show the results obtained using our prior, the curve length prior, and the PCA

prior. Table 5.3 summarizes the segmentation errors obtained using the three methods.

The curve length prior yields almost straight boundaries at the expense of smoothed cor-

ners. The PCA space does not contain the true object, resulting in the large error. Our

segmentation preserves the salient shape features (straight boundaries and sharp corners).

Our result also has the lowest segmentation error, both in terms of the area-based error

measure and the shape distribution based measure.

Experiment 4

We now apply our technique to knee cartilage segmentation from MRI data. This type



Figure 5·9: Experiment 3 segmentation results. A: Our method; B: PCA; C: Curve length penalty prior; White - final result; Black - true shape boundary; Dashed line - initial curve.

Table 5.3: Experiment 3: Segmentation errors computed using different error measures. The first error measure is computed as the symmetric area difference (Hamming distance in eq. 5.13) between the final segmented region and the true shape. The second measure is given by our prior energy Eshape in eq. 4.2.

                          Our method   PCA method   Curve length
Symmetric area distance   697          1492         729
Distribution distance     0.048        0.11



Figure 5·10: Knee cartilage segmentation results. A: Initial (dashed line) and true (solid line) contours; B: Our method (solid line); C: Leventon's (solid line); D: Curve length penalty prior (solid line).


of problem arises in the diagnosis and treatment of osteoarthritis. By measuring the

segmented cartilage thickness, doctors can assess the rate of cartilage loss due to disease

progression and evaluate medication effectiveness.

In a clinical setting, the segmentation of the cartilage has been routinely done using

manual region growing techniques. A number of automatic and semi-automatic segmen-

tation techniques have been applied to cartilage segmentation, see (Lynch et al., 2000;

Kapur, 1999; Cohen et al., 1999) and references therein. The difficulties faced by au-

tomated cartilage segmentation include diffuse boundaries, neighboring structures with

similar intensities, and the very narrow shape of the cartilage. These factors require a strong prior to guide

the segmenting contour toward the correct boundaries. The presence of salient features

(cartilage corners) whose particular location changes from subject to subject makes our

shape prior an appealing approach. The presence of obstructing neighboring structures

requires an initial contour that is rather close to the true boundary. However, we claim

that the initial solution for this problem is easy to acquire using model-based bone segmen-

tation algorithms, such as in (Kapur, 1999). In our illustrative experiments we construct

the initial contour by uniformly expanding (5-10 pixels) the true contour superimposed on

the image being segmented. In Figure 5·10 we present results of applying different shape

priors to segment the cartilage. We compare the results given by our distribution-based

prior, statistical level-set curvature prior proposed by Leventon (see Section 2.4.2), and

curve length prior. In constructing our prior and Leventon’s shape prior we used manually

segmented cartilage boundary. In Figure 5·10, panel (A) we show the manually segmented

cartilage boundary and the initial contour used in our experiments. In panel (B) we show

the result given by our prior. Note the location of the upper right corner of the cartilage.

In our result, this corner propagates further along the cartilage boundary compared with

the manually segmented boundary. This is the consequence of the generalizing properties

of our prior, which effectively constrains the width of the cartilage and the sharpness of

corners. In panel (D) we show a segmentation result given by applying a curve length

prior. The cartilage corners are roughly at the correct locations but the bottom part of


the cartilage expands into the adjacent structure driven by the high image intensity area

close to the cartilage. In fact, the curve length prior effect is weak in the areas of low

curvature and cannot prevent the expansion. Increasing the prior strength leads to the

rapid collapse of the contour when the curve curvature force at the tip of the cartilage

overcomes the local image force. Leventon’s prior in panel (C) has roughly the same effect

as the curve length prior. In fact, the average curvature of the level set computed on the

true cartilage shape is low. The boundary curvature tends toward this low value during

the evolution, leading to the corner-smoothing effect.

The experiments in this section illustrate the ability of our prior to yield segmentations

preserving the salient features observed in the training shapes given limited training data

and high variability in the training data. For the considered problems, the results given

by our shape prior are consistently better than the results given by the alternative choices

of shape prior in terms of the segmentation error and subjective visual similarity.

5.3 Image segmentation with occlusion

Partial object occlusion is a difficulty frequently encountered in many im-

age analysis applications. An example is the surgical navigation task, where anatomical

structures and surgical tools must be segmented from X-Ray or optical real-time imagery.

In this section we investigate the ability of our framework to segment objects in case of

partial occlusion. We are particularly interested in reconstructing the salient shape fea-

tures characteristic of the training data. We focus on the case of limited prior data. We

demonstrate that our prior can enforce the reconstruction of the salient features and is

flexible with respect to the location of these features in the segmented image. We further

illustrate the invariance properties of our prior.

Experiment 1

First, we consider a segmentation example where training and true shapes are nearly

identical, except for the location of one prominent feature. Our training shapes are shown


in Figure 5·11, panel (A). The only difference between these shapes is the position of

the dorsal fin. In the true shape, the dorsal fin takes yet another position. Positions

of the dorsal fin in the training shapes and true shapes are shown in Figure 5·11, panel

(B). It can be easily seen that the prior constructed on these shapes is very difficult to

generalize using traditional approaches. The level-set based PCA approach applied to the

training shapes produces eigenshapes that reduce and grow the dorsal fin in 3 locations

corresponding to the locations of the fin in 3 training shapes. Indeed, the shortest path

to change the position of the fin in terms of the change of the boundary coordinates is to

reduce the fin in one location and grow it in another location. Among explicit boundary

parameterization based approaches, only the part based approaches can reliably encode

the change of the fin position and generalize to the new unobserved positions of the fin.

The goal of this experiment is to test the ability of our prior to learn the dorsal fin from its

different positions in the training shapes and reconstruct the fin under different occlusion

patterns.


Figure 5·11: Occlusion experiment 1. (A) - training shapes; (B) - true object contour (thick line) superimposed with training shapes (thin lines). This plot illustrates that the prominent feature location is different in the true object and in all training shapes.


Using the ground truth contour in Figure 5·11, panel (B), we construct the bimodal

image and add IID Gaussian noise with SNR -5dB. Accordingly, we use the data curve flow

component given by eq. 5.9. Our prior uses 2 feature functions: inter-point distances and

multi-scale curvatures. One interval S[a,b] = [0, 0.5] is used to construct the distributions

for each of the 2 feature functions. We experiment with 2 occlusion patterns. Occlusion

pattern 1 occludes the dorsal fin. In this case, we desire to reconstruct the fin in its correct

location. Occlusion pattern 2 does not occlude the dorsal fin. In this case, the position of

the dorsal fin is enforced by the data and we desire to verify that our prior does not produce

an extra fin. Our results for this problem are shown in Figure 5·12. Panel (A) shows the

initial contour obtained by averaging the distance function transforms of the three training

contours. The white dashed line in panels (B,C,D) shows the occluded region, while the

solid white line stands for the segmentation result and the black solid line corresponds to

the true contour. Panels (B) and (C) show the results for occlusion pattern 1, using our

prior and PCA prior (as described in Section 2.4.2) respectively. One can see that with our

prior, we are able to reconstruct the “missing” dorsal fin. The location of this fin is slightly

misplaced with respect to the true location. In fact, we do not have data in the occluded

region to correct the position of the fin, so we obtain an equivalently good solution given

the data. As we expected, the PCA prior does not reconstruct the dorsal fin since PCA

components constructed on 3 training shapes do not describe the new position of the dorsal

fin. Panel (D) shows the result using our prior on occlusion pattern 2. Since the dorsal fin

is reconstructed by the data, the straight boundaries in the occluded area should remain

straight and in fact, that is the case in our result. This experiment confirms that our prior

measures and enforces the “amount” of a certain feature in the shape and does not lead to

spurious perturbations unexplained by the training data.

Our prior effectively generalizes to the unobserved location of the prominent shape

feature. It is also important to note that we use previously proposed general-purpose feature

functions, namely inter-point distances and multi-scale curvatures, which are not specifi-

cally adapted to the shape used in this experiment.



Figure 5·12: Occlusion experiment 1. The noisy image is shown in all four panels. (A) - initial contour (dashed line); (B,C,D): dashed line - occluded region; black solid line - true boundary; white solid line - segmentation result. (B) - occlusion pattern 1, result using our prior; (C) - occlusion pattern 1, result using the PCA prior; (D) - occlusion pattern 2, result using our prior.


Experiment 2

In this experiment we test the ability of our prior to reconstruct the occluded single

visual feature in the case of multiple similar features present in the shape. We use hand shapes

and assume that one finger in the noisy image of a hand is occluded. We use manually

segmented contours of hands as training data and test images. Four contours were used to

train the model (Figure 5·13, panel (A)), and one contour was used as the testing example.

Notice that some training examples are flipped. This will negatively affect a PCA based

model, but our model construction process is invariant with respect to the mirror transform.

Again, a bi-level image model was used with -5dB Gaussian IID noise. In Figure 5·13, panels

(B) and (C), we show the results obtained using our shape distribution based shape prior,

and PCA prior respectively. In order to construct our prior we used 2 feature functions,

inter-point distances and multi-scale curvatures, using one interval S[a,b] = [0, 0.2] for each

of the feature functions. The PCA prior fails to reconstruct the occluded finger since the

principal components constructed on the training shapes attempt to account for the mirror

transform applied to the hand shapes rather than for finger displacements. On the other hand,

our prior is able to approximately reconstruct the occluded finger.

Experiment 3

We perform yet another test of the abilities of our model to reconstruct partially oc-

cluded shapes. A collection of aircraft contours was used in this experiment, as shown

in Figure 5·14. The shapes are very different, yet they all share easily recognizable vi-

sual features, such as two wings and a fuselage. Most of the training shapes also have a

delta-shaped tail. We construct the bi-level noisy image using contour #2 and impose the

occlusion pattern obstructing part of the wing and the tail (see Figure 5·15). We attempt

to test the ability of the prior to encode and reconstruct the characteristic features for the

class of shapes, based on highly variable training shapes. We construct our prior using all

16 contours. Our results are presented in Figure 5·15. The reconstructed wing does not

lie close to the ground truth shape but represents an “average wing” for the aircraft in



Figure 5·13: Experiment 2: Segmentation with occlusion. (A) - prior shapes; (B) - result using our prior; (C) - result using the PCA prior. Dashed white rectangle - occluded region; dashed white circle - initial contour; black solid line - true boundary; white solid line - segmentation result.


the training set, as expected. Notice that the asymmetry of the resulting contour is consistent

with our prior being insensitive to symmetry or the lack thereof. One may consider

feature functions that capture and enforce symmetry in order to obtain a more visually

appealing solution for this problem.

Our experiments with segmentation of occluded shape are designed to illustrate the

generalizing properties of our shape model. Our model is able to capture and reconstruct

features that are otherwise only possible to capture using explicit, part-based, task-specific

models.


Figure 5·14: Training plane shapes (numbered 1-16).

5.4 Average shape computation

We now consider other applications of our shape prior energy formulation. We consider

tasks that do not involve image data. One task of interest is the shape interpolation

problem, that is finding the intermediate curves that can be obtained by transforming one

curve into another. Video sequence object-based resampling is one of possible applications.

Another task of interest is finding the average shape over a group of shapes, which can be

considered a particular case of shape interpolation if the average of the two shapes is sought.

Our proposed framework provides a straightforward approach to the determination of the


Figure 5·15: Experiment 3: segmentation of a plane with occlusion. Plane silhouette #2 was used to form the image. Dashed white rectangle - occluded region; dashed white smooth contour - initial curve; black solid line - true boundary; white solid line - segmentation result.

average shape Γ of a collection of N shapes Γi. We will consider the average shape computation

task in greater detail. In the context of our shape prior, our goal is to find the average

shape that would capture significant features of the shapes Γi while being close to all shapes

in some sense. We will compare our result to the average shapes computed using other

approaches and underline the significant differences of our approach.

One way to define an average shape is to find the average of the shape representa-

tions corresponding to prior shapes. This method is often used in practice (e.g., in PCA

methods); however, due to non-linearity of shape spaces, this method is typically not well

justified. Furthermore, this method inherently smoothes out the salient features of shapes

and therefore is not a good match for our goal. A better way to define an average shape

of the collection of shapes Γi is to find an intrinsic mean shape Γ that minimizes the total

distance to all shapes Γi. Of course, the resulting average shape depends on the chosen defi-

nition of the distance between shapes. The intrinsic mean is also known as the Karcher mean


and can be found as (Maurel and Sapiro, 2003; Charpiat et al., 2003; Le, 1991):

\bar{\Gamma} = \arg\min_{\Gamma} \sum_{j=1}^{N} d^2(\Gamma, \Gamma_j)    (5.10)

where d(Γ1, Γ2) is the distance between shapes Γ1 and Γ2. To this end, let us define the

shape distribution-based distance between two curves Γ1 and Γ2 as follows:

d(\Gamma_1, \Gamma_2) = \sqrt{ \sum_{i=1}^{M} w_i \int_{\Lambda} \left[ H^i(\Gamma_1, \lambda) - H^i(\Gamma_2, \lambda) \right]^2 d\lambda }    (5.11)

By substituting eq. 5.11 into eq. 5.10 it can be shown that the resulting mean curve Γ is the

curve which minimizes the distance measure between its distributions and the average of

feature distributions corresponding to prior curves Γi. Therefore, one effectively needs to

minimize eq. 4.2 using the corresponding average distributions as the target distributions

H∗(λ).

From the shape focusing experiments in Section 5.1 we saw that one shape distribution

distance based term in eq. 5.10 results in a flow that converges close to the target shape.

The importance of local minima increases with the shape complexity. Now the target

distributions are computed as averages over prior curves and these average distributions

do not necessarily correspond to any shape. One may argue that local minima may be

more significant in this situation. We will show that the flow computed using such average

target distributions still converges in the vicinity of the contour that we can consider

a visual average of the prior shapes. We show that alternative definitions of the shape

distance do not produce such visual average shape.

From the Karcher mean point of view, we seek the curve that is closest to prior ex-

amples in terms of its feature distributions’ differences. We can loosely interpret shape

distributions as probabilities of occurrence of certain characteristic features of the shape,

such as boundary segments with certain curvature values. Using this interpretation, we

seek the curve with the average frequency of occurrence of these characteristic features. We will


see that our results appear to confirm this loose interpretation.

Experiment 1

In our first example we intend to compare the results of using our definition of shape

distance in eq. 5.11 with other choices of the shape distance definition. We will illustrate

that using traditional curve distance measures in eq. 5.10 often results in unsatisfactory

mean shapes. In Figure 5·16, we show two shape instances (solid lines), whose mean shape

we would like to find. From the visual similarity point of view, the mean of these two

triangles should be a triangle with two corners coinciding with the matching corners of

the prior shapes and the rightmost corner located somewhere between the corresponding

corners of the two prior shapes.

An example of a generic curve distance measure is the Chamfer distance defined as

d(\Gamma_1, \Gamma_2) = \int_{\Gamma_1} \min_{y \in \Gamma_2} \| x - y \| \, ds    (5.12)

where the integration is carried out along Γ1 accumulating the Euclidean distance between

current point on Γ1 and curve Γ2. In Figure 5·16 (A) we show the mean shape corresponding

to this definition of the distance. Clearly, such a mean shape is wrong from the point of

view of visual similarity.
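A discrete version of eq. 5.12 is straightforward; the sketch below (Python/NumPy, illustrative names) approximates the integral by a sum over samples of Γ1, each weighted by its local arc length:

```python
import numpy as np

def chamfer_distance(curve1, curve2):
    """Discrete version of eq. 5.12: accumulate, along curve1, the
    Euclidean distance from each sample point to the nearest sample of
    curve2, weighting each sample by its local arc length ds (the curve
    is treated as closed)."""
    pair = np.linalg.norm(curve1[:, None, :] - curve2[None, :, :], axis=-1)
    nearest = pair.min(axis=1)
    seg = np.linalg.norm(np.roll(curve1, -1, axis=0) - curve1, axis=1)
    ds = 0.5 * (seg + np.roll(seg, 1))  # average of adjacent segment lengths
    return float(np.sum(nearest * ds))

# A curve is at zero Chamfer distance from itself; the measure is
# asymmetric in general, i.e. d(c1, c2) != d(c2, c1).
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
print(chamfer_distance(circle, circle))  # 0.0
```

The asymmetry of this measure is one reason it can produce the unsatisfactory mean shape shown in panel (A).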

Another often used shape difference measure based on the total area between shapes is

the Hamming distance

d(\Gamma_1, \Gamma_2) = \int_{A:\ \mathrm{sign}(D(\Gamma_1)) \neq \mathrm{sign}(D(\Gamma_2))} dS    (5.13)

where D(Γ1) and D(Γ2) are signed distance transforms for shapes Γ1 and Γ2 respectively.
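On a pixel grid this measure can be evaluated exactly as eq. 5.13 is written, via signed distance transforms; a SciPy-based sketch (helper names and sign convention are illustrative):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """Signed distance transform D(Γ) of a binary region: negative
    inside, positive outside (the sign convention is immaterial for
    eq. 5.13, which only compares signs)."""
    return distance_transform_edt(~mask) - distance_transform_edt(mask)

def hamming_distance(mask1, mask2):
    """Eq. 5.13: area (pixel count) of the region where the signed
    distance transforms of the two shapes differ in sign."""
    return int(np.sum(np.sign(signed_distance(mask1))
                      != np.sign(signed_distance(mask2))))

# Two squares offset by 2 pixels: the disagreement region is two
# 2x8-pixel strips.
m1 = np.zeros((20, 20), dtype=bool); m1[4:12, 4:12] = True
m2 = np.zeros((20, 20), dtype=bool); m2[6:14, 4:12] = True
print(hamming_distance(m1, m2))  # 32
```

On a discrete grid the sign test reduces to a pixel-wise comparison of the two region masks, so this coincides with the symmetric area difference used as an error measure earlier in this chapter.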

When used in eq. 5.10, this shape difference measure yields an infinite number of solutions

for the mean shape. These solutions are located in the shaded areas in Figure 5·16 (B).

None of those shapes is a triangle with the right proportions (a perceptual mean shape).

A similar result is obtained using the Hausdorff distance (Charpiat et al., 2003). For this



Figure 5·16: The average shape of 2 triangles obtained using different shape distance measures: solid lines - prior shapes; dashed line - corresponding average shape; filled areas - the family of solutions. (A) - asymmetric distance-based measure; (B) - area-based measure (one of the possible solutions is shown by the dashed line); (C) - our distribution difference measure (dash-dotted line - evolution result; dashed line - scaled result).

measure, the solution to eq. 5.10 is not unique, although the set of solutions includes a

perceptually correct mean shape.

Finally, we use our measure of shape difference in eq. 5.11 based on the difference

between shape feature distributions. We construct the difference measure using 2 feature

functions: the inter-point distances and multi-scale curvatures (#1 and #2). In Figure 5·16

(C) we show the result of applying our method to find the mean shape. We initialize the

solution using the result obtained for the Chamfer distance in Figure 5·16 (A). The contour

produced by the shape distribution distance is shown by the dash-dotted line. The size of

this shape is smaller than that of the original shapes due to initialization. This behavior is

consistent with the fact that our measure is insensitive to scale, translation, and rotation

by design. We manually scale and shift the resulting contour to match the position of the

original shapes for visualization purposes (dashed line). One can see that the scaled result

produces the expected “mean shape” from a visual similarity point of view.

It is important to note that using shape models (measures) that explicitly capture the

correspondence of shape features, such as the point distribution model (PDM) in (Cootes

et al., 1995), would also find an average shape with good visual similarity properties.

Our achievement is to obtain a similarly good result without correspondence, using only


the aggregate shape dissimilarity measure based on uniformly sampled information about

shapes.

Experiment 2

In our second example we compute the average shapes over two sets of aircraft contours

(we use the database from (Jaggi et al., 1999)). We demonstrate that our average shape

computation scheme is capable of capturing the salient structure shared by a group of

shapes. The two groups of shapes used in this experiment are shown in Figure 5·17, panels

A and B. In each group, the shapes are different but share certain common features. The

shapes in group A have narrow wings bent backwards at approximately 45 degree angle

and tails with a concavity in the middle. The shapes in group B have delta-shaped wings

and narrow protrusions on the front side of the wing. In order to effectively capture these

shape groups, we extend the descriptive properties of our prior. We use 4 distributions for

each of the feature functions #1 and #2. These distributions are created using four non-

overlapping intervals S of equal length. We use a non-symmetric initialization computed

as the average over the signed distance transforms of non-aligned 16 prior aircraft shapes

(dash-dotted contours). Our results show the preservation of the characteristics of prior

shapes, such as wing shape and thickness and tail shape. The resulting shapes are slightly non-

symmetric. This non-symmetry is caused by the non-symmetric initialization and the

evolution path ending at a non-symmetric local minimum. One may also notice 2 humps

on the front side of the wing for case B. Since the wing protrusions are highly variable

in the (case B) prior shapes, our average shape is “uncertain” about the shape and location of

these protrusions, but it is important that these protrusions appear in the result at their

average locations. This result shows that our prior is capable of capturing and averaging

the global shape structure and salient features, such as localized protrusions. Since we

use an initialization that is far from the final result, we expect our average

shape solution to be robust with respect to initialization.

Experiment 3



Figure 5·17: Initial (dash-dotted contour) and average shapes (solid contour) for 2 groups of shapes. Prior shapes in each group are shown at the top of each panel.

Figure 5·18: Experiment 3: Example shapes from (Klassen et al., 2004).

So far, we have experimented with shape groups that were relatively similar: triangles or

plane shapes of roughly similar design. Now we attempt to capture more subtle shape sim-

ilarity, coming close to what we call perceptual shape similarity. In this experiment we use

the shapes whose only common characteristic is being polygons (see Figure 5·18).

We test the ability of our prior to capture this salient shape property and yield the average

shape that has the structure of a polygon.

The set of shapes in Figure 5·18 has been previously used in (Klassen et al., 2004),

where the average shape is computed using the Karcher mean formula. The shape distance

measure was taken to be a geodesic length of the path connecting two shapes’ arc-length



Figure 5·19: Experiment 3: Average shapes computed on the shapes in Figure 5·18: (A) result in (Klassen et al., 2004); (B) our result.

angle function representations. Pose invariance and invariance with respect to the start-

ing point of the angle-function parameterization were intrinsically encoded into the distance

computation. Still, this technique represents curves by angle functions and hence suffers

from the drawbacks of all parametric shape representations. In the process of finding the

“average” representation using Karcher mean formula, angle function representations are

effectively averaged. Hence, we may expect averaging of salient features of the shape. The

resulting average shape is given in Fig. 8 in (Klassen et al., 2004), which we reproduce in

Figure 5·19, panel (A). As expected, only one corner is present in the resulting shape. This

is the corner that occurs at the same location in all input shapes. Parts of the resulting

boundary are rounded, which is not a feature of the prior shape examples.

We now apply our approach to find the average shape using the Karcher mean formula

combined with shape distance measure based on shape distribution difference. Our result

is shown in Figure 5·19, panel (B). We obtain the shape with distinct corners and straight

sides. It is interesting that while we impose no constraints on the number of corners, we

obtain a mean shape with an average number of corners (our result has 5 corners and the

input shapes have from 3 to 6 corners). This result conforms to our intuitive interpretation

of the embedding of the distributional representation into the Karcher mean formula. Our

intuitive interpretation states that, since our distribution-based prior captures the amount

of a certain feature in the shape, the Karcher mean will effectively find the shape with the average


amount of this feature. Our result is also approximately symmetric, as the input shapes are.

Overall, we appear to obtain an average shape with better retention of the key visual

characteristics of this collection of objects.

5.5 Summary

In this chapter we present experiments demonstrating the properties of the 2D shape

prior developed in Chapter 4. Our shape focusing experiments show the ability of the

prior curve flow to transform one curve into another curve resembling the prior example.

Our experiments illustrate the dependence of captured properties of shape on the chosen

feature functions. Our segmentation experiments show the ability of our prior to extract

the boundary that is visually similar to the training data shapes and superior in terms of

quantitative segmentation error measure, compared with the boundary produced by alter-

native techniques. In segmentation with occlusion experiments, our prior shows the ability

to reconstruct missing salient features of training shapes at new locations, unobserved in

the training shapes. Alternative techniques are not capable of achieving the same degree of

generalization and robustness. In another experiment we show the ability of our prior to

find a visual average of given shapes, a result unattainable by existing approaches.

Our shape prior methodology is shown to benefit from the key properties of shape

distribution based shape representations. Among others, the invariance properties allow us

to make registration intrinsic to the model construction. The distributional nature of the

shape descriptors used is responsible for the excellent generalization properties. Flexibility allows

us to use different feature functions tailored to particular applications.


Chapter 6

Joint segmentation of multiple objects using

shape distribution based shape prior

In this chapter we focus on multiple object segmentation problems where prior information

on relative object positions can be beneficial. The work reported in this chapter previously

appeared in (Litvin and Karl, 2005a; Litvin and Karl, 2005b).

6.1 Introduction

Let us now consider segmentation problems that involve multiple structures of interest

with consistent relative positions. Such applications are abundant in medical imaging,

where multiple organs often need to be segmented simultaneously. In medical images,

multiple structures often preserve certain relative positioning across subjects and these

relationships can be used to properly constrain the solution space. Even in the case when

only one structure needs to be segmented, neighboring structures, if included in the process,

can provide additional useful information/guidance to segment the structure of interest.

Examples of such problems include knee cartilage and femur segmentation, bladder and

rectum segmentation, vertebra segmentation and so on.

Because of its potential, simultaneous multiple object segmentation is an important

direction of research in medical imaging. One can distinguish two directions undertaken

to use information about multiple objects in segmentation. The approaches in the first

group aim at constructing a joint shape prior on multiple structures. In (Yang et al., 2003;

Tsai et al., 2003; Freedman et al., 2004) the authors extend the PCA shape model to

constrain multiple object locations. In (Duta and Sonka, 1998), a Point Distribution


Model (PDM) was extended to model multiple shapes. The approaches in this group capture information on the entire boundaries of multiple objects. However, they inherit both the advantages and the drawbacks of their single-object versions. For instance, generalizability can be problematic for a PCA-based multiple-object model, just as it is for a single-object PCA model. Our experiments later in this chapter illustrate this point.

A different group of approaches attempts to apply constraints between multiple objects to preserve certain properties of their boundaries relative to each other. In (Ho and Shi, 2004; Paragios and Deriche, 2000; Alexandrov and Santosa, 2005) the authors mutually constrain shapes by penalizing quantities local to the area of contact between different structures, such as the area of overlap. Approaches of this kind are useful only for preventing the overlap of different structures. In (Shi, 2005), a more general minimin criterion is applied to evolving multiple curves. This criterion is designed to locally repel curves positioned closer to each other than a specified distance. Approaches in this group are concerned only with parts of curves positioned close to each other, and they model and penalize only one specific quantity, distance. These methods are therefore limited, since they take into account only one aspect of mutual shape positioning and ignore other information.

A different idea was proposed in (Matsakis et al., 2004) to characterize the mutual positions of shape boundaries. In this approach, the concept of a force histogram was used to compute a quantity that depends on both boundaries in their entirety. The resulting histogram-based descriptors were used for discrimination, while our focus is on estimation problems.

Our approach naturally extends the single-object shape-distribution-based shape prior to jointly model multiple shapes. We share a common idea with (Matsakis et al., 2004): an effective way to capture information about multiple objects is to compute descriptors on the entire set of boundaries. Our approach retains the advantages of our single-object shape model: it is designed to preserve salient features of shapes, it is invariant, and it is implementable using curve evolution. Our approach to modeling multiple structures integrates naturally with the single-object shape distribution prior.


6.2 Formulation

We now distinguish two types of feature functions. For the first type of feature functions,

which we term autonomous, the values for a particular curve are computed with refer-

ence only to the curve itself. Two feature functions of this type have been introduced in

Chapter 4: inter-point distances and multi-scale curvatures. For the second type of feature

functions, which we term directed, the feature function values are computed with reference

to the curves of other objects. By incorporating directed feature functions into our shape

models we provide a mechanism for modeling the relationships between different objects

in a scene and, thus, create a framework for multi-object segmentation.

We first briefly review the principles of shape distribution based shape modeling ap-

proach developed in Chapter 4. Let Φ(ω) be a continuously defined feature on the space

Ω, where ω is an element of the space Ω. Let λ be a variable spanning the range of values

Λ of the feature. Let H(λ) be the CDF of Φ:

H(\lambda) = \frac{\int_\Omega h\big(\Phi(\omega) < \lambda\big)\, d\omega}{\int_\Omega d\omega}   (6.1)

where h(condition) is the indicator function, which is 1 when the “condition” is satisfied

and 0 otherwise. We define the prior energy Eshape(Γ) for the boundary curve Γ based on

this shape distribution as:

E_{shape}(\Gamma) = \sum_{i=1}^{M} w_i \int_\Lambda \big[ H^*_i(\lambda) - H_i(\Gamma, \lambda) \big]^2 \, d\lambda   (6.2)

where M is the number of different distributions (i.e. feature functions) being used to

represent the object, Hi(Γ, λ) is the distribution function of the ith feature function for the

curve Γ, and the non-negative scalar weights wi balance the relative contribution of the

different feature functions. Prior knowledge of object behavior is captured in the set of

target distributions H^*_i(λ). These target distributions can correspond to a single prior shape, to an average derived from a group of training shapes, or can be specified from prior knowledge.
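In a discrete implementation, H(λ) in eq. 6.1 becomes an empirical CDF over sampled feature values, and the energy in eq. 6.2 becomes a weighted sum of squared CDF differences integrated over a λ grid. The following is a minimal NumPy sketch under our own illustrative naming (not code from this dissertation), assuming a uniform λ grid:

```python
import numpy as np

def empirical_cdf(values, grid):
    """Empirical CDF H(lambda) of sampled feature values, evaluated on a
    grid of lambda values (a discrete analogue of eq. 6.1)."""
    values = np.asarray(values)
    # fraction of feature samples strictly below each grid point
    return (values[None, :] < grid[:, None]).mean(axis=1)

def shape_energy(feature_sets, target_cdfs, weights, grid):
    """Discrete analogue of eq. 6.2: a weighted sum of squared L2
    distances between current and target feature CDFs.  Assumes the
    lambda grid is uniformly spaced."""
    d_lambda = grid[1] - grid[0]
    energy = 0.0
    for values, h_star, w in zip(feature_sets, target_cdfs, weights):
        h = empirical_cdf(values, grid)
        energy += w * np.sum((h_star - h) ** 2) * d_lambda
    return energy
```

When the current feature samples reproduce the target CDF exactly, the energy is zero, which is the global minimum sought by the prior curve flow.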


We now extend our single object prior to model multiple curves as follows. Let N

be the predefined number of objects in the image. We define a directed feature function

Φ_{qp}(ω), p ≠ q, where q, p are object indices. This feature function is computed on object

q using certain information given by object p. As in the single object case, each feature

function gives rise to a CDF computed according to eq. 6.1. The multi-object prior energy

functional is defined as

E_{multshape}(\Gamma_1, \ldots, \Gamma_N) = \sum_{q=1}^{N} \sum_{p=1}^{N} Z_{qp} \sum_{i=1}^{M} w_i \int_\Lambda \big[ H^*_{iqp}(\lambda) - H_{iqp}(\Gamma_q, \lambda) \big]^2 \, d\lambda   (6.3)

where Z_{qp} are the entries of the interaction matrix Z. The element Z_{qp} defines the confidence in the prior information on the relationship between objects q and p, expressed by the difference between the directed distributions corresponding to the pair of structures. This matrix is application specific. Generally, objects that can be segmented well without a prior can be used to constrain those that cause problems in the absence of an additional prior. It is worth noting that the matrix Z is not necessarily symmetric. Its diagonal values are the weights on the single-object prior terms for the individual objects. In this dissertation we choose the values of Z empirically to achieve the best segmentation performance. We also use a graphical interpretation of the interaction matrix: on a diagram representing the segmented objects for a particular problem, each non-zero off-diagonal element of Z is shown as a directed link, as illustrated in Figure 6·1.
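A discrete evaluation of eq. 6.3 reduces to a weighted sum over object pairs and feature functions. The sketch below (our own naming and array layout, offered only as an illustration) assumes the per-pair CDFs H_{iqp} have already been sampled on a common, uniform λ grid:

```python
import numpy as np

def multi_object_energy(H, H_star, Z, w, d_lambda):
    """Discrete analogue of eq. 6.3.

    H, H_star : arrays of shape (N, N, M, L) holding the current and
        target CDFs H_{iqp}, sampled at L lambda values; entry (q, p, i)
        is the i-th feature CDF computed on object q with reference to
        object p (the diagonal q == p holds the autonomous features).
    Z : (N, N) interaction matrix; Z[q, p] weights the (q, p) pair.
    w : (M,) per-feature weights w_i.
    d_lambda : spacing of the uniform lambda grid.
    """
    # squared CDF mismatch integrated over lambda, per (q, p, i)
    sq = np.sum((H_star - H) ** 2, axis=-1) * d_lambda    # (N, N, M)
    return np.sum(Z[:, :, None] * w[None, None, :] * sq)
```

Setting Z to the identity recovers N independent single-object priors; off-diagonal entries couple the objects through the directed distributions.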

As in the case of the single-object prior, the choice of feature function defines the model performance. We now introduce directed feature function #3, which represents relative inter-object distances. Consider two contours C1 and C2, and assume that the feature function is defined on contour C1 using information from contour C2. At each point s on the boundary of C1, we measure the distance between point s and the closest point on the curve C2, normalized by the average radius of C2 with


Z = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}  \Longrightarrow  (directed links among objects 1, 2, 3)

Figure 6·1: Interaction matrix graphical interpretation using a directed diagram. Three objects are sketched in the right panel with assigned object indices. Arrows in the right panel correspond to non-zero entries in the matrix Z.

respect to its center of mass. Formally

\Phi_{C_1 C_2}(s) = \frac{\mathrm{sign}\big[D_{C_2}(C_1(s))\big]\, \min d\big(C_1(s), C_2\big)}{\frac{1}{L_{C_2}} \int_{p \in C_2} d\big(C_2(p), M_{C_2}\big)\, dp}   (6.4)

where MC2 is the center of mass of the contour C2; LC2 is the length of the contour C2;

d(x1, x2) is the Euclidean distance between points x1 and x2; DC2 is the signed distance

transform corresponding to the curve C2. The sign function sign[D_{C2}(C1(s))] equals 1 if C1(s) is outside C2 and −1 if it is inside. The space Ω on which the feature

function is defined is the arc-length of the curve C1. Note that the prior defined using this

feature function provides a descriptor richer than those in the penalty based approaches in

(Ho and Shi, 2004; Paragios and Deriche, 2000), while being less restrictive than a PCA

based multi-object prior. Again, this feature is invariant to translation, rotation and scale

applied simultaneously to the pair of shapes C1 and C2. We illustrate the feature function

construction in Figure 6·2. For simplicity we discretize curve C1 using 6 nodes.

In practice, we use the signed distance transform to compute the values of ΦC1C2(s).

Given the signed distance transform DC2 corresponding to the curve C2, the feature function can be computed as

\Phi_{C_1 C_2}(s) = \frac{D_{C_2}(C_1(s))}{\frac{1}{L_{C_2}} \int_{p \in C_2} d\big(C_2(p), M_{C_2}\big)\, dp}   (6.5)

Figure 6·2: Feature function #3 used in this work, illustrated for a curve C1 discretized using 6 nodes. Feature values for curve C1 are defined as the shortest signed distances from the curve C2 to the nodes of the curve C1.
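For polygonal curves, feature function #3 can also be evaluated directly, without an explicit distance-transform image, by taking signed nearest-node distances. The sketch below is our own illustration (a practical implementation would use a precomputed signed distance transform, as the text describes); it uses a ray-casting test for the sign and assumes C2 is densely and roughly uniformly sampled:

```python
import numpy as np

def point_in_polygon(pt, poly):
    """Even-odd ray-casting test for a point against a closed polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge spans the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:                           # crossing to the right
                inside = not inside
    return inside

def directed_feature(C1, C2):
    """Feature function #3 (eq. 6.4): signed distance from each node of C1
    to the closest node of C2, normalized by the mean radius of C2 about
    its center of mass.  C1, C2 are (n, 2) arrays of node coordinates."""
    M2 = C2.mean(axis=0)                              # center of mass of C2
    # discrete stand-in for (1/L) * integral of d(C2(p), M_C2) dp
    mean_radius = np.linalg.norm(C2 - M2, axis=1).mean()
    phi = np.empty(len(C1))
    for k, p in enumerate(C1):
        d = np.linalg.norm(C2 - p, axis=1).min()      # nearest-node distance
        sign = -1.0 if point_in_polygon(p, C2) else 1.0
        phi[k] = sign * d / mean_radius
    return phi
```

For a point outside C2 at twice its mean radius the feature value is near +1, and a point at the center of mass gives roughly −1, matching the normalization in eq. 6.4.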

The feature function #3 can be extended to 3D in a straightforward way. Suppose we

have surfaces S1 and S2. Let s and p denote points on the surfaces S1 and S2 respectively.

We define the feature function on the surface S1. The feature value Φ(s) is defined as the normalized value at s of the signed distance transform DS2 corresponding to the surface S2:

\Phi(s) = \frac{D_{S_2}(s)}{\frac{1}{A_{S_2}} \int_{p \in S_2} d\big(S_2(p), M_{S_2}\big)\, dp}   (6.6)

where AS2 is the area of the surface S2 and MS2 is the center of mass of the surface S2.

6.2.1 Flow computation for inter-object distance feature function

For feature function #3 introduced above, relating the curve C1 to another object C2, the

gradient flow computed using the variational approach (see Appendix A) is given by:

\nabla E(C_1)(s) = \vec{n}(s) \cdot \vec{\nabla} D_{C_2}(s) \left[ H^*\!\left(\frac{D_{C_2}(s)}{R(C_2)}\right) - H\!\left(C_1, \frac{D_{C_2}(s)}{R(C_2)}\right) \right]   (6.7)


where DC2(s) is the value of the signed distance function generated by the curve C2 at

the point s on the curve C1, and R(C2) is the mean radius of the shape C2 relative to its

center of mass. The details of the flow derivation are given in Appendix A.
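In a discrete implementation, the flow speed in eq. 6.7 can be evaluated at each node of C1 from sampled versions of the two CDFs. The following is a minimal sketch under our own illustrative names; `D` and `grad_D` stand for interpolants of the signed distance transform of C2 and its spatial gradient:

```python
import numpy as np

def flow_speed_ff3(nodes, normals, D, grad_D, H_star, H_cur, R2, grid):
    """Normal speed from eq. 6.7 at each node of C1.

    D, grad_D : callables giving the signed distance to C2 and its
        gradient at a point (e.g. interpolated from a distance image).
    H_star, H_cur : target and current CDFs sampled on `grid`.
    R2 : mean radius of C2 (the normalization in feature #3).
    """
    speed = np.empty(len(nodes))
    for k, (p, n) in enumerate(zip(nodes, normals)):
        lam = D(p) / R2                            # feature value at the node
        h_star = np.interp(lam, grid, H_star)      # H*(lambda)
        h_cur = np.interp(lam, grid, H_cur)        # H(C1, lambda)
        speed[k] = np.dot(n, grad_D(p)) * (h_star - h_cur)
    return speed
```

Where the current CDF already matches the target, the bracketed term vanishes and the node does not move, consistent with the energy-minimizing interpretation of the flow.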

For the 3D version of feature function #3, relating surfaces S1 and S2, the gradient

flow is given by

\nabla E(S_1)(s) = \vec{n}(s) \cdot \vec{\nabla} D_{S_2}(s) \left[ H^*\!\left(\frac{D_{S_2}(s)}{R(S_2)}\right) - H\!\left(S_1, \frac{D_{S_2}(s)}{R(S_2)}\right) \right]   (6.8)

where DS2(s) is the value of the signed distance function generated by the surface S2 at the

point s on the surface S1, and R(S2) is the mean radius of the shape S2 relative to its

center of mass.

6.3 Experiments

In this section we show the results of applying our feature-distribution-based shape prior

to joint segmentation of multiple objects. In fact, in many problems, information about

the relative positions of the boundaries of different objects in the image provides crucial

constraints helping to achieve better segmentation. In our case, the information about

relative object positions is cast in the shape distribution framework, allowing for a unified shape prior that includes both autonomous and directed features.

We apply our shape-distribution-based prior to both a synthetic and a

real example. The real data example arises in segmentation of brain MRI. We compare

both single-object and multi-object priors. The benefit of using a multi-object prior is expected to be greatest when the boundaries of some objects are not well supported by the observed image intensity gradient, or when the initialization is far from the true boundary.

Experiment 1

In the first experiment we apply our prior to a synthetic 2-object segmentation problem

with very low SNR, simulating two closely positioned organs. The ground truth image is

shown in Figure 6·3, panel (A). Both objects in the ground truth image have the same,


Figure 6·3: Synthetic two-shape example: (A) bi-level noise-free image; (B) segmentation with curve length prior; (C) shape distribution prior including only autonomous feature functions #1 and #2; (D) shape distribution prior including directed feature function #3 along with the autonomous feature functions. Solid black lines show the true object boundaries; dashed white lines show the initial boundary position; solid white lines show the final boundary.


Figure 6·4: Brain MRI segmentation: (A) the multiple numbered structures (including the lenticular nucleus and caudate nucleus in each hemisphere) and the interactions used for feature function #3; (B) segmentation with an independent object curve length prior; (C) segmentation using the multi-object PCA technique in (Tsai et al., 2004); (D) segmentation with the new multi-object shape distribution prior. Solid black lines show the true object boundaries; solid white lines show the final segmentation boundary.


known intensity. The background intensity is also known as in the model in (Chan and

Vese, 2001). Gaussian IID noise (SNR= -18dB) was added to this bimodal image to form

the noisy observed image. The data term Eint and the corresponding data term gradient

curve flow are formed according to the data model in (Chan and Vese, 2001).
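The synthetic observation and the Chan-Vese data force can be sketched as follows. This is a minimal illustration under our own function names, not the thesis implementation, and it assumes the usual power-ratio SNR convention in dB:

```python
import numpy as np

def add_noise_at_snr(image, snr_db, rng=None):
    """Add IID Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(image.astype(float) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10.0)
    return image + rng.normal(0.0, np.sqrt(noise_power), image.shape)

def chan_vese_speed(observed, c_in, c_out):
    """Pixelwise normal speed of the Chan-Vese data term with known
    inside/outside intensities c_in, c_out: positive values push the
    boundary outward (the pixel is better explained by the inside model)."""
    return (observed - c_out) ** 2 - (observed - c_in) ** 2
```

At SNR = −18 dB the noise power is roughly 63 times the signal power, which is why the curve length prior alone cannot recover the boundaries and a shape prior is required.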

In Figure 6·3 we show the results obtained by segmenting this image using energy

minimizing curve evolution based on two different shape priors. Figure 6·3 (B) shows the

results of independent curve evolution for both contours using the common curve length

penalty as the regularizing term in the energy functional. Figure 6·3 (C) shows the results

with our shape distribution-based prior but using only autonomous feature functions #1

and #2 (inter-point distances and multi-scale curvatures); Figure 6·3 (D) shows the results

with our multi-object shape distribution prior including all 3 feature functions. The prior

target distributions for cases (C) and (D) were constructed using the true objects in (A). The

regularization parameter was chosen in each case to yield the best subjective result. In

case (D), all 4 elements of the interaction matrix Z were set to 1.

The curve length prior result in (B) yields an incorrect segmentation for one of the

objects or leads to the collapse of one of the contours depending on the strength of the

prior. The shape distribution-based prior in (C) performs well for the second object (the

bent shape). The first object (the ellipse) cannot be effectively extracted because the stopping force at the boundary between the objects was insufficient. With the directed feature

function included in the segmentation functional (D), both objects are correctly segmented, since the energy term corresponding to feature function #3 effectively prevents the intersection of boundaries. Area-based segmentation errors are summarized in Table 6.1.
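For binary segmentation masks, the error measure of Table 6.1 is simply the pixel count of the symmetric difference; a minimal sketch under our own naming:

```python
import numpy as np

def symmetric_difference_error(mask_a, mask_b):
    """Area (pixel count) of the symmetric difference between two binary
    segmentation masks, as in Table 6.1 (summed over objects when
    several objects are segmented)."""
    return int(np.logical_xor(mask_a.astype(bool), mask_b.astype(bool)).sum())
```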

Experiment 2

In our second example we apply our techniques to 2D MRI brain data segmentation. A

data set consisting of 12 normal adult subjects was provided to us by Dr. David Kennedy

at the Center for Morphometric Analysis of Harvard Medical School and Massachusetts General Hospital. Manual expert segmentations of the subjects were provided, and those


Table 6.1: Symmetric difference (area based) segmentation error. For each object the error measure is computed as the symmetric difference between the final segmented region and the true region. The values in the table are the sums of the error measures over the individual objects.

                 Curve length | PCA method | Our method
Experiment I         1092     |     -      |    146
Experiment II        1090     |    1437    |    758

of 11 of these subjects were used as training data to construct our shape prior. The

prior was then applied to segment the data of the omitted subject. The eight numbered

structures shown in Figure 6·4, panel (A) were simultaneously segmented. For the data

dependent energy term Eint, we used the information theoretic approach of (Kim et al.,

2002a) by maximizing the mutual information between image pixel intensities and region

labels (inside or outside), thereby favoring homogeneous regions.

In Figure 6·4 we present our results. Panel (B) gives the segmentation with a standard

curve length prior applied independently to each object. One can see that structures 1

and 4 are poorly segmented, due to their weak image boundaries. In panel (C) we present

the result given by the multi-shape PCA technique in (Tsai et al., 2004) using 5 principal

components defining the subspace of allowable shapes. The segmentation is sought as

the shape in this subspace, optimizing the same information-theoretic criterion (Kim et al., 2002a) as used with our shape prior. Using the same data term simplifies the comparison with our approach, since only the shape model components of the method

are different. One can see that structures 2,5,6, and 7 are not segmented properly due

to the poor generalization by the PCA prior. Expanding the subspace by choosing 10

PCA components did not improve the result given by this method. Finally, our result

is shown in panel (D). We obtain satisfactory segmentations for the structures on which the PCA method failed (2, 5, 6, and 7), while performing equally well for structures 1, 3, 4, and 8. The choice of initialization did not significantly influence our results. The segmentation errors given in Table 6.1 quantitatively confirm the superior performance attained using our prior.


6.4 Conclusions

In this chapter we extended the shape distribution-based object prior to jointly model multiple image object boundaries. We applied the variational approach to analytically compute the energy-minimizing curve flow for the directed feature function #3 relating a pair of objects, and we demonstrated the application of our shape distribution prior to medical image segmentation involving multiple object boundaries. In our experiments we achieved performance superior to that obtained using traditional curve length minimization methods and a multi-shape PCA shape prior reported in the literature.


Chapter 7

Shape and appearance modeling with feature

distributions for image segmentation

This chapter extends our shape distribution based shape modeling approach to coupled

shape and appearance modeling for use in image segmentation. The coupling, which is the

first contribution of our new framework, is expressed by a single shape distribution-based

term in the segmentation functional that accounts for both appearance and shape models.

Our new shape and appearance model has the ability to segment images with properties

posing difficulties for existing appearance models. Examples show the accurate placement

of boundaries in challenging situations; for example, in cases where an intensity boundary

does not exist. The work reported in this chapter has been presented in (Litvin et al.,

2006).

7.1 Introduction

In a typical curve evolution scheme the data-dependent forces and shape prior forces are

constructed from distinct principles, and lead to distinct terms in corresponding curve evo-

lution equations. In the previous chapters we concentrated on constructing prior shape

terms. Now we turn our attention to models capturing image appearance properties with

respect to the boundary. We review existing appearance modeling approaches in Sec-

tion 2.7. Such appearance models can be implicit, formulated as curve evolution forces,

driving the boundaries according to the underlying image properties. The simplest data

dependent force positions the boundary to maximize the image gradient magnitude at

the boundary location. Alternatively, the popular Chan and Vese model (Chan and Vese,


2001) attempts to maximize the uniformity of intensities in the different regions created

by the boundary. The method of (Kim et al., 2002a) attempts to preserve uniformity in regions using an information-theoretic criterion. Other models attempt to maximize the uniformity of certain

higher order statistics of the intensities in each region.

Attempts have been made to develop more complex forces capturing prior appearance

information. For example, the probabilistic appearance model in (Leventon et al., 2000b)

links appearance to the boundary through the distance function, but this model drives

the boundary everywhere towards a most likely intensity and curvature through a MAP

formulation. While not strictly a curve evolution method, the Active Appearance Model

(AAM) in (Cootes et al., 2001) creates detailed intensity and shape models sensitive to

boundary position. However, the AAM approach enforces a match between the image and

the template over the entire model domain, requires identification of reliable boundary

landmarks in each image (effectively doing registration), and is based on PCA of observed

pixel values and boundary control points, which makes its generalization sensitive to the training data.

In contrast, our proposed approach to prior construction is based on the concept of

shape distributions (Chapter 4), which we use to encode both prior shape information as

well as appearance information. In this chapter we extend these shape modeling results

from Chapter 4 to include joint shape and intensity priors. Our approach is based on

finding a solution that matches given distributions of shape and intensity features rather

than driving the shape towards a given set of mean values or seeking maximal homogeneity

of intensity or shape characteristics. Since our model is based on the distribution of intensity, edges are not even needed to define the boundary location, which is useful in certain segmentation contexts. Through careful feature choice, our model can be inherently insensitive to many geometric transformations (scaling, rotation, etc.). Our model is richer than existing models: they assume and enforce the uniformity of certain statistics in the region or along the boundary, while ours models and matches free-form distributions of intensity-defined image features along boundaries.


7.2 Shape distribution principles

The detailed derivation of the shape distribution principles can be found in Chapter 4.

Recall that the prior energy Eshape(Γ) for curve Γ based on shape distributions is defined

as:

E_{shape}(\Gamma) = \sum_{i=1}^{M} w_i \int_\Lambda \big[ H^{*\,shape}_i(\lambda) - H^{shape}_i(\Gamma, \lambda) \big]^2 \, d\lambda   (7.1)

where M is the number of different shape distributions (i.e., feature functions) being used to represent the object, H^{shape}_i(Γ, λ) is the distribution function of the ith shape feature function, and the non-negative scalar weights w_i balance the relative contributions of the different feature distributions. Prior knowledge of object shape behavior is captured in the set of target distributions H^{*shape}_i(λ). These target distributions can correspond to a single prior shape, to an average derived from a group of training shapes, or can be specified by prior knowledge (e.g., the analytic form for a primitive, such as a square).

We used two specific shape feature functions in our experiments presented in Chapter 4:

• Feature function #1. Inter-point distances

• Feature function #2. Multi-scale curvatures

7.3 Extension to combined intensity and shape priors

Now we describe our extension to include prior appearance information. In our approach,

we compute the distributions of intensity-based features measured parallel to the boundary

of the shape. We then seek a solution whose distribution matches this prior distribution,

rather than seeking a solution whose intensities occur at the maximum of the distribution.

This approach does not require uniformity of region intensity properties and appears to

have good generalization properties. To find a solution, the dissimilarity of observed and

prior intensity distributions is used to create a curve evolution force using a variational

approach.


To generate our appearance model, let us first define an orthogonal coordinate system

W. We will align this coordinate system with the tangent and normal to the boundary

and measure image intensity values in a rectangular patch defined by W, as illustrated in

Figure 7·1. Let O be the origin of this coordinate system W. Let xij ∈ W be a sample

Figure 7·1: Image-patch-based feature values measured along the boundary. The point O (the patch coordinate system origin) is positioned at Γ(s) (the current boundary point); the j-axis is aligned with the local inward normal. Two instances are shown.

point with coordinates i and j. We choose a set of such points S = {x_1, x_2, ..., x_m}. There is

no restriction on this set of points, but currently we make the set symmetric with respect to

the j-axis and include O in the set. Each point xk = (ik, jk) in S gives rise to an intensity

function corresponding to the trajectory of the point xk as the coordinate origin is moved

around the boundary. A typical trajectory is shown as the dotted line in Figure 7·1. Let

Φk(s) be an associated feature function that is computed from this intensity function for the

k-th point trajectory, where s is arc-length around the boundary. The simplest approach,

and what we have done to date, is to simply use the intensity values along the trajectory

themselves, so the k-th intensity feature function is given by:

\Phi_k(s) = I(x_k, s) = I\big( \Gamma(s) + R(s, \mathbf{n}(s))\, [\,i_k \ \ j_k\,]^T \big)   (7.2)


where I is the image, Γ is the boundary parameterized by arc-length s, n(s) is the local

normal, and R(s, n(s)) is the 2D rotation matrix aligning n(s) with the j-axis of the coordinate system W. As with our distribution-based shape model, we then generate the CDF of each such intensity feature, H^{int}_k(Γ, λ), following eq. 4.1. The collection of these m distributions is the basis of our intensity appearance model.
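A discrete version of eq. 7.2 samples the image at patch offsets rotated into the local boundary frame. The following sketch is our own illustrative implementation (nearest-neighbor sampling and polygon-based normals) of one way to compute the feature functions Φ_k(s):

```python
import numpy as np

def intensity_features(image, curve, offsets):
    """Sample image intensities at patch points rotated into the local
    boundary frame (a discrete analogue of eq. 7.2).

    curve   : (n, 2) ordered (x, y) node coordinates of a closed curve.
    offsets : list of patch coordinates x_k = (i_k, j_k) in the frame W.
    Returns an (m, n) array: row k is the feature function Phi_k(s).
    """
    # unit tangents from central differences on the closed polygon
    tangent = np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    # inward normals for a counter-clockwise curve: rotate tangent by +90 deg
    normal = np.stack([-tangent[:, 1], tangent[:, 0]], axis=1)

    feats = np.empty((len(offsets), len(curve)))
    for k, (ik, jk) in enumerate(offsets):
        # R(s, n(s)) [i_k j_k]^T: i along the tangent, j along the normal
        pts = curve + ik * tangent + jk * normal
        rows = np.clip(np.round(pts[:, 1]).astype(int), 0, image.shape[0] - 1)
        cols = np.clip(np.round(pts[:, 0]).astype(int), 0, image.shape[1] - 1)
        feats[k] = image[rows, cols]                  # nearest-neighbor sample
    return feats
```

Each row of the returned array is then converted into a CDF exactly as in the shape case, so the shape and appearance terms share one distribution-matching machinery.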

We now define the intensity (data) energy E_{intensity}(Γ) as the difference measure between the target intensity feature distributions and the intensity feature distributions corresponding to the curve Γ:

E_{intensity}(\Gamma) = \sum_{k:\, x_k \in S} \int_\Lambda \big[ H^{*\,int}_k(\lambda) - H^{int}_k(\Gamma, \lambda) \big]^2 \, d\lambda   (7.3)

where H^{*int}_k are computed by averaging the distributions corresponding to the training segmented contours and underlying images. We combine this intensity (appearance) model

with the shape model to define an overall intensity and shape prior curve energy:

E(Γ) = Eintensity(Γ) + Eshape(Γ) (7.4)

where Eshape is given in eq. 7.1 and includes only shape terms.

We aim at driving the curve Γ to minimize this energy by finding an appropriate curve

flow through variational principles. Using the above definition of the feature function, the

curve flow at location s minimizing eq. 7.4 is given by (see Appendix B for the derivation):

\nabla E(C)(s) = \nabla E_{shape}(C)(s) + \sum_{k:\, x_k \in S} \mathbf{n}(s) \cdot \nabla I(x_k, s) \big[ H^{*\,int}_k(I(x_k, s)) - H^{int}_k(I(x_k, s)) \big]   (7.5)

where I(xk, s) is defined in eq. 7.2.

7.4 Results

Our approach is designed to model directly the intensity distributions around the boundary

of the object of interest. It makes no explicit assumption about the presence of edges and/or


image gradients nor does it enforce uniformity of region statistics. Therefore, we expect

our approach to show the greatest advantage over traditional approaches when modeling

objects without prominent edges and with similar region statistics inside and outside the

region of interest. These represent some of the most difficult segmentation cases. Region

and gradient-based segmentation methods will not work on such problems.

Figure 7·2: Example 1. Segmentation with the shape/intensity distribution prior. True shape: black solid line; initial contour: dashed line; final segmentation contour: solid white line.

Experiment 1

In our first example we construct such a synthetic segmentation problem, without a

gradient-based boundary and with matching first and second order intensity statistics in

the interior and exterior regions. In Figure 7·2 we show the observed image with the desired

true boundary indicated by a solid black line. By construction, the intensity gradient

is uniform across the image except for the two diagonals. Therefore, any segmentation

method that relies on gradient information to localize the boundary will have difficulty

on this image. In addition, the mean intensity and variance of the intensity inside and

outside of the object boundary are the same, causing problems for typical region based


methods. To apply our method we use an intensity appearance model constructed using

only 5 points for the set S. The coordinates (i,j) of the points in the set S are (-10,0),

(-5,0), (0,0), (5,0) and (10,0). These points are illustrated in Figure 7·3 for one position

of the coordinate system. Notice that only the central point (0,0) coincides with the

boundary. The distribution corresponding to the point (0,0), constructed on the image

and the intended boundary in Figure 7·2, contains a single impulse (the intensity along the boundary is constant). Distributions corresponding to the other four points are more complex

because the trajectories traced by these points do not follow the constant image intensity.

The shape term E_shape in eq. 7.4 was not used in this case, in order to focus on the behavior of the intensity prior. The solid white line in Figure 7·2 shows the corresponding segmentation

result. The segmenting contour matches the true contour.


Figure 7·3: Five points x_k used to construct feature functions according to Eq. 7.2 in Experiment 1.

One might suggest that the distribution corresponding to the point (0,0) is by itself enough to yield the correct solution, since it describes the constant intensity on the boundary. When guided by the single distribution difference term corresponding to the point (0,0), boundary points in bright areas will move towards the darker region, and vice-versa,

until they reach the true boundary intensity value. However, in the process of evolution,

some boundary points are positioned over the transition regions (diagonal edges), where

the average intensity matches the intensity on the intended boundary. These boundary

points will not move, preventing convergence. This situation is corrected by the other four points. Intuitively, more points increase the descriptive power of the model, which

helps to prevent potential ambiguities and local minima. Instead of matching the intensity

distribution along the trajectory, we are matching the intensity distributions in the image


patch centered on the boundary and aligned with the local normal. In the current example

we use 5 points in the patch.
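A minimal NumPy sketch of such a patch sampler follows (the nearest-neighbor lookup and the function/variable names are illustrative assumptions; the offsets are the five points of Figure 7·3, which here lie purely along the local normal):

```python
import numpy as np

def sample_normal_offsets(image, points, normals, offsets=(-10, -5, 0, 5, 10)):
    # Sketch of the feature functions I(x_k, s): for each contour point,
    # read the image at fixed offsets along the local outward normal.
    # Nearest-neighbor sampling; bilinear interpolation would be smoother.
    h, w = image.shape
    feats = np.empty((len(offsets), len(points)))
    for k, d in enumerate(offsets):
        xy = points + d * normals                      # local i-axis = normal
        r = np.clip(np.round(xy[:, 1]).astype(int), 0, h - 1)
        c = np.clip(np.round(xy[:, 0]).astype(int), 0, w - 1)
        feats[k] = image[r, c]
    return feats

# Toy usage: a vertical step edge at column 20; the contour sits on the edge.
img = np.zeros((40, 40)); img[:, 20:] = 1.0
pts = np.stack([np.full(40, 20.0), np.arange(40.0)], axis=1)   # (x, y)
nrm = np.tile([1.0, 0.0], (40, 1))                              # outward = +x
f = sample_normal_offsets(img, pts, nrm)
print(f[0, 0], f[2, 0], f[4, 0])   # 0.0 1.0 1.0 — inside, on, outside the step
```

Histogramming each row of `feats` over the contour then gives the empirical distributions H_k compared against the prior.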

This example shows the effectiveness of our prior on a segmentation problem where intensity is constant along the intended boundary and the intended boundary is not supported by edges.

Experiment 2

In our second example we construct an even more difficult problem based on the same

image as in the previous example. In Figure 7·4 we show the observed image with the

desired true boundary indicated by a solid black line. This intended boundary now crosses

the image in such a way that the distribution of intensity along the boundary is uniform.

Still, the first and second order statistics in the interior and exterior regions are the same,

causing problems for typical region based methods. Since the intended boundary does

not correspond to image gradients, boundary gradient based methods are not applicable.

Distributions of intensity perpendicular to the boundary on this image are also uniform,

so the method in (Leventon et al., 2000b), trying to maximize the distribution of intensity

evaluated on the boundary, will not move the boundary. Our result is shown in Figure 7·4.

It is notable that the resulting shape is nearly square, which is a sign that our prior on

intensity features also captures some shape information through the coupling between

shape and intensity. However, the location of the resulting shape is shifted significantly

with respect to the true location. We conclude that using only intensity features is not

sufficient to capture shape and the shape/image relationship in this case. Intuitively, there

is more ambiguity in the direction to move the boundary to match uniform distributions compared with the previous case, where the distributions are nearly impulses and imply a unique direction in which to change each feature value, and hence a unique direction in which to move the boundary.

Experiment 3

We now consider the same example but combine the intensity prior with the shape


Figure 7·4: Example 2. Segmentation with shape/intensity distribution prior. True shape - black solid line; Initial contour - dashed line; Final segmentation contour - solid line.

prior term. For the shape prior term E_shape we use the shape distribution energy in eq. 7.1 with two feature functions: inter-point distances and multi-scale curvatures. The prior shape distributions H*_i^shape(λ) were computed using the intended boundary. The result is shown

in Figure 7·5 (B). In Figure 7·5 (A) we combine only the shape distribution prior (no intensity-defined distributions) with the maximum mutual information data term (Kim et al., 2002a).

The solid white line in Figure 7·5 shows that the combination of our shape and intensity

prior (B) recovers the boundary quite well, while the alternative data term combined with

the shape prior produces the correct shape but at the wrong location and orientation

(A). This example stresses the effectiveness of our new intensity prior on such a challenging edgeless segmentation problem and also emphasizes the advantage of combining our shape

and intensity priors.

Experiment 4. Real data.

In our next example, we segment the lenticular nucleus structure in an axial slice of

a brain MRI. A data set consisting of 12 normal adult subjects was provided to us by



Figure 7·5: Example 3. (A) - Segmentation with shape distribution prior and maximum mutual information data term (Kim et al., 2002a); (B) - Segmentation with shape/intensity distribution prior and shape distribution prior. True shape - solid black line; Initial contour - dashed line; Final segmentation contour - solid white line.

Dr. David Kennedy at the Center for Morphometric Analysis of Harvard Medical School

and Massachusetts General Hospital. Segmentations of the lenticular nucleus structure by experts often include sub-regions with different average intensity and typically do not follow the strongest perceivable edge everywhere. For example, in Figure 7·6 we show

a lenticular nucleus from the data set that has been segmented by an expert. Inside

the object, one can distinguish areas with significantly different intensity, and the expert-segmented boundary is not aligned with the strongest gradient everywhere.

We desire a segmentation that is as close as possible to the expert segmentation. For

our approach, we construct both our prior appearance and shape models based on 11

training images containing shapes segmented by experts. The intensity model is again

based on the 5 sample points (-10,0), (-5,0), (0,0), (5,0) and (10,0) shown in Figure 7·3.

The prior shape model in eq. 7.1 is again based on the two feature functions (inter-point

distances and multi-scale curvatures) introduced in Chapter 4. Both the prior shape and

intensity distributions H*_i^shape(λ) and H*_i^int(λ) are obtained by averaging the distributions



Figure 7·6: Expert segmentation of the left lenticular nucleus showing variation in intensity within the structure and lack of a consistent gradient along the boundary.

of the 11 samples in the training set. To test the model, the segmentation is performed

on an image that is not included in the training set. In Figure 7·7 (A), we show the

segmentation result. For comparison, in Figure 7·7 (B) we show the segmentation result

obtained using an intensity term which maximizes the mutual information between the

intensities and segmentation labels (see (Kim et al., 2002a)) and our shape distribution-

based shape prior given by eq. 7.1. Our shape and intensity model results in a very close

match between the segmented boundary and the expert drawn boundary except at the

lower tip of the structure. The result using the alternative data term segments only the

darker part of the structure, resulting in a large overall mismatch, as expected. This

comparison demonstrates that in this medical example, homogeneity of region properties

is not a good metric for segmentation, while our intensity and shape prior provides a more

general and effective tool.

Experiment 5

Figure 7·7: Example 4. (a) Segmentation with shape/intensity distribution prior. (b) Segmentation with only shape prior and intensity model in (Kim et al., 2002a). True shape - black solid line; Initial contour - dashed line; Final segmentation contour - solid line.

We present yet another example of an image segmentation problem that can be approached using our shape and appearance distribution prior. LADAR range images are

produced by raster scanning the ground target area from an airborne platform. In the first step, a typical automatic target recognition (ATR) system performs preprocessing to suppress range anomalies. In the second step, an image segmentation algorithm is applied to locate target boundaries. In the final step, the segmented boundaries are used for classification and recognition. An overview of the LADAR system is given in

(Greer et al., 1997) and references therein.

A typical LADAR range image of a tank is shown in Figure 7·8. The range values are coded by gray level, with darker shades corresponding to closer range. Two challenges in

segmenting such an image are apparent. First, the background intensity changes continuously across the image, undermining intensity-threshold strategies. In fact, no single threshold or set of thresholds can be found to separate the target from the background. Second,

the edge is only present around the upper part of the target. The lower part of the target

blends with the background. This makes edge based segmentation methods inapplicable


Figure 7·8: LADAR image of a tank.

to this problem.

A working approach to segment this range imagery is to model the background intensity

parametrically, for example using spline models. The target intensity can be modeled

using another spline model. The parameters of the background and target models can

be estimated jointly with the boundary using an EM algorithm. One drawback of such an

unsupervised approach is the high uncertainty of the resulting boundary at the lower edge

of the target, where the background and the target intensities match. Such uncertainty can be detrimental to the target recognition algorithm.

The idea in applying our shape and appearance prior is that the distribution of intensities around the boundary can capture the existence of the discontinuity and blending regions without parametric modeling of the background and target intensities. The downside is that our approach needs training.

Due to the lack of real data, we test our algorithm on semi-synthetic images. First, we

construct the synthetic background image with linearly varying range/intensity as in the

image in Figure 7·8. We generate 5 tank poses (using a real tank image) and superimpose

them on the background, while preserving the intensity blending along the bottom of the

target. Finally, we add Gaussian IID noise. We obtain 5 images, similar to the real image

in Figure 7·8. We further hand-segment each of these synthetic images. Four images and

corresponding segmentations are then used to construct our shape and intensity feature

distribution prior, and the fifth image is used to apply our segmentation framework. Our


results are shown in Figure 7·9, panel (A). The result matches the true segmentation quite

well. In Figure 7·9, panel (B), we show the segmentation obtained using only the shape prior term in eq. 7.1. The shape feature functions were the same as in case A. For the

data term we used double threshold curve force given by

Fdata(s) = |I(s) − m| − δ (7.6)

where I(s) is the image intensity at the point s on the boundary, m is the average target

intensity and δ is half of the range of the target intensity, determined from the data. The

curve evolves outwards if the intensity on the boundary I(s) is in the range [m− δ, m + δ]

and inwards if I(s) is outside of this range. The resulting flow tends to enclose the region with intensities matching that of the target. The force F_data is used in place of −∇E_data in the curve evolution. Obviously, this term leads to contour leakage in areas where the target intensity matches the background. The shape prior competes

with this effect. The resulting contour approximately preserves the elongated target shape, but is biased by the expansion force on the right-hand side of the target. Our comparison in

case B represents the best ad-hoc region-based curve force constructed without parametric

modeling of the intensities. Our result in case A shows that real imagery with partially

edgeless boundaries and complex backgrounds can be effectively segmented using our shape

and intensity priors.
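The double-threshold force of eq. 7.6 used in case B is simple to sketch (NumPy; the helper name is hypothetical, and m and δ would be estimated from the data as described above):

```python
import numpy as np

def double_threshold_force(boundary_intensities, m, delta):
    # Eq. 7.6: F_data(s) = |I(s) − m| − δ. Negative values (intensity inside
    # [m − δ, m + δ]) push the contour outwards; positive values pull inwards.
    return np.abs(np.asarray(boundary_intensities, dtype=float) - m) - delta

# Target intensities assumed to lie in [0.4, 0.6] (m = 0.5, δ = 0.1).
I = np.array([0.45, 0.55, 0.9, 0.1])
out = double_threshold_force(I, 0.5, 0.1)
print(out)   # negative inside the target band, positive outside
```

This force is evaluated pointwise on the contour, which is exactly why it leaks wherever background intensities fall inside the target band.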

7.5 Multivariate distributions extension

So far, we considered 1D distributions of feature functions extracted from shapes. These

distributions were used in independent terms in the prior energy formulation. Since our

feature functions were defined on different spaces Ω (such as combinations of 2 points on

the boundary or combinations of 3 points on the boundary), we were limited to considering only 1D distributions. In the case of the intensity-dependent feature functions proposed in

this chapter, the situation is different. In fact, the values of intensity obtained as the coordinate system moves along the contour are random processes defined on the same space



Figure 7·9: Semi-synthetic tank image segmentation with intensity and shape prior: (A) - intensity and shape prior; (B) - shape prior and threshold intensity term in eq. 7.6. True shape - solid black line; Initial contour - dashed line; Final segmentation contour - solid white line.

Ω, where Ω is the arc-length along the curve. Therefore, by considering these processes

as independent we made an unjustified assumption. Moreover, these processes are certainly correlated in the general case. Therefore, we need to design a prior that captures these dependencies. To this end we propose to construct m-dimensional CDFs, where each dimension corresponds to one feature function. We may then impose the prior as the difference between the m-dimensional CDF computed on prior shapes/images and the m-dimensional CDF corresponding to the evolving contour. Formally, the mD version of the energy is

E = ∫∫ ... ∫_{λ∈Λ} [H*_mD^int(λ) − H_mD^int(Γ, λ)]² dλ    (7.7)

where λ = (λ1, λ2, ..., λm).
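A sketch of the m-dimensional version (NumPy, with hypothetical names; the uniform binning and the toy data are assumptions made for illustration):

```python
import numpy as np

def joint_cdf(samples, edges):
    # m-dimensional empirical CDF on a regular grid. `samples` is (n, m);
    # `edges` is a list of m bin-edge arrays, one per feature dimension.
    hist, _ = np.histogramdd(samples, bins=edges)
    cdf = hist / max(len(samples), 1)
    for axis in range(hist.ndim):        # cumulative sum along every axis
        cdf = np.cumsum(cdf, axis=axis)
    return cdf

def md_energy(samples, prior_cdf, edges):
    # Eq. 7.7 on the grid: integrated squared CDF difference.
    cell = np.prod([np.diff(e)[0] for e in edges])   # uniform-bin volume
    return float(np.sum((prior_cdf - joint_cdf(samples, edges)) ** 2) * cell)

# Toy check with m = 2 correlated features, as motivated in the text.
rng = np.random.default_rng(0)
s = rng.normal(size=(500, 2)); s[:, 1] += 0.8 * s[:, 0]   # coupled features
edges = [np.linspace(-4, 4, 17)] * 2
prior = joint_cdf(s, edges)
print(md_energy(s, prior, edges))   # 0.0 — identical distributions
```

Unlike a sum of 1D terms, the joint CDF is sensitive to the correlation between the features, which is the dependency the mD prior is meant to capture.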

In Appendix C we derive the curve flow corresponding to the intensity-based feature functions. We implemented the 3-feature-function version of this flow for the experiments conducted in this chapter but did not see any significant difference in results for these cases. However, for more feature functions and/or a different feature function


definition, the more exact full mD definition of the energy functional presented here can be advantageous. For example, a shape feature function defined on the boundary (such as local curvature) can be combined with intensity feature functions defined on the boundary to efficiently encode such prior knowledge as “the corners must be darker than straight boundary segments”.

7.6 Summary

We have developed a novel joint shape and appearance modeling framework and presented first results. The method is based on the concept of shape distributions and allows the capture of salient shape and appearance properties in images. The method can work where currently popular approaches, based on maximizing uniformity of region properties or seeking maximal gradient magnitude, do not. Many challenging segmentation problems possess such

properties.

The next possible step is to combine our shape and appearance modeling framework

with the multi-object shape modeling framework considered in Chapter 6.


Chapter 8

Shape-Based Classification and Morphological

Analysis using Medial Axes and Feature Selection

In previous chapters we considered object boundary extraction tasks based on curve evolution. We now switch to tasks of inference from boundaries. We assume that image segmentation has been carried out and that we have segmented shapes available for further analysis. We are interested in the existence of significant morphological differences in Corpus Callosum (CC) shapes due to gender and schizophrenia. We also propose an intuitive way to investigate and visualize the significant morphological differences detected. Another particular interest is to develop, train, and test classifiers on brain Corpus Callosum (CC) shapes.

8.1 Introduction

The Corpus Callosum is one of the most studied human brain structures. There is interest in understanding population-based differences as well as performing individual diagnosis. Area and volume have been widely used for such tasks in the past, but these capture only simple anatomical differences. Our belief is that with more detailed shape information available, significantly more can be accomplished. Corpus Callosum shapes were previously studied in

(Bookstein, 1997) using Procrustes analysis on a landmark representation. Some inter-class

separation was reported between Normal and Schizophrenia subjects’ shapes. A support vector machine approach was used for CC classification of Normal and Schizophrenia subjects in (Galand et al., 1999). Ad-hoc features extracted from the CC shape were

used for dyslexia detection in (Duta, 2000). Elastic deformation transformation was used

to describe CC shapes in (Davatzikos et al., 1996). Male versus female differences in


shape were reported. Analysis and visualization of morphological differences have been

approached in (Martin et al., 1998).

Our approach uses a rich, skeleton-based representation of CC shape similar to the one used in (Galand et al., 1999), while our skeleton extraction scheme is different and is described in detail in the following sections. Such a representation contains considerably more information than just shape area. Since our major task of interest is to understand differences

between CC shapes in different groups, we need to focus this information by reducing

the dimensionality of the feature space. The often-used PCA for feature reduction is not designed to find optimally discriminative features; it instead focuses on the directions of largest variation. To this end, we present statistically-based methods of measuring the class-separating properties of each feature. We use these measures to develop a new approach for

identifying and visualizing inter-class shape differences, which highlights areas of significant

shape variation between classes. Our goal is to understand how and why different groups

of shapes are different, which can drive further research (e.g., a bump in a region might be

tied to a particular disease). We demonstrate our methods by analyzing shape differences

in the corpus callosum of 68 young adults due to gender and schizophrenia.

We now consider our second task of interest, automatic classification of CC shapes.

Typical approaches to shape classification can be divided into two groups:

• Clustering based on definition of shape distances, see (Klassen et al., 2004).

• Classification based on features encoding shape or extracted from shapes, see (Galand

et al., 1999; Galand et al., 2000; Bookstein, 1998; Yushkevich et al., 2003).

In this dissertation we use the second type of approach to classify shapes. We choose to use

reduced feature sets to construct classifiers, where we use feature set reduction methods

similar to the methods proposed for the first task of interest (morphological difference detection). We construct both traditional linear classifiers (Duda et al., 2001) as well

as classifiers based on AdaBoost (Duda et al., 2001; Freund and Schapire, 1999). We

compare our results in classification performance and morphological differences detection


with previously published results.

In the following sections we first describe the medial-axis-based methods of feature extraction used, followed by a description of our feature selection approaches. Next, we present

tools to potentially aid in understanding morphological differences and their localization.

Finally, we present and analyze our shape classification results.

8.2 Data and skeleton-based feature extraction

Our data set includes 68 manually segmented Corpus Callosum boundaries in a single

sagittal MRI slice of healthy young male and female subjects and 35 segmented Corpus

Callosum boundaries of schizophrenia patients. This data set was provided to us by Dr.

David Kennedy at the Center for Morphometric Analysis (CMA) of Harvard Medical School

and Massachusetts General Hospital. The resolution of each image is about 700 by 300 pixels. All images were acquired and segmented by the CMA research group at Massachusetts General Hospital.

We desire a shape representation that is naturally related to neuro-anatomical brain

morphology. We have focused on shape skeletons as a rich way of characterizing such shape features (Tari and Shah, 2000; Galand et al., 1999). If a shape is viewed as a

collection of connected ribbons in the form of protrusions and narrow necks, then the

medial axes of these ribbons constitute branches of the shape skeleton. Skeleton-based

shape representations are attractive for a number of reasons. Skeletons provide a natural

representation of the shape in terms of a series of components or parts. In addition, changes

in parts of a shape can be reflected in corresponding changes to parts of the skeleton.

This property of the representation allows focal shape differences to be parsimoniously

captured, identified, and focused upon. Finally, the representation can easily be made

complete, providing a one-to-one mapping between a given shape and its features. The

main challenge with the use of shape skeletons is the determination of a skeleton that is robust with respect to noise and natural variation.

We used two methods to extract the skeleton representation of the shape, namely, fixed


topology skeleton (Galand et al., 1999) and nested local symmetry set method (Tari and

Shah, 2000). Below we provide details on both methods.

8.2.1 Fixed topology skeleton

The fixed topology skeleton extraction method, proposed in (Galand et al., 1999), fixes

the branching structure of the skeleton graph and finds the optimal skeleton given this

fixed structure. We use a single-branch skeleton structure, which adequately describes the

elongated shape of the Corpus Callosum.

The method starts from the signed distance function D representation of the shape

using the fact that the skeleton is the set of ridge points of the distance map. D is

computed using a fast marching method. Let us assume that the end points of the skeleton

are fixed and the skeleton is initialized by a straight line connecting the end points. We

uniformly sample the skeleton, obtaining the set of points X = {x_n | n ∈ [1..N]}. We now

employ a curve evolution approach with explicit curve parameterization and fixed boundary

conditions. We evolve the curve using the following update rule for all but the first and

the last points in the set:

x^i_{t+1} = x^i_t + σ∇D(x^i_t) − k^i_t n^i_t,    i ∈ [2..(N − 1)]    (8.1)

where x^i_t are the points at time t, ∇D is the gradient of the distance function, σ is the distance-function smoothing operator (providing a stabilizing effect in the neighborhood of the solution), k^i_t is the curvature of the skeleton at point x^i_t, and n^i_t is the normal direction at point x^i_t. The first term in the update equation 8.1 moves the curve towards the ridge of

the signed distance function, while the second term regularizes the evolution. The curve is

resampled every few iterations to preserve the uniform sampling. The update stops when

the curve starts oscillating around a stationary point.
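The update of eq. 8.1 can be sketched as follows (NumPy; the helper names, the toy distance-map gradient, and the use of a discrete second difference for the curvature-normal term k_i n_i are illustrative assumptions, since sign conventions for the normal vary):

```python
import numpy as np

def skeleton_step(pts, grad_d, step=0.5, reg=0.1):
    # One update of eq. 8.1 (sketch): interior nodes climb the smoothed
    # distance-map gradient grad_d(x) -> (gx, gy); a discrete second
    # difference of the polyline stands in for the curvature-normal
    # regularizer, and the end points stay fixed (fixed boundary conditions).
    new = pts.copy()
    curv_n = pts[:-2] - 2.0 * pts[1:-1] + pts[2:]
    g = np.array([grad_d(p) for p in pts[1:-1]])
    new[1:-1] = pts[1:-1] + step * g + reg * curv_n
    return new

# Toy ridge along y = 0: the distance map increases toward y = 0, so its
# gradient points at the ridge from either side.
grad = lambda p: np.array([0.0, -np.sign(p[1])])
pts = np.stack([np.linspace(0.0, 10.0, 11), np.full(11, 2.0)], axis=1)
for _ in range(10):
    pts = skeleton_step(pts, grad)
print(pts[5])   # interior nodes end up near the ridge at y = 0
```

With a fixed step size the nodes overshoot the ridge and oscillate around it, which matches the stopping criterion described in the text.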

In order to find good skeleton end points, an additional higher level of optimization is

carried out. A criterion of skeleton goodness is devised in the following way. For a given

skeleton, for each point x_k in the set X we find the inscribed circle of maximum radius,



Figure 8·1: A CC shape sketch is shown along with a skeleton found using trial end points. The maximum-radius circle centered at x_k is shown.

which is centered on the given point and is entirely contained within the interior of the

shape (see Figure 8·1). For the point x_k the circle radius is given by

R_k = max { r : min_{x: ||x − x_k|| < r} h(D(x)) > 0 }    (8.2)

where h(x) is the indicator function which is 1 when x > 0 and 0 otherwise. The badness

measure of the skeleton is then defined as the area of the shape that is not contained in

any circle. This measure is given by

E = A − ∪_{k∈[1..N]} Cir(R_k)    (8.3)

where A is the interior of the shape and Cir(R_k) is the inscribed circle centered at the node x_k of the skeleton. Skeleton extraction is repeated for different combinations of end points

on the boundary of the shape. The measure in eq. 8.3 is evaluated for each combination and

the skeleton with the smallest E is stored as the final result. The fixed topology extraction

algorithm was applied to our dataset and yielded a satisfactory solution in most cases. An


example of the extracted skeleton is shown in Figure 8·2. In Figure 8·3, we show extracted

skeletons from male subjects’ segmented Corpus Callosum shapes.
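The badness measure of eqs. 8.2–8.3 can be sketched on a binary mask as follows (brute-force NumPy with hypothetical names; a distance transform would replace the nearest-background search in practice):

```python
import numpy as np

def skeleton_badness(mask, nodes):
    # Eq. 8.2: R_k is the largest radius of a circle centered at node x_k
    # that stays inside `mask`; eq. 8.3: the badness E is the interior area
    # not covered by any of these circles. Brute force, for small images.
    ys, xs = np.nonzero(mask)
    interior = np.stack([xs, ys], axis=1).astype(float)       # (x, y) pairs
    bys, bxs = np.nonzero(~mask)
    background = np.stack([bxs, bys], axis=1).astype(float)
    covered = np.zeros(len(interior), dtype=bool)
    for node in nodes:
        # Radius = distance from the node to the nearest outside pixel.
        r = np.min(np.linalg.norm(background - node, axis=1))
        covered |= np.linalg.norm(interior - node, axis=1) < r
    return int(np.sum(~covered))     # uncovered interior area, in pixels

# Toy shape: a 20×6 ribbon; a 3-node axial skeleton covers most of it,
# leaving small uncovered corners (the dark regions of Figure 8·2).
mask = np.zeros((12, 26), dtype=bool); mask[3:9, 3:23] = True
e = skeleton_badness(mask, nodes=[(7.0, 5.5), (12.5, 5.5), (18.0, 5.5)])
print(e)
```

Repeating this evaluation over candidate end-point pairs and keeping the skeleton with the smallest E reproduces the outer optimization loop described above.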

Figure 8·2: Extracted fixed topology skeleton. Circles represent the sampled discrete points on the skeleton. The outside border is the segmented Corpus Callosum shape. Dark regions near the border comprise the skeleton badness measure in eq. 8.3.

8.2.2 Nested local symmetry sets method

Robust and stable skeleton extraction is notoriously difficult. This led to waning interest in skeletons after their initial promise. (Tari and Shah, 2000) developed a way to produce a robust skeleton that is stable in the face of noise and minor boundary variation. The method first

computes candidate skeleton branches followed by reconnection and pruning processes.

The algorithm then interpolates a smooth function inside the shape boundary by solving


an elliptic PDE. This smoothing process controls the effects of noise and minor variation.

The ridges and the valleys of the graph of this smooth function determine the axes of local

symmetry of the shape. The ribbon structure of the shape is determined by identifying the

medial axes of the ribbons among the axes of local symmetry. An important consequence

of the smooth interpolation process is that the resulting skeleton is not connected, since

the medial axes of protrusions do not meet the main axis of the shape, as they do when the

skeleton is obtained directly as ridges of the distance transform. As a result, the task of pruning nonessential branches becomes considerably easier. Adaptive, top-down pruning

of the extracted skeleton to eliminate false branches and robustly focus on desired structure

is then carried out. More details of the technique used are given in (Tari and Shah, 2000).

In Figures 8·3 and 8·4 we compare extracted skeletons from male subjects’ segmented

corpus callosum shapes using the fixed topology and nested symmetry sets methods, respectively. One can conclude that the two methods provide very close solutions for the skeleton. We performed classification experiments using skeleton features extracted with both algorithms and obtained close performance results. In the following sections we present the classification and inter-class difference visualization results using the second, more theoretically sound method of extracting the skeleton of the shape.

8.2.3 Feature extraction

The fixed topology skeleton extraction approach gives a one-branch skeleton which is directly used to extract features. To extract features using the second, nested symmetry sets skeleton extraction approach, we use only the dominant branch. Once we have a single skeleton branch obtained by either of the two skeleton extraction techniques, the skeleton is uniformly sampled using 20 nodes (including end nodes) {x_n | n ∈ [1..20]}. A set of 37 shape features X is obtained from 18 samples of the relative skeleton angle change θ_i/L, 18 samples of the relative shape width R_k/L, and the overall skeleton length L, as illustrated in Figure 8·5.

The shape width features are obtained as radii of maximum inscribed circles. The relative


Figure 8·3: Skeletons obtained from male subjects using the fixed topology method.


Figure 8·4: Skeletons obtained from male subjects using the nested local symmetry sets method. Principal and secondary skeleton branches are shown.


Figure 8·5: Features extracted from the medial axis

angle change features are computed as

X_k = angle(x_{k+1} - x_k, x_k - x_{k-1}),   k ∈ [1..18]        (8.4)
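As an illustrative sketch (not the thesis code; the signed-angle convention and the function name are assumptions), Eq. (8.4) can be computed from the sampled skeleton nodes as follows:

```python
import numpy as np

def angle_features(nodes):
    """Relative angle change at each interior skeleton node, as in Eq. (8.4).

    nodes : (N, 2) array of uniformly sampled skeleton points x_1..x_N.
    Returns the N-2 signed angles between successive segments
    (x_{k+1} - x_k) and (x_k - x_{k-1}).
    """
    nodes = np.asarray(nodes, dtype=float)
    v_prev = nodes[1:-1] - nodes[:-2]   # x_k - x_{k-1}
    v_next = nodes[2:] - nodes[1:-1]    # x_{k+1} - x_k
    # signed angle between consecutive segment directions
    cross = v_prev[:, 0] * v_next[:, 1] - v_prev[:, 1] * v_next[:, 0]
    dot = (v_prev * v_next).sum(axis=1)
    return np.arctan2(cross, dot)

# 20 uniformly sampled nodes on a straight skeleton: 18 interior angles, all zero
line = np.stack([np.arange(20.0), np.zeros(20)], axis=1)
print(angle_features(line).shape)  # (18,)
```

For 20 nodes this yields the 18 angle samples; together with the 18 width samples and the overall length L, this forms the 37-dimensional feature vector described above.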

In order to use these parameters in statistical tests, we perform two successive normalization steps for each feature i:

1. The average value over the whole dataset of M subjects is subtracted from each feature value:

(x_j^i)' = x_j^i - (1/M) Σ_{k=1}^{M} x_k^i        (8.5)

where x_j^i is the original feature value i for subject j, and (x_j^i)' is the mean-subtracted feature value.

2. Each feature value is scaled by the inverse of the standard deviation computed on the whole dataset for that feature:

(x_j^i)'' = (x_j^i)' / sqrt( (1/M) Σ_{k=1}^{M} [(x_k^i)']^2 )        (8.6)

where (x_j^i)'' is the final mean-subtracted and normalized feature value.

These two steps ensure equal significance of the linear classifier coefficients when they are used for feature selection and shape difference localization experiments.
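As a concrete sketch of Eqs. (8.5)-(8.6) (an illustrative NumPy function, not the thesis implementation), the two steps amount to per-feature z-scoring with a 1/M-normalized standard deviation:

```python
import numpy as np

def normalize_features(X):
    """Apply the two normalization steps of Eqs. (8.5)-(8.6).

    X : (M, F) array of M subjects by F features.
    Each feature is mean-subtracted over the dataset (Eq. 8.5) and
    scaled by the inverse of its 1/M-normalized standard deviation
    (Eq. 8.6), giving zero mean and unit variance per feature.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)               # Eq. (8.5): subtract dataset mean
    sd = np.sqrt((Xc ** 2).mean(axis=0))  # biased (1/M) standard deviation
    return Xc / sd                        # Eq. (8.6)
```

After this step, every feature contributes on the same scale to the linear classifier weights.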


8.3 Inter-class shape differences: detection and visualization

Our first task of interest is to pinpoint significant inter-class shape differences. To this end, we developed a method to highlight shape features showing significant differences between classes. We first choose a criterion of individual feature inter-class variability (a feature ranking score). By plotting the mean values of the features tagged by this variability measure, we can quickly and intuitively display and identify areas of difference between classes. The geometrical nature of the skeleton-based features makes it straightforward to translate feature domain differences directly into intuitive morphological class differences. We experiment with several ways of ranking the feature differences. The first approach uses the inverse of the t-test based p-value. The second approach uses average MMSE classifier weights. We now describe these methods in greater detail.

We use three methods to compute the feature ranking score:

1. The score is the p-value computed on the feature values for the two classes. The p-value is the probability that the observed data occurred by chance given that the observations actually come from distributions with equal means. A small p-value indicates a high chance that the probability distributions of the feature observations differ between the two classes.

2. We repeatedly split the data into training and testing subsets and train the linear MMSE classifier on the training subset using all 37 features. The average absolute classifier weight is then computed as

ā^m = (1/N) Σ_{i=1}^{N} |a_i^m|

where N is the number of splits and a_i^m is the weight on the m-th feature in the decision boundary computed for split i. The resulting ā^m is used as the ranking score; higher values indicate a stronger feature discrimination property.

3. We first fix the number N of features to be a small number, presumably a number maximizing the classifier performance. We repeatedly split the data into training and testing subsets, and for each split we perform the feature selection procedure based on the p-value (first method). We then count how many times each feature was selected over the multiple data splits. The normalized count for each feature represents the relative probability of the feature being selected for the given number N of features and is used as the ranking score. A similar method has been reported in (Yushkevich et al., 2003).
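The first two ranking scores can be sketched as follows. This is a hedged illustration, not the thesis code: the two-sample t-test comes from SciPy, the linear MMSE classifier is approximated by an ordinary least-squares fit of ±1 labels, and the function names and 90% sub-split size are assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind

def pvalue_scores(Xa, Xb):
    """Per-feature two-sample t-test p-values between classes a and b.

    Xa, Xb : (Ma, F) and (Mb, F) normalized feature arrays.
    Smaller p-value => stronger evidence of a mean difference, so the
    inverse p-value serves as the ranking score (method 1).
    """
    return ttest_ind(Xa, Xb, axis=0).pvalue

def mmse_weight_scores(X, y, n_splits=50, rng=None):
    """Average absolute linear weight over random splits (method 2).

    X : (M, F) normalized features;  y : (M,) labels in {-1, +1}.
    Each split fits least-squares weights a on 90% of the data; the
    score for feature m is the mean of |a^m| over the splits.
    """
    rng = np.random.default_rng(rng)
    M = len(y)
    acc = np.zeros(X.shape[1])
    for _ in range(n_splits):
        idx = rng.permutation(M)[: int(0.9 * M)]
        a, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        acc += np.abs(a)
    return acc / n_splits
```

Features are then ranked by ascending p-value (method 1) or by descending average absolute weight (method 2).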

An example of our score-based image-domain representations of inter-class shape differences is shown in Figure 8·6 (male/female classes, p-value based ranking score). In the top panel we show the average shapes for both classes. The skeleton for each category was computed by averaging the angle features in that category, and the shape widths (represented by bars orthogonal to the skeleton) were computed by averaging the width features in that category. In the bottom panel we show the average shape for both classes combined. Each circle represents a node of the skeleton. The size and color maps on the right hand side relate the ranking score values to the size and color of the bars and circles. The darkness and size of each circle or bar on the average shape indicates the relative importance of that angle or width feature, respectively.

In Figure 8·6 we show our image-domain representation for the male and female shape classes using the p-value based ranking score. In Figure 8·7 we show the same analysis for the normal/schizophrenia discrimination problem. In Figure 8·8 we show the inter-class difference representation obtained using the MMSE ranking score for the male/female and normal/schizophrenia discrimination problems. In Figure 8·9 we show the inter-class difference representation obtained using the third ranking score method for the normal/schizophrenia discrimination problem. The number N of selected features was set to 6, since for this number of features we obtain the best classifier performance (see next section).

One can notice that feature importance grading using the different methods provides only partially consistent results. Let us first compare the first (p-value score) and second (MMSE average weight score) methods. Fewer features are identified using the second ranking score method. Consider the male/female classification problem. From Figure 8·6 (left panel), one can observe a significant difference in the upper hook curvature of the mean shapes. This difference is reflected in the low p-values corresponding to angle features


Figure 8·6: Male and female Corpus-Callosum differences and importance of individual features using p-value based feature ranking


Figure 8·7: Normal and Schizophrenia Corpus-Callosum differences and importance of individual features using p-value based feature ranking


Figure 8·8: Feature importance visualization using the linear classifier weight as the feature ranking score. Top: male/female case; Bottom: normal/schizophrenia case.


Figure 8·9: Feature importance visualization using the feature selection frequency as the feature ranking score. Normal/schizophrenia case.

(right panel). On the other hand, the classifier weight method in Figure 8·8 does not seem to show the importance of these features. Similar observations can be made regarding the normal/schizophrenia classification case. The exact reason for such discrepancies is unclear, but from our results it appears that the p-value based feature ranking score is more consistent with the observed average shape differences in Figures 8·6 and 8·7. The p-value based score only quantifies the difference in means; if the variances of the two class populations differ, the p-value computed here cannot be directly interpreted.

Not surprisingly, the third feature ranking method provides feature importance grading consistent with that given by the p-values directly (first method). The value of this third method is that it uses a near-optimal number of features selected from the pool of features, rather than grading all features.

Gender-related differences in CC shapes reported in (Davatzikos et al., 1996) only partially match our results. Both our results and those in (Davatzikos et al., 1996) show a difference in the isthmus and the anterior corpus callosum, but the differences in the splenium are not reflected in our results. A much smaller dataset was used in (Davatzikos et al., 1996), which can be a reason for the difference. We also observe differences between our results and those reported in (Galand et al., 1999) on normal versus schizophrenia classification. The use of a different data set might be one reason explaining these differences. It can also be argued that the feature importance visualization schemes reported here and in (Galand et al., 1999) all use ad hoc assumptions about the criteria for local shape differences.

8.4 Classification

We now present our results on classification of Corpus Callosum shapes. We examined both the problem of distinguishing between male and female subjects and the problem of distinguishing between normal and schizophrenia subjects. Since it is believed that no significant difference in area exists between male and female corpus callosum (Davatzikos et al., 1996), our resulting shape-based differences between these groups are especially valuable.

We studied two types of classifiers. The first was a linear MMSE classifier, which thresholds a linear combination of the feature values (see Section 2.9). The second was the learning-based technique Ada-boost (Freund and Schapire, 1999), which creates a classifier as a non-linear combination of a weighted sequence of "weak" classifiers. We overview the classifiers used here in Section 2.9.

To design and evaluate both classifiers we used the cross-validation technique (Duda et al., 2001), repeatedly and randomly splitting the data (both categories combined) into testing and training sets. We performed 300 such data splits, each time reserving 10% of the data for the testing subset. Reported error values are the averages over these experiments; their variability for the results presented here does not exceed 0.02.
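This evaluation protocol can be sketched as follows (an illustrative implementation, not the thesis code: the least-squares fit of ±1 labels stands in for the linear MMSE classifier, and the function name is an assumption):

```python
import numpy as np

def cv_test_error(X, y, n_splits=300, test_frac=0.1, rng=None):
    """Average test error of a linear classifier under repeated
    random train/test splits.

    X : (M, F) normalized features;  y : (M,) labels in {-1, +1}.
    On each split, 10% of the data is held out, least-squares weights
    are fit on the rest, and the held-out misclassification rate is
    recorded; the mean over splits is returned.
    """
    rng = np.random.default_rng(rng)
    M = len(y)
    n_test = max(1, int(test_frac * M))
    errs = []
    for _ in range(n_splits):
        idx = rng.permutation(M)
        test, train = idx[:n_test], idx[n_test:]
        a, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred = np.sign(X[test] @ a)
        errs.append(np.mean(pred != y[test]))
    return float(np.mean(errs))
```

Averaging over many random splits reduces the variance of the error estimate, which is why 300 splits keep the reported variability small.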

In order to build classifiers of shapes, we also need to intelligently reduce the dimensionality of the feature space. Often this is done using a technique such as principal component analysis (PCA). However, PCA focuses on the commonalities of a group rather than on the differences between groups, which are what matter for discrimination. We examined two direct, statistically based approaches for choosing a reduced-order feature set yielding good class separation. Throughout, we assume we are working on the training subset of the data in a particular cross-validation data split (as described later).

The first approach is based on the p-value associated with an individual-feature t-test of class difference. Assuming conditional Gaussianity and equal variances for the observations corresponding to each class, the p-value gives the probability that the observed data occurred by chance given that the observations actually come from distributions with equal means (identical probability distributions). A small p-value is evidence that the probability distributions of the feature observations differ between the two classes. For a given number of features retained, we therefore select those features which have the smallest single-feature p-values. The inverse p-value is thus used as a measure of feature importance.

The second approach is based on the average weight given to a normalized feature in an optimal linear classifier. We start from the full set of 37 mean-and-variance-normalized (as described in Section 8.2.3) features. We then progressively eliminate one feature at a time using the following scheme:

1. We repeatedly (N times) split the training data into sub-training and sub-testing sets, and for each split we compute a linear minimum mean-square error (MMSE) classifier, see Section 2.9. For a given sub-split i we obtain a vector of classifier coefficients a_i. Let the element a_i^j correspond to feature j.

2. We find the index m+ of the feature with the lowest average absolute coefficient according to

m+ = argmin_m Σ_{i=1}^{N} |a_i^m|        (8.7)

where N is the total number of sub-splits.

3. We eliminate feature m+ from the set of features. The retained features are used for classifier training and testing for the current length of the feature set.

4. The current set of features is passed back to step 1 to eliminate the next feature. The process is repeated until the set includes only one retained feature.

As a result, we have a sequence of retained feature sets of decreasing length.
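The elimination loop above can be sketched as follows (an illustrative implementation, not the thesis code: the least-squares fit of ±1 labels stands in for the MMSE classifier, and the 90% sub-split size is an assumption):

```python
import numpy as np

def backward_elimination(X, y, n_splits=50, rng=None):
    """Greedy backward feature elimination with a linear classifier.

    X : (M, F) normalized features;  y : (M,) labels in {-1, +1}.
    Repeatedly drops the feature whose summed absolute least-squares
    weight over random sub-splits is smallest (Eq. 8.7), and returns
    the sequence of retained-feature index sets, from F down to 1.
    """
    rng = np.random.default_rng(rng)
    M = len(y)
    kept = list(range(X.shape[1]))
    history = [list(kept)]
    while len(kept) > 1:
        acc = np.zeros(len(kept))
        for _ in range(n_splits):
            idx = rng.permutation(M)[: int(0.9 * M)]
            a, *_ = np.linalg.lstsq(X[np.ix_(idx, kept)], y[idx], rcond=None)
            acc += np.abs(a)
        kept.pop(int(np.argmin(acc)))   # eliminate feature m+ of Eq. (8.7)
        history.append(list(kept))
    return history
```

Each entry of the returned history is one of the retained feature sets whose classification performance is evaluated below.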

We now consider the task of two-category classification. We examine the performance of different combinations of classifier and feature selection method. We consider two binary classification cases: male/female and normal/schizophrenia. For each of these two cases we use the two feature selection techniques described above. For each feature selection scheme and a given number of selected features we train two different classifiers: the linear MMSE classifier and an Ada-boost classifier whose "weak" classifiers are based on a misclassification-penalizing criterion function. We report results after six Ada-boost iterations. Therefore, for a given number of selected features we use four combinations of feature selection method and classifier. In order to estimate the generalization performance we use the cross-validation technique, repeatedly and randomly splitting the data into training and testing subsets. Testing subsets contain 10% of the data. We obtain the final testing error as the average testing error over the cross-validation data splits. Our classification results are summarized in Figure 8·11 for male/female classification (left) and normal/schizophrenia classification (right). For numbers of features larger than 10, the classifier testing errors quickly approach 0.5 and are not shown here.

We would like to comment on the number of iterations chosen for the Ada-boost algorithm. We consider the male/female classification task using t-value feature selection, and run the Ada-boost algorithm while evaluating the performance after each iteration. This is done for different numbers of selected features. The resulting testing errors are visualized in Figure 8·10. While the variation with the number of iterations is not perfectly consistent across different numbers of features, for six iterations the testing error is nearly minimized for 10 or fewer features. Therefore, we chose to perform six Ada-boost iterations in the experiments reported here.

For male/female classification, the linear classifier testing error obtained when using only area is 0.45, which is also shown in Figure 8·11. We obtain approximately a 25% improvement over the area-only classifier by using well-chosen shape features. All the skeleton-feature based methods perform comparably and, perhaps surprisingly, in all cases the optimal number of features to use is relatively low, between 3 and 6.

A few important observations are worth noting. First, let us compare the performance of the MMSE classifier and the Ada-boost classifier for the same feature selection method. The MMSE classifier outperforms the Ada-boost classifier for small numbers of features (< 5). We can explain this by the probable simplicity (unimodality) of the conditional feature distributions for the few best features. If the class-conditional distributions are unimodal, a linear discriminant function can yield an optimal classifier. For larger numbers of features (> 5), we observe better performance from the Ada-boost classifier. We hypothesize that as the number of features increases, the MMSE classifier quickly loses its generalizability due to poor distribution separation. In such a case, a few Ada-boost iterations help to adjust the decision boundaries and refocus the discriminant function. Summarizing these results, the Ada-boost algorithm can provide slightly better performance for larger numbers of features. However, the best overall performance can still be achieved using the linear MMSE classifier, and at much lower computational cost.

Figure 8·10: Male/female classification testing errors using the t-test feature selection method. Testing error is shown coded by color as a function of the number of Ada-boost iterations (horizontal axis) and the number of features chosen (vertical axis).


Figure 8·11: Classification testing error for gender and schizophrenia versus normal, shown for different combinations of feature selection technique, classification method, and number of features retained. "T-test; linear": t-test feature selection, MMSE classifier; "T-test; Ada Boosting": t-test feature selection, Ada-boost classifier; "Weights; linear": linear-weights feature selection, MMSE classifier; "Weights; Ada Boosting": linear-weights feature selection, Ada-boost classifier.


Comparing the two classification cases shows the connection with the feature importance visualization results presented previously. Let us first consider the male/female classification case. Only 2-3 angle features are most distinguishing in Figure 8·6. The results in Figure 8·11 show that the best classification performance is achieved using 4 selected features. Adding more features sharply increases the error. Considering normal/schizophrenia classification, a larger number of width features are seen in Figure 8·7 to exhibit statistical differences. Correspondingly, we achieve the best classification with 5-6 selected features. Moreover, the testing error as a function of the number of features increases at a lower rate compared to the male/female classification case. Therefore, our classification results also suggest that for the male/female case the inter-class shape differences are more localized than in the normal/schizophrenia case.

Comparison with (Galand et al., 1999) gives more insight into our result. In (Galand et al., 1999) the lowest testing error was reported when 20 "features" were used, where "features" refers to the number of points at which the medial axis was sampled. However, it was not noted that this representation is still over-complete and that many of these features are redundant. The resulting error rates are very close to the error rates obtained in our work. The reason for this is that the SVM technique used in (Galand et al., 1999) enables concentration on the important discriminative features. In (Galand et al., 1999) a better result was not obtained for a smaller number of "features" (a coarser sampling of the medial axis), apparently because with coarser sampling the important details of the medial axis are not represented in the feature vector. We argue that we are able to achieve similar error rates with the MMSE classifier, rather than an SVM classifier, by using an appropriate scheme to select important features.

Finally, we present a byproduct of the classification experiments, which we call feature selection statistics. At each cross-validation step, a set of selected features is defined for any given number of features. The retained features can vary between cross-validation steps. For different numbers of retained features and for each feature, we compute the probability that a given feature is selected. This probability is computed by counting how many times


Figure 8·12: Feature selection normalized probability. Male/female classification with p-value feature selection. Log of the normalized probability that a given feature is chosen in the set of N features. Horizontal axis: N, the number of selected features; vertical axis: feature index (1 through 37).

a given feature is selected over the different cross-validation steps. The log of the normalized probability so computed is shown in Figure 8·12 for the male/female classification case and the p-value feature selection method. Relative feature importance can be judged by locating the most often selected features (left part of the graph) and the least often selected features (right part of the graph). The feature selection statistic presented here is of course closely related to the feature importance visualization schemes presented in Section 8.3, since methods 1 and 2 use the ranking scores as feature selection criteria. The difference is that in methods 1 and 2 (Section 8.3) the whole dataset is used to compute the feature ranking score, while feature selection in our classification experiments is carried out using only the training data subsets. One can see that the feature selection probabilities in Figure 8·12 are closely correlated with the feature importance visualization in Figure 8·6. The feature selection statistic computed here is a generalization of method 3 in Section 8.3. In fact, the feature selection probability can be computed for different numbers of selected features as well as for different feature selection criteria.
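The selection-frequency statistic can be sketched as follows, here with p-value based selection (method 1); this is an illustrative sketch, not the thesis code, and the split protocol and function name are assumptions:

```python
import numpy as np
from scipy.stats import ttest_ind

def selection_frequency(Xa, Xb, n_keep, n_splits=300, rng=None):
    """Probability of each feature entering the best-n_keep set.

    Xa, Xb : per-class (M, F) normalized feature arrays.
    On each random 90% training split, features are ranked by their
    two-sample p-value and the n_keep smallest are kept; the
    normalized selection count over splits is returned.
    """
    rng = np.random.default_rng(rng)
    F = Xa.shape[1]
    count = np.zeros(F)
    for _ in range(n_splits):
        ia = rng.permutation(len(Xa))[: int(0.9 * len(Xa))]
        ib = rng.permutation(len(Xb))[: int(0.9 * len(Xb))]
        p = ttest_ind(Xa[ia], Xb[ib], axis=0).pvalue
        count[np.argsort(p)[:n_keep]] += 1
    return count / n_splits
```

Plotting the log of this frequency against n_keep reproduces the kind of map shown in Figure 8·12.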


8.5 Summary

We have presented a shape-based approach to classification and inter-class analysis using a skeleton shape representation. We used a robust variational approach to find a set of anatomical skeleton features. We present three approaches to highlight inter-class shape differences in the original image space based on our importance measures and skeleton representation. In addition, we approach the problem of inter-class discrimination of Corpus Callosum shapes. To this end, two methods are presented and used to reduce the dimension of the feature space. We design both linear MMSE and Ada-boost-based classifiers and compare the performance of these classifiers on reduced feature sets.


Chapter 9

Conclusions and future research

This dissertation contributes to three directions of research in object-based image analysis. First, we develop novel shape and appearance modeling approaches. Second, we incorporate the constructed shape and appearance models into boundary extraction tasks. Third, we develop tools to study morphological differences of shapes.

In the shape modeling thrust of our research, we focus on shape representation by a closed contour. We use a curve evolution framework for the boundary extraction task and therefore seek a shape modeling approach that can be implemented using curve evolution. Our goal is to construct an alternative to the well-known generic prior methods and deformable-template based shape models. More specifically, we investigate ways to construct and use a shape model that does not constrain a shape as a template, yet contains information on the presence of certain characteristic features of the shape. We also work on an appearance model that can be superior in situations posing problems for state-of-the-art appearance modeling approaches.

In our first major contribution, we consider the maximum entropy shape model, which has the desirable property of reflecting perceptual shape similarity. We are able to considerably reduce the computational cost of constructing the model. We further propose a method to use this model in a curve evolution framework for shape inference. Our results show the advantages achieved by using such a model compared with current approaches in image segmentation problems.

In the second major contribution of this thesis we propose a shape distribution-based approach to model shapes. We propose a framework allowing the use of this type of shape prior in curve evolution and derive curve evolution equations corresponding to our prior.


We demonstrate the properties of our model on shape morphing, shape interpolation, image segmentation, and image segmentation with occlusion experiments. We extend our shape distribution based approach to model inter-relationships between different image structures, thereby achieving segmentation performance improvements in situations where a single-object shape prior alone is not sufficient. We also propose a strategy to use our approach to model 3D objects.

In the third major contribution we propose a joint shape and appearance modeling framework, extending our shape distribution prior concept. Our new appearance model encodes the intensity and image-boundary relationship through distributions of intensity-dependent features sampled along trajectories parallel to the boundary. Our framework provides good segmentations in very challenging situations where region-based and boundary-based appearance models have difficulties. Such situations arise when object boundaries do not correspond to strong edges in the image and when region statistics inside and outside of the segmenting boundary are similar. Our model describes the image/boundary features along the boundary and generalizes with respect to the positions of these characteristic features along the boundary. This property allows our model to account for large shape variations using small training data sets, which can be beneficial for some applications.

In the fourth major contribution, we develop tools for morphological analysis of corpus callosum shape differences. Specifically, we investigate the localizability of significant inter-class shape differences and the possibility of automatically classifying shapes as male/female or normal/schizophrenia. We use skeleton-based shape descriptors suitable for corpus callosum shapes. We construct feature ranking metrics that allow for intuitive visualization of shape differences and for dimensionality reduction of the feature set. We test different classifiers on the reduced feature set and obtain classification performance similar to that reported in the literature.


9.1 Future research

A growing number of imaging technologies are able to acquire volumetric data. Processing such volumetric data jointly should give considerable performance gains compared to processing single 2D slices separately. Unfortunately, the 3D extension is non-trivial for many approaches. In this thesis we only briefly consider the 3D extension of our shape modeling approach. Our single-object shape distribution based framework can be extended to 3D, as described in Chapter 4, but only one of the shape feature function definitions considered in this work can be formulated in 3D. The multi-scale curvature feature function cannot be easily extended to 3D because the distance between two points on the surface cannot be defined uniquely. The descriptive power of our model is limited when using only one feature function. To make matters worse, a 3D object has more degrees of freedom and requires more feature functions to be constrained properly. For these reasons, we were unable to achieve significant results in 3D at this time. New feature functions should be developed to construct an effective 3D extension of our shape distribution based model. Another aspect of the 3D shape distribution prior implementation is the computational cost of computing the distributions and associated flows. We showed that for feature function #1, the cost of computing the surface flow is O(A²), where A is the area of the surface. This challenge can be addressed by using sampled distributions and feature flows. The 3D extension of the multi-object framework is straightforward and computationally efficient: the surface flow can be computed in O(A) operations. The appearance feature distribution model can also be extended to 3D with the same surface flow computational cost per feature function.
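As a hedged illustration of the sampling idea (not the thesis implementation), the following NumPy sketch estimates the inter-point distance CDF from a random subsample of boundary or surface points, reducing the quadratic pair cost from O(N²) over all nodes to O(M²) over M samples; the function name is ours:

```python
import numpy as np

def sampled_distance_cdf(points, bins, n_samples=500, rng=None):
    """Estimate the cumulative distribution H(lambda) of inter-point
    distances from a random subsample of boundary/surface points.

    `points` is an (N, d) array of boundary (d=2) or surface (d=3)
    points; subsampling reduces the O(N^2) pair cost to O(M^2)."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(points), size=min(n_samples, len(points)), replace=False)
    p = points[idx]
    # pairwise distances between the M sampled points
    diff = p[:, None, :] - p[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    d = d[np.triu_indices(len(p), k=1)]          # unique pairs only
    # empirical CDF evaluated at the requested bin edges
    return np.array([(d <= b).mean() for b in bins])
```

The same subsampling applies to the flow evaluation: only the sampled pairs contribute to the per-node force.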

Another possible extension of our framework is to combine the multi-object and intensity feature distribution priors into a more general formulation, including feature functions defined on multiple boundaries and on image intensities.

A potential application of our prior is object description and predictive coding in advanced video coding frameworks, such as MPEG-4, where object description capabilities remain largely unused.

In certain imaging applications, there exists periodicity in the observed shapes and intensity patterns. For instance, in spine segmentation, one can exploit such periodicity to achieve reliable spinal cord extraction and vertebra position detection. In tracking applications, such periodicity can be encountered in road traffic surveillance, industrial machine vision, etc. Current shape models could be extended to incorporate such periodicity. For instance, one can simultaneously segment several periodic structures with constraints on the shape differences between those structures. This goal can be pursued using our shape modeling approach. One possibility is to use the same target shape distribution for all pairs of adjacent structures. A multi-scale approach can be used to encode the consistency between pairs of distant (non-adjacent) contours.


Appendix A

Variational solution for the curve flow minimizing

shape distribution based prior energy

Notation:

E(Γ) - energy functional to be minimized

s, s1, s2, p - curve arc-length parameterizations normalized to the interval [0, 1]

λ - variable spanning the range of the feature values

Γ - curve

~Γ(s) = (x(s), y(s)) - curve coordinates

~Γ(s1, s2) = ~Γ(s1) − ~Γ(s2) - the vector (x(s1) − x(s2), y(s1) − y(s2))

β(s) ∈ R¹ - continuous, differentiable scalar deformation function used to perturb the curve in the normal direction at the boundary location s

~n(s) - normal vector at the boundary location s

~β(s) = β(s)~n(s) - deformation vector at the boundary location s

H(Γ, λ) - cumulative distribution function defined on the curve Γ

H∗(λ) - prior (target) cumulative distribution function

h(x) - indicator function: h(x) = 1 if x > 0 and 0 otherwise

G(E) - Gateaux semi-derivative of the functional E

In this section we derive the gradient curve flow corresponding to the following energy functional:

E = ∫ [H(Γ, λ) − H∗(λ)]² dλ    (A.1)

Steps to compute the curve flow minimizing the energy are:

1. Compute the Gateaux semi-derivative of the energy E with respect to a small curve perturbation β. Using the chain rule,

G(E, β) = ∫ G[ (H(Γ, λ) − H∗(λ))² ] dλ = 2 ∫ [H(Γ, λ) − H∗(λ)] G[H(Γ, λ), β] dλ    (A.2, A.3)

2. If the Gateaux semi-derivative of a linear functional f exists, then by the Riesz representation theorem

G(f, β) = ⟨∇f, β⟩    (A.4)

where ∇f is the gradient flow. We use f = H and find ∇H(Γ, λ).

3. The flow minimizing the original functional E can be found as

∇E = 2 ∫ [H(Γ, λ) − H∗(λ)] ∇H(Γ, λ) dλ    (A.5)

Therefore, in the following discussion, we concentrate on step 2: finding the Gateaux semi-derivative G[H(Γ, λ), β] and the corresponding flow ∇H(Γ, λ).
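The three steps above can be sketched numerically. The following is a minimal NumPy sketch, not the thesis implementation: `cdf`, `cdf_flow`, and `target_cdf` are hypothetical callables standing in for H(Γ, λ), ∇H(Γ, λ), and H∗(λ), and the curve is a closed polygon whose nodes are moved along their normals.

```python
import numpy as np

def curve_normals(curve):
    """Unit normals of a closed polygon, from central-difference tangents."""
    tang = np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)
    normals = np.stack([tang[:, 1], -tang[:, 0]], axis=1)
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def gradient_flow_step(curve, cdf, cdf_flow, target_cdf, lambdas, step=0.1):
    """One descent step for E = integral of [H(Gamma, lam) - H*(lam)]^2 dlam.

    Assembles grad E = 2 * int [H - H*] grad H dlam (steps 1-3 above) by
    numerical quadrature over lambda, then moves each node along its normal.
    cdf(curve, lam) -> scalar H; cdf_flow(curve, lam) -> per-node gradient
    of H; target_cdf(lam) -> H*(lam)."""
    grad = np.zeros(len(curve))
    dlam = lambdas[1] - lambdas[0]
    for lam in lambdas:
        grad += 2.0 * (cdf(curve, lam) - target_cdf(lam)) * cdf_flow(curve, lam) * dlam
    return curve - step * grad[:, None] * curve_normals(curve)
```

The feature-specific derivations that follow supply the `cdf_flow` term for each feature class.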

A.1 Inter-point distance function

Additional definitions:

d(~Γ(s1), ~Γ(s2)) - Euclidean distance between points s1 and s2 on the curve

For this feature class, the cumulative distribution function for a curve Γ is defined as:

H(Γ, λ) = ∫₀¹ ∫₀¹ h( d(~Γ(s1), ~Γ(s2)) > λ ) ds1 ds2    (A.6)


where h(x) is the indicator function.

Assuming an arbitrary perturbation function β and a small scalar ε, we apply the perturbation εβ(s) to the curve Γ, resulting in the deformed curve

~Γ′(s) = ~Γ(s) + εβ(s)~n(s)    (A.7)

Figure A·1: Inter-point distance augmentation due to curve deformation.

From Figure A·1 it can be seen that, for a small ε, the distance between two arbitrary points s1 and s2 on the perturbed curve is

d(~Γ′(s1), ~Γ′(s2)) = d(~Γ(s1), ~Γ(s2)) + ε [β(s1)~n(s1) − β(s2)~n(s2)] · ~Γ(s1, s2)/|~Γ(s1, s2)|    (A.8)

That is, the distance between two points on the curve is augmented by the projection of the difference between the two deformation vectors onto the unit vector ~Γ(s1, s2)/|~Γ(s1, s2)|.


We can now write the Gateaux semi-derivative G[H(Γ, λ), β] according to its definition:

G[H(Γ, λ), β] = lim_{ε→0} [H(Γ′, λ) − H(Γ, λ)]/ε

= lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ h( d(~Γ′(s1), ~Γ′(s2)) > λ ) − h( d(~Γ(s1), ~Γ(s2)) > λ ) ] ds1 ds2

= lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ h( d(~Γ(s1), ~Γ(s2)) + ε [β(s1)~n(s1) − β(s2)~n(s2)] · ~Γ(s1, s2)/|~Γ(s1, s2)| > λ ) − h( d(~Γ(s1), ~Γ(s2)) > λ ) ] ds1 ds2    (A.9)

In the following derivations we introduce the simplified notation:

d = d(~Γ(s1), ~Γ(s2))

dβ = β(s1)~n(s1) − β(s2)~n(s2)

dΓ = ~Γ(s1, s2)/|~Γ(s1, s2)|

(nx(s), ny(s)) = ~n(s) - the components of the normal vector at s

Given the new notation we rewrite

G[H(Γ, λ), β] = lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ h(d − λ + ε dβ·dΓ) − h(d − λ) ] ds1 ds2    (A.10)

First, we use a differentiable approximation of the indicator function:

h(x) → φα(x) = [atan(x/α) + 1] / 2    (A.11)

where the parameter α defines the degree of approximation. In the limiting case α → 0 the approximation becomes exact. We perform the following derivations using the approximation and at the last step consider the limiting case α → 0.

The Gateaux semi-derivative becomes:

G[H(Γ, λ), β] = lim_{ε→0} (1/(2ε)) ∫₀¹ ∫₀¹ [ atan((λ − d)/α) + atan((ε dβ·dΓ + d − λ)/α) ] ds1 ds2    (A.12)


For small ε, using the Taylor expansion we obtain

atan((ε dβ·dΓ + d − λ)/α) = atan((d − λ)/α) + atan′((d − λ)/α)·(ε dβ·dΓ/α) + O(ε²)    (A.13)

Now we can find the approximation of the Gateaux semi-derivative using the first two terms of the Taylor expansion as follows:

G[H(Γ, λ), β] = lim_{ε→0} (1/(2ε)) ∫₀¹ ∫₀¹ atan′((d − λ)/α)·(ε dβ·dΓ/α) ds1 ds2

= (1/(2α)) ∫₀¹ ∫₀¹ (dβ·dΓ) / [1 + ((d − λ)/α)²] ds1 ds2

= (1/(2α)) ∫₀¹ ∫₀¹ [ (nx(s1)β(s1) − nx(s2)β(s2))(x(s1) − x(s2)) + (ny(s1)β(s1) − ny(s2)β(s2))(y(s1) − y(s2)) ] / ( d [1 + ((d − λ)/α)²] ) ds1 ds2

= (1/(2α)) ∫₀¹ [ ∫₀¹ (nx(s1)(x(s1) − x(s2)) + ny(s1)(y(s1) − y(s2))) / ( d [1 + ((d − λ)/α)²] ) ds2 ] β(s1) ds1

− (1/(2α)) ∫₀¹ [ ∫₀¹ (nx(s2)(x(s1) − x(s2)) + ny(s2)(y(s1) − y(s2))) / ( d [1 + ((d − λ)/α)²] ) ds1 ] β(s2) ds2    (A.14)

By changing the order of integration in the second integral (d and |~Γ(s1, s2)| are symmetric in their arguments, so the two terms combine), we obtain

G[H(Γ, λ), β] = (1/α) ∫₀¹ [ ∫₀¹ (nx(s1)(x(s1) − x(s2)) + ny(s1)(y(s1) − y(s2))) / ( d [1 + ((d − λ)/α)²] ) ds2 ] β(s1) ds1    (A.15)

According to eq. A.4, the expression in square brackets is the gradient flow

∇H(Γ)(s) = (1/α) ∫₀¹ (nx(s)(x(s) − x(t)) + ny(s)(y(s) − y(t))) / ( d [1 + ((d − λ)/α)²] ) dt = (1/α) ∫₀¹ (~n(s)·dΓ(s, t)) / [1 + ((d − λ)/α)²] dt    (A.16)


The gradient flow minimizing the energy in eq. A.1 is therefore

∇E(Γ)(s) = 2 ∫ dλ [H∗(λ) − H(Γ, λ)] (1/α) ∫₀¹ (~n(s)·dΓ(s, t)) / [1 + ((d − λ)/α)²] dt

= 2 ∫ dλ [H∗(λ) − H(Γ, λ)] ∫₀¹ (α ~n(s)·dΓ(s, t)) / (α² + (d − λ)²) dt    (A.17)

For α ≈ 0, the expression under the integral is non-zero only when d = λ. By changing the order of integration we obtain:

∇E(Γ)(s) = 2 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) dt ∫ [H∗(λ) − H(Γ, λ)] α/(α² + (d − λ)²) dλ

= 2 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) [ H∗(|~Γ(s, t)|) − H(Γ, |~Γ(s, t)|) ] dt    (A.18)

The obtained expression is simple and has an intuitive interpretation. The flow at each point s on the curve is computed as an integral along the curve. For each point t on the curve, the expression under the integral is the projection of the normal vector at s onto the unit vector ~Γ(s, t)/|~Γ(s, t)|, scaled by the difference between the current and the target distributions evaluated at the distance between s and t. This is intuitively the projection of the “force” acting on the feature (the link between s and t). In the discrete case, the computational complexity of the flow computation is O(N²), where N is the number of nodes of the discretized curve.
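A direct discrete implementation of the flow in eq. A.18 can be sketched as follows. This is an illustrative NumPy version (the function names are ours, not the thesis code), with the empirical CDF recomputed per evaluation for clarity rather than speed:

```python
import numpy as np

def pairwise_distance_cdf(curve, lam):
    """Empirical H(Gamma, lambda): fraction of node pairs closer than lambda."""
    diff = curve[:, None, :] - curve[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(curve), k=1)
    return (d[iu] <= lam).mean()

def distance_feature_flow(curve, normals, target_cdf):
    """Discrete flow of eq. A.18: for every node s, sum over nodes t the
    projection of n(s) onto the unit vector Gamma(s,t)/|Gamma(s,t)|,
    weighted by H*(d(s,t)) - H(Gamma, d(s,t)). Cost is O(N^2) per node
    pass for an N-node curve (plus the CDF evaluations)."""
    n_nodes = len(curve)
    flow = np.zeros(n_nodes)
    for s in range(n_nodes):
        for t in range(n_nodes):
            if s == t:
                continue
            v = curve[s] - curve[t]
            d = np.linalg.norm(v)
            proj = normals[s] @ (v / d)
            flow[s] += 2.0 * proj * (target_cdf(d) - pairwise_distance_cdf(curve, d))
    return flow / n_nodes
```

When the target distribution equals the curve's current distribution, every term vanishes and the curve is at a stationary point of the prior energy.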

A.2 Boundary curvature feature function

Now we consider the local boundary curvature as the feature function and attempt to find the corresponding curve flow. The curvature at the boundary location s can be defined as

κ(s) = ~dn(s) · ~Γ′/|~Γ′|    (A.19)


where ~dn(s) is the infinitesimal rate of change (a vector) of the normal and ~Γ′ is the local tangent vector. The cumulative distribution function for this feature function can be written as

H(Γ, λ) = ∫_s h[ ~dn(s) · ~Γ′/|~Γ′| > λ ] ds    (A.20)

After the small curve perturbation by εβ(s), the new rate of change of the normal and the new tangent vector are:

~dn(s) → ~dn(s) + εβ′′(s) ~Γ′/|~Γ′|    (A.21)

~Γ′ → ~Γ′ + εβ′(s)    (A.22)

The Gateaux semi-derivative is

G(H(λ), β) = lim_{ε→0} (1/ε) ∫ ds [ h( ( ~dn(s) + εβ′′(s) ~Γ′/|~Γ′| ) · (~Γ′ + εβ′(s)) / |~Γ′ + εβ′(s)| > λ ) − h( ~dn(s) · ~Γ′/|~Γ′| > λ ) ]    (A.23)

Assuming |εβ′(s)| ≪ |~Γ′|,

G(H(λ), β) = lim_{ε→0} (1/ε) ∫ ds [ h(κ(s) + εβ′′(s) > λ) − h(κ(s) > λ) ]    (A.24)

= lim_{ε→0} (1/(2ε)) ∫ [ atan((λ − κ(s))/α) + atan((εβ′′(s) + κ(s) − λ)/α) ] ds    (A.25)

= (1/α) ∫ atan′((κ(s) − λ)/α) β′′(s) ds = ∫ [ α/(α² + (κ(s) − λ)²) ] β′′(s) ds    (A.26)

Integrating by parts twice, we get

G(H(λ), β) = ∫ 2α [ 4(κ(s) − λ)² κ′(s)² − (α² + (κ(s) − λ)²)(κ′(s)² + (κ(s) − λ)κ′′(s)) ] / (α² + (κ(s) − λ)²)³ β(s) ds    (A.27)

The resulting ∇H(Γ)(s) is

∇H(Γ)(s) = 2α [ 4(κ(s) − λ)² κ′(s)² − (α² + (κ(s) − λ)²)(κ′(s)² + (κ(s) − λ)κ′′(s)) ] / (α² + (κ(s) − λ)²)³    (A.30)

At this point we obtain a solution that contains second-order derivatives of the curvature. We expect the resulting flow to be very sensitive to noise and numerically unstable. Therefore, we do not use the boundary curvature feature in our experiments.

A.3 Multiscale curvatures

A.3.1 Computation of feature function

For this feature function, the values are computed as “support” angles α (see Figure A·2) defined for three boundary locations: the “base” point s1 and the symmetric “side” points s1 − s2 and s1 + s2. The cumulative distribution function is given by

H(Γ, λ) = ∫₀¹ ∫₀¹ h( α(s1 − s2, s1, s1 + s2) > λ ) ds1 ds2    (A.33)

We choose to define the “inner” angle α, as shown in Figure A·2, as the angle between the vectors ~dΓ(s1, s1 − s2) and ~dΓ(s1, s1 + s2), measured always in the “same” half-space; that is, the half-space cannot be flipped when the angle crosses the π/2 threshold. This half-space is fixed to be the inside of the curve for s2 = 0. In other words, the angle α must be a continuous function of s2, with α(s2 = 0) = π.


Figure A·2: Illustration of feature value computation for feature function #2.

Figure A·3 illustrates four cases of relative locations of points on the curve. It can be seen that, in the general case, unambiguous determination of the angle from the rays ~Γ(s1, s1 − s2) and ~Γ(s1, s1 + s2) is impossible from just the point positions without further assumptions. We therefore determine the angle α sequentially, for s2 increasing from 0 to 1, and detect the instances when the angle crosses the π/2 threshold. This operation eliminates the ambiguity.

Figure A·3: Four cases of the relative positions of three curve points. The support angle cannot be determined unambiguously.

Let us define a flag function r(s1, s2). For each “base” point s1, the angle α is computed sequentially for s2 increasing from 0 to 1. This process is illustrated in Figure A·4. For each

Figure A·4: Sequential computation of the angles for a particular “base” point s1, starting from r = 1 (assuming the inside of the curve is upwards).

base point s1, we start from s2 = 0 with r(s1, s2) = sign(κ(s1)) and α(s1, s2) = π. The flag function r(s1, s2) for s2 > 0 is defined as

r(s1, s2) = 1 if α(s1, s2) ≤ π, and −1 otherwise    (A.34)

We define the mean direction vector as

~Γm(s1 − s2, s1 + s2) = ~Γ(s1, s1 − s2)/|~Γ(s1, s1 − s2)| + ~Γ(s1, s1 + s2)/|~Γ(s1, s1 + s2)|    (A.35)

In the process of computing α(s1, s2), as s2 increases by ds, we capture the change of orientation of the mean direction vector by looking at the scalar product

C = ~Γm(s1 − s2, s1 + s2) · ~Γm(s1 − s2 − ds, s1 + s2 + ds)    (A.36)

If C becomes less than zero for some s2 = l, we change the sign of r(s1, l) for consecutive values s2 > l. The angle between the rays, ∠(~Γ(s1, s1 − s2), ~Γ(s1, s1 + s2)) ∈ [0, π], is measured as the inverse cosine. After this sequential computation is performed for all s1, we end up with the 2D functions ∠(s1, s2) and r(s1, s2). The latter is used to correct the values of ∠(~Γ(s1, s1 − s2), ~Γ(s1, s1 + s2)) that must be greater than π. Finally, the feature values


α(s1, s2) are given by

α(s1, s2) = 2π (1 − r(s1, s2))/2 + r(s1, s2) acos( ~dΓ(s1, s1 − s2)/|~dΓ(s1, s1 − s2)| · ~dΓ(s1, s1 + s2)/|~dΓ(s1, s1 + s2)| )    (A.37)
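The sequential angle computation of eqs. A.34–A.37 can be sketched as follows. This is a simplified NumPy illustration whose names are ours; the flag update uses the mean-direction test of eq. A.36:

```python
import numpy as np

def support_angles(curve, base):
    """Support angles alpha(s1, s2) at a fixed "base" node, computed
    sequentially in s2 (eqs. A.34-A.37). The flag r flips whenever the
    mean direction vector (eq. A.35) reverses orientation (eq. A.36),
    which disambiguates angles greater than pi."""
    n = len(curve)
    angles = []
    r = 1.0
    prev_mean = None
    for s2 in range(1, n // 2):
        v_minus = curve[base] - curve[(base - s2) % n]   # Gamma(s1, s1 - s2)
        v_plus = curve[base] - curve[(base + s2) % n]    # Gamma(s1, s1 + s2)
        u_minus = v_minus / np.linalg.norm(v_minus)
        u_plus = v_plus / np.linalg.norm(v_plus)
        mean = u_minus + u_plus
        # flip the flag when the mean-direction vector reverses (eq. A.36)
        if prev_mean is not None and np.dot(mean, prev_mean) < 0:
            r = -r
        prev_mean = mean
        inner = np.clip(np.dot(u_minus, u_plus), -1.0, 1.0)
        # eq. A.37: angle in [0, 2*pi) corrected by the flag r
        angles.append(2 * np.pi * (1 - r) / 2 + r * np.arccos(inner))
    return np.array(angles)
```

For a convex curve the flag never flips and the angles decrease smoothly from π as the side points move apart.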

A.3.2 Curve flow computation

In order to find the flow minimizing the energy in eq. 4.29, we perform essentially the same procedure as for feature function #1. We refer the reader to Section A.1 for omitted details.

First, we perturb the curve in the normal direction by εβ(s). Note that all three points (the “base” point and the two “side” points) on the curve change their positions. Let α′ be the angle after the perturbation. Using the continuous approximation of the indicator function, we obtain

G(H(λ), β) = lim_{ε→0} (1/ε) ∫∫ ds1 ds2 [ h(α′ > λ) − h(α > λ) ]

= lim_{ε→0} (1/ε) ∫∫ ds1 ds2 [ φγ(α′ − λ) − φγ(α − λ) ]

= lim_{ε→0} (1/(2ε)) ∫∫ ds1 ds2 [ atan((α′ − λ)/γ) − atan((α − λ)/γ) ]

= lim_{ε→0} (1/(2ε)) ∫∫ ds1 ds2 atan′((α − λ)/γ)(α′ − α)/γ

= lim_{ε→0} (1/(2ε)) ∫∫ ds1 ds2 [ γ/(γ² + (α − λ)²) ] (α′ − α)    (A.38)

We must therefore compute the angle increment α′ − α resulting from the perturbation. This increment consists of three terms (additive due to the assumption of a small perturbation):

α′ − α = dα(1) + dα(2) + dα(3)    (A.39)

where the first two terms result from the displacement of the “side” points and the third term results from the displacement of the “base” point Γ(s1).

We first consider dα(1) (dα(2) is determined similarly). The local geometry of the curve perturbation at the point s1 + s2 is shown in detail in Figure A·5. It is easy to see that

Figure A·5: Local perturbation of the curve at the point ~Γ(s1 + s2). The perturbation εβ(s1 + s2) is infinitesimally small compared to |~Γ(s1, s1 + s2)|.

the increment dα(1) can be found as follows:

p/(εβ(s)) = sin Θ,  so p = εβ(s) √(1 − cos² Θ),

dα(1) = p/|dΓ| = εβ(s) √(1 − (~n(s) · dΓ/|dΓ|)²) / |dΓ|    (A.40)

Figure A·6: Illustration of two cases in which the sign of the angle increment dα(1) differs for the same curve perturbation εβ(s).

It is important to recognize that the sign of the above angle increment depends on the relative direction of the normal ~n(s1 + s2) and ~Γm(s1, s1 + s2). Two possible cases are illustrated in Figure A·6. In case 1, for a positive β(s1 + s2) the increment is positive, and in case 2 it is negative. We define a flag function f(s1 + s2) as follows:

f(s1 + s2) = −1 if the points L and P are on the same side of ~Γ(s1, s1 + s2), and 1 otherwise    (A.41)

Under this definition,

dα(1) = εβ(s1 + s2) √(1 − (~n(s1 + s2) · ~Γ(s1, s1 + s2)/|~Γ(s1, s1 + s2)|)²) / |~Γ(s1, s1 + s2)| · f(s1 + s2)    (A.42)

dα(2) = εβ(s1 − s2) √(1 − (~n(s1 − s2) · ~Γ(s1, s1 − s2)/|~Γ(s1, s1 − s2)|)²) / |~Γ(s1, s1 − s2)| · f(s1 − s2)    (A.43)

We now proceed to calculate dα(3). Let us assume that the angles α′ and α are computed as inverse cosines. In that case, for α > π, the sign of the angle increment must be changed as follows:

dα(3) = (α′ − α) r(s1, s2)    (A.44)

where α′ is the angle resulting after displacing the point s1. Using the law of cosines and the abbreviations from Figure A·2, with Γ+ = ~Γ(s1, s1 + s2) and Γ− = ~Γ(s1, s1 − s2), we can write the angle increment due to the displacement of the point s1 as

α′ − α = acos( (a² − b′² − c′²)/(−2b′c′) ) − acos( (a² − b² − c²)/(−2bc) )

where b′ = b − εβ(s1) ~n(s1)·Γ−/|Γ−| and c′ = c − εβ(s1) ~n(s1)·Γ+/|Γ+|. Keeping terms of first order in ε,

α′ − α = acos( [ a² − b² − c² + 2εβ(s1) ~n(s1)·( (Γ−/|Γ−|) b + (Γ+/|Γ+|) c ) ] / [ −2bc + 2εβ(s1) ~n(s1)·( (Γ−/|Γ−|) c + (Γ+/|Γ+|) b ) ] ) − acos( (a² − b² − c²)/(−2bc) )    (A.45)

where (·)′ indicates the quantity after displacing the point s1, Γ+ = ~Γ(s1, s1 + s2) and Γ− = ~Γ(s1, s1 − s2). Using the Taylor expansion

(m + ε1)/(n + ε2) = m/n + (ε1 n − ε2 m)/n² + O(ε²)    (A.46)

we obtain

α′ − α = acos( (a² − b² − c²)/(−2bc) + εβ(s1) ~n(s1)·( (Γ−/|Γ−|) b + (Γ+/|Γ+|) c )/(−bc) − εβ(s1) ~n(s1)·( (Γ−/|Γ−|) c + (Γ+/|Γ+|) b ) (a² − b² − c²)/(2b²c²) ) − acos( (a² − b² − c²)/(−2bc) )    (A.47)

Using the Taylor expansion of the inverse cosine we obtain

α′ − α = acos′( (a² − b² − c²)/(−2bc) ) εβ(s1) ~n(s1)·( −Γ−/(c|Γ−|) − Γ+/(b|Γ+|) − (a² − b² − c²)/(2b²c²) ( (Γ−/|Γ−|) c + (Γ+/|Γ+|) b ) )

= (−1/sin α) (εβ(s1)/(2bc)) ~n(s1)·( (Γ+/|Γ+|)(a² + c² − b²)/c + (Γ−/|Γ−|)(a² + b² − c²)/b )

= −(εβ(s1) a)/(bc sin α) [ cos β cos∠(~n(s1), Γ+) + cos γ cos∠(~n(s1), Γ−) ]    (A.48)

where the bracketed term is denoted K.

We have thus obtained the expression for dα(3). We only need to resolve the uncertainty arising when α ≈ π, where both K and sin α vanish. In that case we set α = π − δ, with β = O(δ) and γ = O(δ). Using the Taylor expansion, the ratio K/sin α behaves as

K/sin α = [ (1 − β²/2) cos(α − ∠(~n(s1), Γ−)) + (1 − γ²/2) cos(∠(~n(s1), Γ−)) ] / δ

= [ −cos(δ + ∠(~n(s1), Γ−)) + cos(∠(~n(s1), Γ−)) ] / δ

= sin(∠(~n(s1), ~Γ(s1, s1 − s2)))    (A.49)


Finally,

dα(3) = −εβ(s1) (a/(bc)) · { [ cos β cos∠(~n(s1), Γ+) + cos γ cos∠(~n(s1), Γ−) ] / sin α  if α ≠ π;  sin(∠(~n(s1), ~Γ(s1, s1 − s2)))  otherwise } · r(s1, s2)    (A.50)

Combining eq. A.38 and eq. A.39, the Gateaux semi-derivative can be written as

G(H(λ), β) = ∫₀¹ ∫₀¹ ds1 ds2 [ γ/(γ² + (α − λ)²) ] × { β(s1 + s2) √(1 − (~n(s1 + s2)·~Γ(s1, s1 + s2)/|~Γ(s1, s1 + s2)|)²) / |~Γ(s1, s1 + s2)| f(s1 + s2) + β(s1 − s2) √(1 − (~n(s1 − s2)·~Γ(s1, s1 − s2)/|~Γ(s1, s1 − s2)|)²) / |~Γ(s1, s1 − s2)| f(s1 − s2) − β(s1) (a/(bc)) [ (cos β cos∠(~n(s1), Γ+) + cos γ cos∠(~n(s1), Γ−))/sin α if α ≠ π; sin(∠(~n(s1), ~Γ(s1, s1 − s2))) otherwise ] r(s1, s2) }    (A.51)

Using the Riesz representation theorem and changing the variables of integration, we obtain the gradient flow of H(Γ, λ):

∇H(Γ, λ)(s) = −∫₀¹ [ γ/(γ² + (α(s, t) − λ)²) ] (a/(bc)) [ (cos β cos∠(~n(s), Γ+) + cos γ cos∠(~n(s), Γ−))/sin α if α ≠ π; sin(∠(~n(s), ~Γ−)) otherwise ] r(s, t) dt

+ ∫₀¹ [ γ/(γ² + (α(s − t) − λ)²) ] √(1 − (~n(s − t)·~Γ(s − t)/|~Γ(s − t)|)²) / |~Γ(s − t)| f(s − t) dt

+ ∫₀¹ [ γ/(γ² + (α(s + t) − λ)²) ] √(1 − (~n(s + t)·~Γ(s + t)/|~Γ(s + t)|)²) / |~Γ(s + t)| f(s + t) dt    (A.52)


Recall that the gradient of the energy is

∇E(Γ)(s) = ∫ dλ [H∗(λ) − H(Γ, λ)] ∇H(Γ, λ)(s)    (A.53)

Using eq. A.52, changing the order of integration, and considering the limiting case of small γ, we obtain

∇E(Γ)(s) = ∫_{t∈S} { [ (cos β cos∠(~n(s), Γ+) + cos γ cos∠(~n(s), Γ−))/sin α if α ≠ π; sin(∠(~n(s), ~Γ−)) otherwise ] · a r(s, t)/(bc) − f(s − t) √(1 − (~n(s − t)·~Γ−/|~Γ−|)²) / |~Γ−| − f(s + t) √(1 − (~n(s + t)·~Γ+/|~Γ+|)²) / |~Γ+| } × [ H∗(α(s, t)) − H(Γ, α(s, t)) ] dt    (A.54)

where r(s, t) and f(s ± t) take the values −1 and 1 and indicate the sign of the change of the angle α(s, t) = ∠(~Γ−, ~Γ+) with respect to an along-the-normal perturbation of the points Γ(s) and Γ(s ± t) respectively; ~Γ+ = ~Γ(s, s + t); ~Γ− = ~Γ(s, s − t); a = |~Γ+ − ~Γ−|; b = |Γ−|; c = |Γ+|; β = ∠(−~Γ+, ~Γ− − ~Γ+); γ = ∠(−~Γ−, ~Γ+ − ~Γ−).

We have thus obtained a closed-form solution for the flow minimizing the energy corresponding to feature function #2. The computational complexity for a discrete curve with N nodes is O(N²).

A.4 Feature classes with weighting function

In this section we consider the energy with non-uniform weighting of the distribution difference across the feature value range. Using the notation w(λ) for the weighting function, we focus on the energy given by

E(Γ) = ∫ w(λ) [ H∗(λ) − H(Γ, λ) ]² dλ    (A.55)

A.4.1 Inter-point distance feature function

We present the derivation of the minimizing flow for the inter-point distance feature function; results for other feature functions can be obtained similarly.

Definitions:

d(~Γ(s1), ~Γ(s2)) - Euclidean distance between points s1 and s2 on the curve

The weighting function w(λ) must be normalized as follows:

∫₀¹ ∫₀¹ w( d(~Γ(s1), ~Γ(s2)) ) ds1 ds2 = 1    (A.56)

Therefore we use the renormalized weighting function:

w(d) ← w(d) / ∫₀¹ ∫₀¹ w( d(~Γ(s1), ~Γ(s2)) ) ds1 ds2    (A.57)

The normalization must take place at every iteration in order to ensure that the distribution integrates to 1. We assume that w(d) is differentiable.
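A small NumPy sketch of this renormalization (eq. A.57), under the assumption that the weighting is evaluated over the node pairs of the discretized curve; the function name is ours:

```python
import numpy as np

def renormalize_weights(w, curve):
    """Renormalize the weighting function w(d) so that the weighted
    distribution of inter-node distances integrates to one (eq. A.57).
    Returns a function w_tilde(d) = w(d) / Z, where Z is the mean of w
    over all node pairs of the current discretized curve."""
    diff = curve[:, None, :] - curve[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(curve), k=1)
    z = w(d[iu]).mean()
    return lambda dd: w(dd) / z
```

Since the normalizer Z depends on the current curve, it must be recomputed after every curve update, as noted above.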

For this definition of the feature function,

H(Γ, λ) = ∫₀¹ ∫₀¹ w( d(~Γ(s1), ~Γ(s2)) ) h( d(~Γ(s1), ~Γ(s2)) > λ ) ds1 ds2    (A.58)

Now we apply the deformation ε~β(s) to the curve Γ, yielding the deformed curve

~Γ′(s) = ~Γ(s) + ε~β(s)    (A.59)


One can see that for a small ε, the distance between two points on the perturbed curve is

d(~Γ′(s1), ~Γ′(s2)) = d(~Γ(s1), ~Γ(s2)) + ε [β(s1)~n(s1) − β(s2)~n(s2)] · ~Γ(s1, s2)/|~Γ(s1, s2)|    (A.60)

That is, the distance between two points on the curve is augmented by the projection of the difference between the two deformation vectors onto the unit vector ~Γ(s1, s2)/|~Γ(s1, s2)|.

Now we can write the Gateaux semi-derivative G[H(Γ, λ), β] according to its definition and using the previous result:

G[H(Γ, λ), β] = lim_{ε→0} [H(Γ′, λ) − H(Γ, λ)]/ε

= lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ w(d′) h( d(~Γ′(s1), ~Γ′(s2)) > λ ) − w(d) h( d(~Γ(s1), ~Γ(s2)) > λ ) ] ds1 ds2

= lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ w(d′) h( d(~Γ(s1), ~Γ(s2)) + ε [β(s1)~n(s1) − β(s2)~n(s2)] · ~Γ(s1, s2)/|~Γ(s1, s2)| > λ ) − w(d) h( d(~Γ(s1), ~Γ(s2)) > λ ) ] ds1 ds2    (A.61)

In the following, we use the simplified notation:

d = d(~Γ(s1), ~Γ(s2))

dβ = β(s1)~n(s1) − β(s2)~n(s2)

dΓ = ~Γ(s1, s2)/|~Γ(s1, s2)|

Now we can rewrite

G[H(Γ, λ), β] = lim_{ε→0} (1/ε) ∫₀¹ ∫₀¹ [ w(d′) h(d − λ + ε dβ·dΓ) − w(d) h(d − λ) ] ds1 ds2    (A.62)

In order to find the limit, we use the smooth approximation of the indicator function:

h(x) → φα(x) = [atan(x/α) + 1] / 2    (A.63)


The Gateaux semi-derivative becomes:

G[H(Γ, λ), β] = lim_{ε→0} (1/(2ε)) ∫₀¹ ∫₀¹ { w(d + ε dβ·dΓ) [ atan((ε dβ·dΓ + d − λ)/α) + 1 ] − w(d) [ atan((d − λ)/α) + 1 ] } ds1 ds2    (A.64)

Considering a small ε, we use the first terms of the Taylor series:

atan((ε dβ·dΓ + d − λ)/α) = atan((d − λ)/α) + atan′((d − λ)/α)(ε dβ·dΓ/α) + ...    (A.65)

w(d + ε dβ·dΓ) = w(d) + w′(d) ε dβ·dΓ + ...    (A.66)

Now we can finally find the Gateaux semi-derivative:

G[H(Γ, λ), β] = lim_{ε→0} (1/(2ε)) ∫₀¹ ∫₀¹ { w(d) atan′((d − λ)/α)(ε dβ·dΓ/α) + w′(d) [ atan((d − λ)/α) + 1 ] ε dβ·dΓ } ds1 ds2

= (1/(2α)) ∫₀¹ ∫₀¹ w(d) (dβ·dΓ) / [1 + ((d − λ)/α)²] ds1 ds2  (part I)

+ (1/2) ∫₀¹ ∫₀¹ w′(d) [ atan((d − λ)/α) + 1 ] (dβ·dΓ) ds1 ds2  (part II)

Part I:


G1 = (1/(2α)) ∫₀¹ ∫₀¹ w(d) [ (nx(s1)β(s1) − nx(s2)β(s2))(x(s1) − x(s2)) + (ny(s1)β(s1) − ny(s2)β(s2))(y(s1) − y(s2)) ] / ( d [1 + ((d − λ)/α)²] ) ds1 ds2

= (1/(2α)) ∫₀¹ [ ∫₀¹ (nx(s1)(x(s1) − x(s2)) + ny(s1)(y(s1) − y(s2))) w(d) / ( d [1 + ((d − λ)/α)²] ) ds2 ] β(s1) ds1

− (1/(2α)) ∫₀¹ [ ∫₀¹ (nx(s2)(x(s1) − x(s2)) + ny(s2)(y(s1) − y(s2))) w(d) / ( d [1 + ((d − λ)/α)²] ) ds1 ] β(s2) ds2    (A.67)

By changing the order of integration in the second integral we obtain

G1[H(Γ, λ), β] = (1/α) ∫₀¹ [ ∫₀¹ (nx(s1)(x(s1) − x(s2)) + ny(s1)(y(s1) − y(s2))) w(d) / ( d [1 + ((d − λ)/α)²] ) ds2 ] β(s1) ds1    (A.68)

According to eq. A.4, the expression in square brackets is the gradient

∇H1(Γ)(s) = (1/α) ∫₀¹ (~n(s)·dΓ(s, t)) w(d) / [1 + ((d − λ)/α)²] dt    (A.69)

The gradient flow minimizing the energy in eq. 4.29 is therefore

∇E1(Γ)(s) = 2 ∫ dλ [H(Γ, λ) − H∗(λ)] (1/α) ∫₀¹ (~n(s)·dΓ(s, t)) w(d) / [1 + ((d − λ)/α)²] dt

= 2 ∫ dλ [H(Γ, λ) − H∗(λ)] ∫₀¹ (α ~n(s)·dΓ(s, t)) w(d) / (α² + (d − λ)²) dt    (A.70)

For α ≈ 0, the expression inside the integral is only non-zero when d = λ. By changing

the order of integration we obtain:


∇E1(Γ)(s) = 2 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) w(|~Γ(s, t)|) dt ∫ [H(Γ, λ) − H∗(λ)] α/(α² + (d − λ)²) dλ

= 2 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) w(|~Γ(s, t)|) [ H(Γ, |~Γ(s, t)|) − H∗(|~Γ(s, t)|) ] dt    (A.71)

Part II: Similarly we obtain:

∇H2(Γ)(s) = ∫₀¹ w′(d) [ atan((d − λ)/α) + 1 ] ~n(s)·dΓ(s, t) dt    (A.72)

The gradient flow minimizing the energy in eq. 4.29 is therefore

∇E2(Γ)(s) = 2 ∫ dλ [H(Γ, λ) − H∗(λ)] ∫₀¹ w′(d) [ atan((d − λ)/α) + 1 ] ~n(s)·dΓ(s, t) dt    (A.73)

By changing the order of integration we obtain:

∇E2(Γ)(s) = 2 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) w′(|~Γ(s, t)|) dt ∫ [H(Γ, λ) − H∗(λ)] [ atan((d − λ)/α) + 1 ] dλ

= 4 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) w′(|~Γ(s, t)|) dt ∫ [H(Γ, λ) − H∗(λ)] h(|~Γ(s, t)| − λ) dλ

= 4 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) w′(|~Γ(s, t)|) [ ∫₀^{|~Γ(s, t)|} [H(Γ, λ) − H∗(λ)] dλ ] dt    (A.74)


The overall flow is

∇E(Γ)(s) = ∇E1(Γ)(s) + ∇E2(Γ)(s)

= 2 ∫₀¹ (~n(s)·~Γ(s, t)/|~Γ(s, t)|) { w(|~Γ(s, t)|) [ H(Γ, |~Γ(s, t)|) − H∗(|~Γ(s, t)|) ] + 2 w′(|~Γ(s, t)|) ∫₀^{|~Γ(s, t)|} [H(Γ, λ) − H∗(λ)] dλ } dt    (A.75)

A.5 Relative inter-object distances.

This feature function relates two shapes. Here we derive the flow for the curve Γ induced by the distribution of distances between the curves Γ and Ω. We call Ω the generating curve and Γ the target curve. We assume that the signed distance transform of the curve Ω is known at every point in the image plane: the signed distance function values DΩ(x, y) are pre-computed at the grid crossings, and linear interpolation is used to obtain the values at non-integer locations. We compute the gradient ∇DΩ(x, y) from the grid values using central differences, again with linear interpolation to evaluate ∇DΩ(x, y) at arbitrary locations. We further normalize DΩ(x, y) by R(Ω), the mean radius of the shape Ω with respect to its center of mass, so that

D′Ω(x, y) = DΩ(x, y)/R(Ω)    (A.76)

The feature function is D′Ω(x, y) measured along the curve Γ. The distribution function is given by

H(Γ, λ) = ∫₀¹ h( D′Ω(~Γ(s)) > λ ) ds    (A.77)


Now we can write the Gateaux semi-derivative G[H(Γ, λ), β] according to its definition:

G[H(Γ, λ), β] = lim_{ε→0} [H(Γ′, λ) − H(Γ, λ)]/ε

= lim_{ε→0} (1/ε) [ ∫₀¹ h( D′Ω(~Γ(s) + ε~n(s)β(s)) − λ ) ds − ∫₀¹ h( D′Ω(~Γ(s)) − λ ) ds ]

= lim_{ε→0} (1/ε) ∫₀¹ [ h( D′Ω(~Γ(s)) + εβ(s) ~n(s)·∇D′Ω(~Γ(s)) − λ ) − h( D′Ω(~Γ(s)) − λ ) ] ds    (A.78)

Using the differentiable approximation of the indicator function,

G[H(Γ, λ), β] = lim_{ε→0} (1/ε) ∫₀¹ ds [ φγ( D′Ω(~Γ(s)) + εβ(s) ~n(s)·∇D′Ω(~Γ(s)) − λ ) − φγ( D′Ω(~Γ(s)) − λ ) ]

= lim_{ε→0} (1/(2ε)) ∫₀¹ ds [ atan′( (D′Ω(~Γ(s)) − λ)/γ ) εβ(s) ~n(s)·∇D′Ω(~Γ(s)) / γ ]

= (1/2) ∫₀¹ ds [ γ/(γ² + (D′Ω(~Γ(s)) − λ)²) ] β(s) ~n(s)·∇D′Ω(~Γ(s))    (A.79)

Using the Riesz representation theorem we obtain

$$
\nabla H(\Gamma,\lambda)(s) = \frac{1}{2}\, \frac{\gamma}{\gamma^2 + \left( D_\Omega(\vec{\Gamma}(s))' - \lambda \right)^2}\, \vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))' \tag{A.80}
$$

The gradient flow minimizing the energy is

$$
\begin{aligned}
\nabla E(\Gamma)(s) &= \lim_{\gamma\to 0} \int_\lambda \left[ H^*(\lambda) - H(\Gamma,\lambda) \right] \frac{\gamma}{\gamma^2 + \left( D_\Omega(\vec{\Gamma}(s))' - \lambda \right)^2}\, \vec{n}(s)\cdot\nabla D_\Omega(\vec{\Gamma}(s))'\, d\lambda \\
&= \frac{\vec{n}(s)\cdot\nabla D_\Omega(s)}{R(\Omega)} \left[ H^*\!\left( \frac{D_\Omega(s)}{R(\Omega)} \right) - H\!\left( \Gamma, \frac{D_\Omega(s)}{R(\Omega)} \right) \right]
\end{aligned}
\tag{A.81}
$$

where DΩ(s) is the value of the signed distance function generated by the curve Ω at the point (X(s), Y(s)) on the curve Γ, and R(Ω) is the mean radius of the shape Ω relative to its center of mass.
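A sketch of evaluating the flow speed of eq. A.81 at sampled curve points. The CDFs are passed as callables, and all inputs (normals, interpolated distance and gradient) are assumed pre-computed as described above; the function names are illustrative, not from the text:

```python
import numpy as np

def interobject_flow_speed(d_norm, grad_d_norm, normals, h_star, h_curve):
    """Flow speed per eq. A.81 at M curve samples.

    d_norm      : (M,)  normalized signed distance D_Omega / R at the curve points
    grad_d_norm : (M,2) gradient of the normalized distance at those points
    normals     : (M,2) unit normals of the target curve
    h_star, h_curve : callables evaluating the prior and current CDFs
    """
    # Projection n(s) . grad(D_Omega / R), row-wise dot product.
    proj = np.einsum('ij,ij->i', normals, grad_d_norm)
    return proj * (h_star(d_norm) - h_curve(d_norm))
```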


Appendix B

Curve flow for intensity based feature function

Here we consider the intensity-dependent feature function (see Section 7.3). Let us consider a single feature function Φ(s) defined on the curve as

$$
\Phi(s) = I(\mathbf{x}(s)) = I\!\left( \Gamma(s) + R(s,\mathbf{n}(s))\,[i\ j]^T \right) \tag{B.1}
$$

where I is the image, Γ is the boundary parameterized by arc-length s, n(s) is the local normal, and R(s, n(s)) is the 2D rotation matrix aligning n(s) with the j-axis of the patch coordinate system. x(s) is the trajectory traced by the point (i, j) as the coordinate system origin moves along the curve:

$$
\mathbf{x}(s) = \Gamma(s) + R(s,\mathbf{n}(s))\,[i\ j]^T \tag{B.2}
$$
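For a discretized closed curve, the patch point x(s) of eq. B.2 can be computed by building the local frame from the discrete tangent. Here the rotation is realized by expressing the offset in the (tangent, normal) basis, which aligns the patch j-axis with the normal; the left-normal orientation is an implementation choice:

```python
import numpy as np

def patch_points(curve, i, j):
    """x(s) = Gamma(s) + i*t(s) + j*n(s) for a closed polygonal curve of shape (M, 2)."""
    # Central-difference tangent on a closed curve (wraps around via roll).
    t = np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)
    t /= np.linalg.norm(t, axis=1, keepdims=True)
    # Left normal of the tangent (points inward for a counter-clockwise curve).
    n = np.stack([-t[:, 1], t[:, 0]], axis=1)
    return curve + i * t + j * n
```

The image value Φ(s) is then obtained by interpolating I at these points, as in the distance-transform sampling of Appendix A.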

For this feature function, the CDF for the curve Γ is given by

$$
H(\Gamma,\lambda) = \int_0^1 h\!\left( I\!\left( \Gamma(s) + R(s,\mathbf{n}(s))\,[i\ j]^T \right) > \lambda \right) ds \tag{B.3}
$$

Applying a small perturbation εβ to the curve, the perturbed feature value is

$$
\Phi'(s) = I\!\left( \Gamma(s) + \varepsilon\beta(s)\mathbf{n}(s) + R(s,\mathbf{n}(s))\,[i\ j]^T \right) \tag{B.4}
$$

Using a Taylor series expansion,

$$
\Phi'(s) = I(\mathbf{x}'(s)) = I(\mathbf{x}(s)) + \nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)\,\varepsilon\beta(s) \tag{B.5}
$$


Now we can write the Gateaux semi-derivative $\mathcal{G}[H(\Gamma,\lambda),\beta]$ according to its definition:

$$
\mathcal{G}\left[H(\Gamma,\lambda),\beta\right] = \lim_{\varepsilon\to 0} \frac{H(\Gamma',\lambda) - H(\Gamma,\lambda)}{\varepsilon}
= \lim_{\varepsilon\to 0} \frac{1}{\varepsilon}\int_0^1 \left[ h(\Phi'(s) > \lambda) - h(\Phi(s) > \lambda) \right] ds \tag{B.6}
$$

Using the differentiable approximation of the indicator function,

$$
\mathcal{G}\left[H(\Gamma,\lambda),\beta\right] = \lim_{\varepsilon\to 0} \frac{1}{2\varepsilon}\int_0^1 \left[ \operatorname{atan}\!\left( \frac{\Phi'(s) - \lambda}{\alpha} \right) - \operatorname{atan}\!\left( \frac{\Phi(s) - \lambda}{\alpha} \right) \right] ds \tag{B.7}
$$

For a small ε, a Taylor expansion gives

$$
\operatorname{atan}\!\left( \frac{\Phi'(s) - \lambda}{\alpha} \right) = \operatorname{atan}\!\left( \frac{\Phi(s) - \lambda}{\alpha} \right) + \operatorname{atan}'\!\left( \frac{\Phi(s) - \lambda}{\alpha} \right) \frac{\nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)\,\varepsilon\beta(s)}{\alpha} + O(\varepsilon^2)
\tag{B.8}
$$

Now the Gateaux semi-derivative becomes

$$
\mathcal{G}\left[H(\Gamma,\lambda),\beta\right] = \lim_{\varepsilon\to 0} \frac{1}{2\varepsilon}\int_0^1 \operatorname{atan}'\!\left( \frac{\Phi(s) - \lambda}{\alpha} \right) \frac{\nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)\,\varepsilon\beta(s)}{\alpha}\, ds
= \frac{1}{2\alpha}\int_0^1 \operatorname{atan}'\!\left( \frac{\Phi(s) - \lambda}{\alpha} \right) \nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)\,\beta(s)\, ds \tag{B.9}
$$

Hence,

$$
\nabla H(\Gamma,\lambda)(s) = \frac{1}{2\alpha}\operatorname{atan}'\!\left( \frac{\Phi(s) - \lambda}{\alpha} \right) \nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)
= \frac{1}{2\alpha}\,\frac{\nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)}{1 + \left( \frac{\Phi(s) - \lambda}{\alpha} \right)^2} \tag{B.10}
$$


Now the gradient flow minimizing the energy is

$$
\nabla E(\Gamma)(s) = 2\int_\lambda \left[ H^*(\lambda) - H(\Gamma,\lambda) \right] \frac{1}{2\alpha}\,\frac{\nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)}{1 + \left( \frac{\Phi(s) - \lambda}{\alpha} \right)^2}\, d\lambda
= \int_\lambda \left[ H^*(\lambda) - H(\Gamma,\lambda) \right] \frac{\alpha\,\nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s)}{\alpha^2 + (\Phi(s) - \lambda)^2}\, d\lambda \tag{B.11}
$$

Taking the limit as α → 0,

$$
\nabla E(\Gamma)(s) = \left[ H^*(\Phi(s)) - H(\Gamma,\Phi(s)) \right] \nabla I(\mathbf{x}(s))\cdot\mathbf{n}(s) \tag{B.12}
$$
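Eq. B.12 evaluated along a sampled curve; here the current CDF H(Γ, ·) is estimated empirically from the samples of Φ themselves, and the prior H* is given on a grid of λ values. The names and the linear interpolation of H* are assumptions of this sketch:

```python
import numpy as np

def intensity_flow_speed(phi, grad_dot_n, lambdas, h_star_vals):
    """Flow speed [H*(phi) - H(Gamma, phi)] * (grad I . n) per eq. B.12.

    phi         : (M,) feature values I(x(s)) along the curve
    grad_dot_n  : (M,) values of grad I(x(s)) . n(s)
    lambdas     : (K,) grid on which the prior CDF is tabulated
    h_star_vals : (K,) prior CDF H* on that grid
    """
    # Empirical CDF of the current curve, H(Gamma, phi_k) = mean_m h(phi_m > phi_k),
    # using the ">" convention of eq. B.3.
    h_curve = (phi[None, :] > phi[:, None]).mean(axis=1)
    h_star = np.interp(phi, lambdas, h_star_vals)  # prior CDF evaluated at phi(s)
    return (h_star - h_curve) * grad_dot_n
```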


Appendix C

Multidimensional CDF based shape prior

We now consider the multidimensional (joint) CDF prior. This form of prior arises when the feature functions are defined on the same space Ω and are correlated random processes. In such a case, the joint CDF contains information about both the marginal distributions of the individual feature functions and the inter-variable correlations. We consider the special form of feature functions, namely feature functions defined on the curve, as used in Chapter 7. Let Φ1(s), Φ2(s), ..., ΦN(s) be N feature functions, where s ∈ [0, 1] is the normalized arc-length of the curve Γ. The joint CDF corresponding to this set of feature functions is defined as

$$
H(\Gamma,\lambda_1,\lambda_2,\ldots,\lambda_N) = \int_0^1 h(\Phi_1(s) < \lambda_1)\, h(\Phi_2(s) < \lambda_2) \cdots h(\Phi_N(s) < \lambda_N)\, ds \tag{C.1}
$$

The joint shape-distribution prior energy is defined as

$$
E = \int_{\lambda_1}\int_{\lambda_2}\cdots\int_{\lambda_N} \left[ H^*(\lambda_1,\lambda_2,\ldots,\lambda_N) - H(\Gamma,\lambda_1,\lambda_2,\ldots,\lambda_N) \right]^2 d\lambda_1\, d\lambda_2 \cdots d\lambda_N \tag{C.2}
$$

where H*(λ1, λ2, ..., λN) is the prior joint CDF and H(Γ, λ1, λ2, ..., λN) is the joint CDF for the curve Γ. We now follow the variational approach, assuming a small perturbation εβ(s) of the curve. Taking the Gateaux semi-derivative of E with respect to this perturbation,

$$
\mathcal{G}(E,\beta) = 2\int_{\lambda_1}\int_{\lambda_2}\cdots\int_{\lambda_N} \left[ H^*(\lambda_1,\ldots,\lambda_N) - H(\Gamma,\lambda_1,\ldots,\lambda_N) \right] \mathcal{G}\left( H(\Gamma,\lambda_1,\ldots,\lambda_N),\beta \right) d\lambda_1\, d\lambda_2 \cdots d\lambda_N \tag{C.3}
$$


We now consider the Gateaux semi-derivative $\mathcal{G}(H(\Gamma,\lambda_1,\ldots,\lambda_N),\beta)$:

$$
\begin{aligned}
\mathcal{G}\left( H(\Gamma,\lambda_1,\ldots,\lambda_N),\beta \right) &= \lim_{\varepsilon\to 0} \frac{H(\Gamma + \varepsilon\beta) - H(\Gamma)}{\varepsilon} \\
&= \lim_{\varepsilon\to 0} \frac{1}{\varepsilon}\int_0^1 \left( h(\Phi_1'(s) < \lambda_1)\cdots h(\Phi_N'(s) < \lambda_N) - h(\Phi_1(s) < \lambda_1)\cdots h(\Phi_N(s) < \lambda_N) \right) ds
\end{aligned}
\tag{C.4}
$$

Let us now consider a single indicator function h(Φ(s) < λ). As previously, we use a smooth approximation that approaches the indicator as α approaches zero:

$$
\lim_{\alpha\to 0} \left[ \operatorname{atan}(x/\alpha) + \pi/2 \right] = \pi h(x) \tag{C.5}
$$
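The scaled arctangent surrogate of eq. C.5, sketched in Python; dividing by π gives a function that tends to the indicator h(x > 0) as α → 0:

```python
import numpy as np

def smooth_step(x, alpha):
    """Differentiable surrogate for the indicator h(x > 0); sharpens as alpha -> 0."""
    return (np.arctan(x / alpha) + np.pi / 2.0) / np.pi
```

Its derivative is (1/π) α / (α² + x²), which is what produces the Cauchy-kernel factors appearing in the flows above.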

We assume that Φ(s) is differentiable with respect to a small curve perturbation at s, so the Taylor series expansion of the updated value Φ'(s) with respect to this small perturbation is

$$
\Phi(s,\varepsilon\beta) = \Phi(s) + \frac{\partial\Phi(s)}{\partial n}\,\varepsilon\beta(s) + O(\varepsilon^2) \tag{C.6}
$$

where ∂Φ(s)∂n

is the derivative with respect to small perturbation along the normal. Now we

can write the Taylor expansion for h(Φ′(s) < λ):

πh(Φ′(s) < λ) ≈ atan

(Φ(s) − λ

α

)

2+ atan′

(Φ(s) − λ

α

)∂Φ(s)

∂nεβ(s) (C.7)

We now use this approximation to compute the Gateaux semi-derivative of H, substituting the approximation of eq. C.7 into eq. C.4. The resulting expression is a polynomial in ε, and it is easy to see that the terms containing ε⁰ mutually cancel. Since ε is small, we retain only the largest remaining terms, those containing ε¹. As a result, we have

$$
\begin{aligned}
\mathcal{G}\left( H(\Gamma,\lambda_1,\ldots,\lambda_N),\beta \right)
&= \frac{1}{\pi^N}\lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\int_0^1 \sum_{i=1}^N \operatorname{atan}'\!\left( \frac{\Phi_i(s) - \lambda_i}{\alpha} \right) \frac{\partial\Phi_i(s)}{\partial n}\,\varepsilon\beta(s) \prod_{j\neq i}\left( \operatorname{atan}\!\left( \frac{\Phi_j(s) - \lambda_j}{\alpha} \right) + \frac{\pi}{2} \right) ds \\
&= \frac{1}{\pi^N}\int_0^1 \sum_{i=1}^N \operatorname{atan}'\!\left( \frac{\Phi_i(s) - \lambda_i}{\alpha} \right) \frac{\partial\Phi_i(s)}{\partial n}\,\beta(s) \prod_{j\neq i}\left( \operatorname{atan}\!\left( \frac{\Phi_j(s) - \lambda_j}{\alpha} \right) + \frac{\pi}{2} \right) ds
\end{aligned}
\tag{C.8}
$$


Therefore, the minimizing flow ∇H is

$$
\nabla H(\Gamma,\lambda_1,\ldots,\lambda_N)(s) = \frac{1}{\pi^N}\sum_{i=1}^N \operatorname{atan}'\!\left( \frac{\Phi_i(s) - \lambda_i}{\alpha} \right) \frac{\partial\Phi_i(s)}{\partial n} \prod_{j\neq i}\left( \operatorname{atan}\!\left( \frac{\Phi_j(s) - \lambda_j}{\alpha} \right) + \frac{\pi}{2} \right) \tag{C.9}
$$

$$
= \frac{1}{\pi}\sum_{i=1}^N \frac{\partial\Phi_i(s)}{\partial n}\, \frac{1}{1 + \left( \frac{\Phi_i(s) - \lambda_i}{\alpha} \right)^2} \prod_{j\neq i} h(\Phi_j(s) < \lambda_j) \tag{C.10}
$$

In the limit α → 0, the gradient flow minimizing the energy in eq. C.2 is

$$
\nabla E(s) = \frac{1}{\pi}\sum_{i=1}^N \frac{\partial\Phi_i(s)}{\partial n} \int\!\cdots\!\int_{\lambda_{j\neq i}} \left[ H(\Phi_i(s),\lambda_{j\neq i}) - H^*(\Phi_i(s),\lambda_{j\neq i}) \right] \prod_{j\neq i} h(\Phi_j(s) < \lambda_j)\, \prod_{j\neq i} d\lambda_j \tag{C.11}
$$

According to eq. C.11, computing the minimizing flow at each point on the contour requires an (N−1)-dimensional integration. The computational expense of using the joint CDF prior is therefore exponential in N, the number of feature functions used. However, we hypothesize that an efficient approximation can be made, reducing the required computations.



CURRICULUM VITAE

Andrey Litvin

e-mail: [email protected]

EDUCATION

Boston University, Boston, Massachusetts, 2000-2006
PhD in E.E., May 2006, GPA 3.9/4.0
Advisor: Professor W. Clem Karl

Boston University, Boston, Massachusetts, 1999-2000
MS in E.E., May 2000, GPA 3.96/4.0
Advisor: Professor William Oliver

Saint-Petersburg State University, Saint-Petersburg, Russia, 1992-1996
BS in Physics, May 1996

EMPLOYMENT

Siemens Corporate Research, Princeton, New Jersey, May 2005 - November 2005
Temporary Technical Employee

Boston University, Boston, Massachusetts, 1999-2006
Research Assistant

National Polytechnic Institute of Grenoble (INPG), France, 1996-1997
Intern

PUBLICATIONS

1. A. Litvin, W.C. Karl, J. Shah, Shape and appearance modeling with feature distributions for image segmentation, International Symposium on Biomedical Imaging (ISBI), Arlington, Virginia, 2006

2. A. Litvin and W.C. Karl, Coupled shape distribution based segmentation of multipleobjects, Information Processing in Medical Imaging (IPMI05), 2005

3. A. Litvin and W.C. Karl, Using shape distributions as priors in a curve evolutionframework, Proceedings of 2004 IEEE International Conference on Acoustic Speechand Signal Processing (ICASSP), 2004



4. A. Litvin and W.C. Karl, Levelset based segmentation using data driven shape prior on feature histograms, Proceedings of 2003 IEEE Workshop on Statistical Signal Processing

5. A. Litvin, J. Konrad and W.C. Karl, Probabilistic video stabilization using Kalman filtering and mosaicing, Proceedings of IS&T/SPIE 15th Annual Symposium - Electronic Imaging Science and Technology, Santa Clara, California, 2003

6. A. Litvin and W.C. Karl , Image segmentation based on prior probabilistic shapemodels, Proceedings of 2002 IEEE International Conference on Acoustic Speech andSignal Processing (ICASSP), 2002

7. D. Alcayde, P.-L. Blelly, W. Kofman, A. Litvin, W.L. Oliver, Effects of hot oxygen in the ionosphere: TRANSCAR simulations, Annales Geophysicae, Vol. 19, pp. 257-261, 2001

8. A. Litvin, W. L. Oliver, J. M. Picone and M. J. Buonsanto, The upper atmosphere during June 5-11, 1991, Journal of Geophysical Research, vol. 105, No. A6, pp. 12789-12796, 2000

9. A. Litvin, W. L. Oliver and C. Amory-Mazaudier, Hot O and nighttime ionospherictemperatures, Geophysical Research Letters, vol. 27, No. 17, pp. 2821-2824, 2000

10. A. Litvin, W. Kofman and B. Cabrit, Ion composition measurements and modelingat altitudes from 140 to 350 km using EISCAT measurements, Annales Geophysicae,vol. 16, pp. 1159-1168, 1998

