PhD theses
Variational Methods in Machine Vision
JÓZSEF MOLNÁR
Supervisor: Prof. Dmitry Chetverikov
Eötvös Loránd University
Doctoral School on Informatics
The Basics and Methodology of Informatics
Program Director: Prof. János Demetrovics
Budapest 2011
I. Introduction
This short introduction summarizes the role of variational methods in the sciences in general and in machine vision specifically. The fundamental notions are written in italics.
A discipline that models the non-trivial interactions of its objects with mathematical categories usually synthesizes them in its fundamental equations. These equations are often ordinary or partial differential equations, following directly or indirectly from the discipline's axioms. Mathematically, the indirect equations derived from variational principles are considered the most established basis for the fundamentals. This is because equations stemming from variational principles express more than local interactions: they also guarantee certain global principles, usually conservation or minimization principles. The method that ensures this is finding the extremals of functionals. Consider the principle of least action in physics, used to derive the equations of motion and the field equations, or geodesics (the generalization of straight lines) in curved spaces.
In machine vision, the energy-minimization analogy leads to variational principles. For example, in image segmentation with active contours we minimize the "energy" of the segmenting curve, which consists of an external part that depends on the image content and an internal part that depends on the shape. Similarly, in optical flow (where the pixel-wise motion field between neighboring images of a sequence is sought), the aim is to determine an energy-minimizing field: this field has to satisfy the optical constraint (e.g. intensity conservation) while minimizing an internal trait of the sought field (e.g. the field divergence). The conservation of a quantity is equivalent to the minimization of its variability; therefore the idea of total energy minimization remains valid in this case too. Thus, in machine vision the energy-minimization analogy is often used¹. The external-internal dualism illustrated above is also typical: the total energy consists of a data term, responsible for the external influences, and a smoothness term (or regularizer), responsible for preserving some internal characteristics.
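As an illustration of this data-plus-smoothness structure, the classical Horn-Schunck optical flow functional can be written as below. The notation is assumed here for illustration only: $I$ is the image intensity with partial derivatives $I_x$, $I_y$, $I_t$, $(u, v)$ is the sought flow field, $\Omega$ is the image domain and $\alpha$ is a weight balancing the two terms.

```latex
E(u, v) = \int_{\Omega}
  \underbrace{\left( I_x u + I_y v + I_t \right)^2}_{\text{data term}}
  + \alpha^2
  \underbrace{\left( |\nabla u|^2 + |\nabla v|^2 \right)}_{\text{smoothness term}}
  \, dx \, dy
```

Here the data term penalizes violations of intensity conservation, while the smoothness term penalizes spatial variation of the flow field.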
¹ This is why the expression "energy functional" is used in the literature.
The standard way of computing the extremals of functionals is to derive the Euler-Lagrange equations. The type of these equations depends on the dimension of the problem, the number of unknown functions and the order of their derivatives: they range from ordinary second-order differential equations to systems of higher-order partial differential equations. Partial differential equations are solved with iterative methods. The solution is the fixed point of the iteration, which is accepted when the difference between successive approximations falls below a certain threshold. The iterative methods applied to embedded manifolds are known as evolution. One of the most successful evolution methods is the Level Set method.
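The fixed-point iteration described above can be sketched on a toy problem. The example is a minimal illustration and not the scheme used in the dissertation: it assumes a one-dimensional denoising energy, sum (u_i - f_i)^2 + alpha * sum (u_{i+1} - u_i)^2, and solves its discrete Euler-Lagrange equation by Jacobi iteration until successive approximations differ by less than a threshold. The names `smooth_1d` and `residual` and the parameters `alpha`, `tol` and `max_iter` are assumptions made for this sketch.

```python
def smooth_1d(f, alpha=1.0, tol=1e-10, max_iter=10_000):
    """Minimize sum (u_i - f_i)^2 + alpha * sum (u_{i+1} - u_i)^2 by Jacobi iteration.

    The discrete Euler-Lagrange equation at sample i is
        (u_i - f_i) - alpha * (u_{i-1} - 2*u_i + u_{i+1}) = 0,
    rearranged below into a fixed-point update.
    Returns the approximate minimizer and the number of iterations performed.
    """
    n = len(f)
    u = list(f)  # initial guess: the input data itself
    for iteration in range(1, max_iter + 1):
        new_u = []
        for i in range(n):
            # Reflecting boundary: a missing neighbour is replaced by the sample itself.
            left = u[i - 1] if i > 0 else u[i]
            right = u[i + 1] if i < n - 1 else u[i]
            new_u.append((f[i] + alpha * (left + right)) / (1.0 + 2.0 * alpha))
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:  # successive approximations agree: accept as the fixed point
            break
    return u, iteration


def residual(u, f, alpha=1.0):
    """Largest absolute violation of the discrete Euler-Lagrange equation."""
    n = len(u)
    worst = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        worst = max(worst, abs((u[i] - f[i]) - alpha * (left - 2.0 * u[i] + right)))
    return worst


noisy = [0.0, 0.1, -0.1, 1.0, 0.9, 1.1, 1.0]
smoothed, iterations = smooth_1d(noisy)
print(iterations, [round(v, 3) for v in smoothed])
```

A larger `alpha` gives more weight to the smoothness term and therefore a flatter result; `tol` plays the role of the threshold on the difference between successive approximations.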
We now summarize the typical fields and tasks of variational methods in machine vision. The active contour (snake) and active surface methods are widely used for image segmentation and for 3D object and scene reconstruction. The total variation (TV) method is the variational approach to restoring noisy, blurred images. Combining total variation with active contours is a possible approach to image interpolation, where the problem is to fill in the missing information from the neighborhood of the defect while keeping important image features such as edges and textures continuous. Variational optical flow is an essential method for analyzing the motion between successive images of a video. Its applications are wide: key-frame-based video compression, robotics, vehicle assistance systems, human-machine interaction, etc. A wide range of image registration methods is also based on optical flow; there the problem is to fit the representations of an object captured by different sensors (multispectral, multimodal registration). These problems occur frequently in aerial photo processing and in medical diagnostics.
II. The organization of the dissertation
After a short introduction, chapter 2 (Variational principles, their appearances in machine vision) reviews, with literature references, the fields of machine vision where variational methods are prevalent. We analyze the meaning and structure of typical functionals, invoking representative examples to illustrate the use of data and smoothness terms. These examples are used as references later in the document. The Level Set formalism and the methods for deriving the differential equations are also discussed here. Finally, we illustrate the Euler-Lagrange equations with a specific example (Horn-Schunck optical flow), which is referred to in chapter 3.
In the introduction of chapter 3 (Optical flow) we present the applications of the method, its properties, the motivation that led us to this research area (illumination-robust applications), and the related research. In the second part we discuss Optical flow based on cross-correlation specifically, including the (non-central) normalized cross-correlation data term for grayscale and color images, the approximate Euler-Lagrange equations and the principles of linearization and discretization. The numerical formula is then compared with the Horn-Schunck formula from the introduction. An integral part of this subchapter, the derivation of the Euler-Lagrange equation, can be found in appendix A. The next subchapter, Tests of cross-correlation optical flow, describes the test conditions and test results, grouping the results by input source: synthetic grayscale, outdoor and synthetic color data. In the conclusion we compare the quality of the method with state-of-the-art methods and discuss possible future improvements.
In the introduction of chapter 4 (Active contour) we present the development and types of segmentation techniques based on active contours. In the subchapter Segmentation using local regions, the motivation (layer segmentation of Optical Coherence Tomography images) and the expected properties of the proposed method are presented. In the next subchapter we introduce the Basic model, the simplest local-region-based model. This includes the definition of the local regions near the segmentation curve, the associated energy functional, the normal component of the derived Euler-Lagrange equation (the derivation can be found in appendix B), the Level Set equation of the normal flow and a simple statistical separator function. The subchapter ends with a critique of the basic model, which justifies the need for improvement. In the Model's refinements subchapter we discuss two improvements: with a second-order curve approximation the size of the local regions can be extended in the tangential direction (for the sake of more robust statistics), while an integration region of optimal shape, obtained with an optimal integration boundary in the normal direction, can enhance the method's ability to separate regions whose average intensities differ only slightly. We prove that the latter problem is a local variational problem in its own right. In the Applying the model, results subchapter we describe the test conditions and test results, and a possible two-step algorithm that can further improve the speed. We close the chapter with possible future improvements, including an extension to 3D.
In the introduction of chapter 5 (3D reconstruction) we briefly summarize the 3D reconstruction methods based on the functional minimization principle, and discuss the most frequently used pinhole camera model and the projective and affine transformations based on it. At the end, we review the limitations that stem from the pinhole camera model and set our objective: the deduction of a quadratic transformation that is compatible with the Level Set method. In the subchapter Linear transformation, the steps of deducing a Level Set-compatible linear transformation are specified. Partly similar steps are used to deduce the quadratic transformation, and the formula of the linear transformation is also a constituent part of the quadratic transformation. In the Quadratic transformation subchapter we deduce the invariant equations of the quadratic mapping between two projections (images). Both the projection functions of the cameras and the observed surface are approximated with their (first- and) second-order differential quantities. Integral parts of the deduction can be found in appendices C and D. The subchapter closes with instructions for computing the quantities on a fixed spatial grid (for Level Set). We also present an alternative computation in appendix E. In the subchapter The result of the quadratic transformation we discuss the meaning of the different constituent terms, compare the result with the linear mapping (which is the affine homography in the case of the pinhole camera model), and illustrate the accuracy of the quadratic mapping against the projective and affine homographies. In the closing subchapter, An application of the quadratic transformation, we present the multiview 3D reconstruction proposed by Faugeras and Keriven, which was used for validation, together with the test conditions and test results.
Chapter 6 (Theses) sets out the theses of the dissertation. The notation used can be found at the beginning of the document under the Notation heading; the references are collected at the end of the document under the Bibliography heading.
III. New scientific results discussed in the dissertation
I did research in three different areas of machine vision: optical flow, active contours and 3D reconstruction. I paid particular attention to the mathematical clarification of all topics.
figure 1: Two frames of an outdoor video sequence (upper row). Applying the displacement field to the pixels of the first frame gives a reconstruction of the second frame. Parts of the images reconstructed with the Horn-Schunck and cross-correlation methods (lower row).
In the case of optical flow, the objective was to elaborate a fast, illumination-robust method that can be applied to the processing of outdoor video sequences (figure 1) and can even handle changes in the color of the illumination. The new results were attained by using the normalized cross-correlation as the data term in the energy functional. Because of the special structure of the Lagrangian (a compound of local integrals), the Euler-Lagrange equations take the form of an infinite series of integro-differential terms. I simplified the analytical formula to a readily applicable numerical form by multi-step linearization. I also developed a software component for testing. The tests were carefully performed on standard data sets, according to current requirements. I tested the method on synthetic grayscale and color data as well as on real outdoor data. According to the accuracy tests, the method is comparable with state-of-the-art methods (even though high accuracy was not pursued). The publications of the method and the results can be found in: [S1,S2,S3,S5,S6].
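Why normalized cross-correlation tolerates illumination changes can be seen on a single pair of patches. The sketch below is only an illustration under stated assumptions and does not reproduce the data term of the dissertation: the function names `ncc` and `ssd`, the `center` flag and the sample intensities are all hypothetical. It compares the non-central normalized cross-correlation with the sum of squared differences when one patch is darkened by a constant factor.

```python
import math


def ncc(a, b, center=False):
    """Normalized cross-correlation of two equally sized intensity patches.

    With center=False (the non-central form) the score is invariant to a
    multiplicative intensity change; with center=True the patch means are
    subtracted first, which also removes an additive offset.
    """
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and of equal size")
    if center:
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        a = [x - mean_a for x in a]
        b = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def ssd(a, b):
    """Sum of squared differences, a measure that assumes constant intensity."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


patch = [52.0, 60.0, 71.0, 49.0, 88.0, 93.0, 40.0, 65.0, 77.0]  # flattened 3x3 window
shadowed = [0.6 * x for x in patch]        # multiplicative change, e.g. a shadow
shifted = [0.6 * x + 15.0 for x in patch]  # gain plus additive offset

print(ncc(patch, shadowed), ssd(patch, shadowed))
print(ncc(patch, shifted), ncc(patch, shifted, center=True))
```

The non-central score stays at (numerically) one for the shadowed patch although the squared difference is large; the additive offset is compensated only by the centered variant.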
[figure 1 panel labels. Upper row: frame 1 (with artificial shadow), frame 2. Lower row: Horn-Schunck, Horn-Schunck, cross-correlation, cross-correlation.]
figure 2: A few segmentation phases using the elaborated local region model. The images of rodent retina were made with OCT technology.
In the case of active contours, the objective was to elaborate a fast, reliable method for segmenting retinal layer images captured by Optical Coherence Tomography (OCT). The images have no real edges; image features can be interpreted as statistical quantities (figure 2). The use of local regions alongside the segmentation curve combines the local, feature-driven methods with the fully global, region-based ones. The method allows a statistical interpretation of the image data without processing full image regions, and it is applicable to both open and closed curves. I proposed a Lagrangian that separates image regions by mean intensity, and derived the corresponding Normal Flow and Level Set equations. I recommended two improvements of the basic model that enhance its robustness and separation capability. I developed a software component with which the tests of the method were performed. The publications of the method and the results can be found in: [S7,S10].
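The step from a Normal Flow to a Level Set equation mentioned above follows a standard construction, sketched here for a generic speed function. The symbols are notation assumed for this illustration: $C$ is the evolving curve, $F$ its normal speed, $\mathbf{N}$ the unit normal and $\phi$ the embedding function whose zero level set is $C$. The specific speed function of the dissertation is not reproduced.

```latex
\frac{\partial C}{\partial t} = F\,\mathbf{N}, \qquad
\phi\bigl(C(t), t\bigr) = 0
\;\Longrightarrow\;
\frac{\partial \phi}{\partial t} + \nabla\phi \cdot \frac{\partial C}{\partial t} = 0
\;\Longrightarrow\;
\frac{\partial \phi}{\partial t} + F\,|\nabla \phi| = 0,
\qquad \mathbf{N} = \frac{\nabla\phi}{|\nabla\phi|}
```

With the opposite orientation of the normal, $\mathbf{N} = -\nabla\phi / |\nabla\phi|$, the sign of the speed term changes.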
[figure 2 panel labels: Segmentation of the Internal Limiting Membrane (ILM); Segmentation of the Retinal Pigment Epithelium (RPE).]
figure 3: Correspondences established by different mappings (the leftmost image is the first projection). The observed object is given as an implicit surface.
In the case of 3D reconstruction, the objective was to improve the quality and to extend the domain of usability of a variational method proposed by Faugeras and Keriven. The method is based on active surface evolution driven by local regional correspondences between images taken from different views. I deduced a Level Set-compatible quadratic mapping between the corresponding image regions, which approximates both the projections and the observed surface. The equations do not assume the pinhole camera model. I analyzed the result: I discussed the contribution of the different terms and clarified the relationship to the projective and affine homographies (figure 3). We tested the equations on synthetic data. The tests confirm that the quadratic mapping gives more reliable results for objects with high curvature. It is important to note that the equations can be used more generally, in any method based on correspondences between local image regions. The publications of the method and the results can be found in: [S4,S8], submitted: [S9*].
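The benefit of a quadratic over a linear local mapping can be illustrated with plain Taylor expansions. The sketch below is not the invariant formula of the dissertation: it assumes a hypothetical smooth planar mapping, `toy_mapping`, estimates its first and second derivatives by central differences, and compares the first-order (linear) and second-order (quadratic) approximations at a displaced point. All function names and the step size `h` are assumptions made for illustration.

```python
import math


def toy_mapping(x, y):
    """Hypothetical smooth, curved image-to-image mapping (illustration only)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def taylor_coefficients(f, x0, y0, h=1e-3):
    """Central-difference value, gradient and Hessian of each component of f at (x0, y0)."""
    f00 = f(x0, y0)
    fxp, fxm = f(x0 + h, y0), f(x0 - h, y0)
    fyp, fym = f(x0, y0 + h), f(x0, y0 - h)
    fpp, fpm = f(x0 + h, y0 + h), f(x0 + h, y0 - h)
    fmp, fmm = f(x0 - h, y0 + h), f(x0 - h, y0 - h)
    coeffs = []
    for k in range(2):  # k = 0: first output coordinate, k = 1: second
        gx = (fxp[k] - fxm[k]) / (2 * h)
        gy = (fyp[k] - fym[k]) / (2 * h)
        hxx = (fxp[k] - 2 * f00[k] + fxm[k]) / h**2
        hyy = (fyp[k] - 2 * f00[k] + fym[k]) / h**2
        hxy = (fpp[k] - fpm[k] - fmp[k] + fmm[k]) / (4 * h**2)
        coeffs.append((f00[k], gx, gy, hxx, hxy, hyy))
    return coeffs


def approximate(coeffs, dx, dy, order):
    """Evaluate the Taylor approximation of the given order (1 or 2) at offset (dx, dy)."""
    out = []
    for c, gx, gy, hxx, hxy, hyy in coeffs:
        value = c + gx * dx + gy * dy
        if order == 2:
            value += 0.5 * (hxx * dx * dx + 2 * hxy * dx * dy + hyy * dy * dy)
        out.append(value)
    return tuple(out)


def error(p, q):
    """Euclidean distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


coeffs = taylor_coefficients(toy_mapping, 0.0, 0.0)
exact = toy_mapping(0.2, 0.1)
linear = approximate(coeffs, 0.2, 0.1, order=1)
quadratic = approximate(coeffs, 0.2, 0.1, order=2)
print(error(linear, exact), error(quadratic, exact))
```

For this toy mapping the second-order approximation is roughly an order of magnitude closer to the exact image point than the first-order one, which is the qualitative behavior expected when the mapping is strongly curved.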
[figure 3 panel labels, left to right: 1st projection, affine homography, projective homography, quadratic mapping.]
IV. Theses
This dissertation presents examples of the use of variational methods in machine vision. The theses are related to these topics.
Thesis 1: Equations of cross-correlation based variational optical flow and their application
1.1 I introduced the normalized cross-correlation data term for grayscale and color images into the field of variational optical flow. I derived the Euler-Lagrange equation of the local integral and applied the result to functionals using normalized cross-correlation.
1.2 For practical applications, I elaborated the approximate linearized numerical formula: first, I determined the approximate analytical equations for a small local integration window; second, I performed numerical linearization of the analytical formula obtained in the first step.
1.3 I developed a software component for practical use and for testing the method. I performed tolerance tests for intensity change and accuracy tests according to the requirements of the literature.
Thesis 2: Usage of local region based active contour, proposal for Lagrangian, recommendation for further improvements
2.1 I introduced local regions alongside curves for segmentation purposes, allowing the statistical interpretation of image features for open as well as closed curves. I proposed a Lagrangian for region separation in the statistical sense.
2.2 I improved the basic model in two ways. First, a higher-order curve approximation allows the integration area alongside the separation curve to be enlarged. Second, I defined the optimal size (shape) of the integration domain, which maximizes the degree of separation and thereby increases the precision of the method. I showed that this latter improvement is a local variational problem in its own right. I recommended a Lagrangian for the improved model.
2.3 I derived the Euler-Lagrange and Level Set equations of the models. I developed a software component for practical use and for testing the method. The software was tested on practical examples. According to experts' analysis, the method was able to improve presegmentation results.
Thesis 3: Deduction of quadratic transformation for planar mapping of implicit surfaces with invariant (intrinsic) quantities
3.1 I deduced a quadratic mapping between local image regions for correspondence purposes. First, I expressed the linear mapping with invariant quantities: the gradients of the projections and the unit normal of the observed surface. Second, I deduced the equations of the quadratic mapping in parametric as well as invariant form.
3.2 I gave methods and formulas for practical use: a construction that allows the formulas to be used in any environment (e.g. for finite element methods), and a specific, Level Set-compatible formula defined on fixed spatial grids. We tested the applicability of the formula in a multiview variational 3D reconstruction.
3.3 I discussed the relationship between the quadratic mapping and the affine as well as the projective homography. The test results confirmed the usefulness of the quadratic mapping wherever the input data does not favor the homographies (surfaces with high curvature and/or sparsely textured models). The quadratic mapping makes it possible to improve the accuracy of the correspondence, and therefore the robustness, of any method based on correspondences between local image regions.
The author’s publications
[S1] Molnár József, Csetverikov Dmitrij: "Kereszt-korrelációs optikai áramlás variációs sémája: megvilágítás-változásra invariáns egyenletek", Proc. KÉPAF 2009: 7th Conference of Hungarian Association for Image Processing and Pattern Recognition, CD, Budapest, 2009.
[S2] J. Molnar and D. Chetverikov: "Illumination-robust variational optical flow based on cross-correlation", Proc. 33rd Workshop of the Austrian Association For Pattern Recognition, Stainz, Austria, 2009, pp. 119-128.
[S3] S. Fazekas, D. Chetverikov, and J. Molnar: "An implicit non-linear numerical scheme for illumination-robust variational optical flow", Proc. British Machine Vision Conference 2009.
[S4] J. Molnar, D. Csetverikov: "Másodfokú közelítés implicit felületek síkbeli leképezésére", Proc. Fifth Hungarian Conference on Computer Graphics and Geometry, Budapest, pp. 118-124, 2010.
[S5] D. Chetverikov, J. Molnar: "An experimental study of image components and data metrics for illumination-robust variational optical flow", Proc. International Conference on Pattern Recognition, Istanbul, pp. 1694-1697, 2010.
[S6] J. Molnar, D. Chetverikov, and S. Fazekas: "Illumination-robust variational optical flow using cross-correlation", Computer Vision and Image Understanding, vol. 114, pp. 1104-1114, 2010.
[S7] J. Molnár, D. Chetverikov, D. Cabrera DeBuc, Wei Gao, and G.M. Somfai: "Segmentation of rodent retinal OCT images", Proc. KÉPAF 2011: 8th Conference of Hungarian Association for Image Processing and Pattern Recognition, Szeged, 2011, pp. 140-154.
[S8] J. Molnár and D. Chetverikov: "Multiview Reconstruction Using Refined Planar Mapping of Implicit Surfaces", Proc. KÉPAF 2011: 8th Conference of Hungarian Association for Image Processing and Pattern Recognition, Szeged, 2011, pp. 221-232.
[S10] J. Molnár, D. Chetverikov, D. Cabrera DeBuc, Wei Gao, and G.M. Somfai: "Layer extraction in rodent retinal images acquired by Optical Coherence Tomography", Machine Vision and Applications. Accepted for publication. DOI: 10.1007/s00138-011-0343-y. 2011.
Under review:
[S9*] J. Molnár, D. Chetverikov: "Quadratic Transformation for Planar Mapping of Implicit Surfaces", Journal of Mathematical Imaging and Vision