
Diploma Thesis

Algorithms of Automatic Reconstruction

of Neurons from the Confocal Images

Amir Sadeghipour

October 2008

Berlin University of Technology

Faculty of Electrical Engineering and Computer Science

Neuronal Information Processing Department

Prof. Dr. rer. nat. Klaus Obermayer


Acknowledgements

This work was carried out at the Neuronal Information Processing Group, under the supervision of Prof.

Obermayer, at the Technical University of Berlin. I would like to express my sincere gratitude to Prof.

Obermayer, who has given me the opportunity to work on this scientific project in order to fulfill the

requirements of my diploma thesis.

My special thanks go to Stephan Schmitt, who has always taken the time to give me invaluable advice and

useful suggestions. Even during his busiest times, he was so kind as to answer my questions thoroughly and

efficiently. His ideas encouraged me and pointed me in the right direction. Without his help, this work

would have been impossible.

My heartfelt appreciation and deepest thanks go to my girlfriend, Dorsa, not only for her moral support but

also for her proofreading and valuable advice.

I also wish to thank my parents for their unwavering support during the years of my study.


Zusammenfassung (Summary in German)

The morphology of nerve cells (neurons) plays a decisive role in information processing throughout the nervous system. The three-dimensional reconstruction of neurons is therefore of great importance for neurobiological studies. The large number of proposed semi- and fully automatic reconstruction methods demonstrates the demand for three-dimensional modeling of neurons. This diploma thesis attempts to develop an automatic three-dimensional reconstruction of neurons with the aid of machine learning and computer vision algorithms. The input is an image stack consisting of several slices of the neuron, recorded by means of confocal microscopy. The automatic reconstruction process, which consists of several successive algorithms, constructs a three-dimensional geometrical model that is intended to represent the neuron structure depicted in the slices with high precision. The model should exhibit the topological and metric properties of the neuron structure in order to enable further morphological analyses.

The variability of neuron structures and the noise of the microscopy images demand, on the one hand, flexibility of the model and, on the other hand, robustness against noise. Since these two requirements conflict, a perfect reconstruction is not possible. For this reason, the reconstructed model can be corrected and extended manually with the aid of the developed semi-automatic fitting algorithms and the implemented graphical user interface.

This thesis first describes basic concepts, the theory behind the applied algorithms, and the tools and libraries used. After the concept and the ideas behind the individual steps of the reconstruction process have been presented, the realization of the concept with the aid of the presented algorithms and libraries is described in detail. Finally, the results of the reconstruction process are illustrated with images, the influence of different parameters is analyzed, and the strengths and weaknesses of the algorithms are discussed.


Abstract

The morphology of nerve cells (neurons) plays a decisive role in information processing carried out by the

entire nervous system. Therefore, the three-dimensional reconstruction of neurons gains in importance for

neurobiological analysis. The large quantity of proposed fully- and semi-automated reconstruction methods

indicates the demand for the three-dimensional modeling of neurons. In this diploma thesis, we attempt to

develop a method to generate a fully automated three-dimensional reconstruction of neurons with the aid of

algorithms of machine learning and computer vision. The input for the algorithms is an image stack of the neuron, consisting of several slices recorded by means of confocal microscopy. The automatic reconstruction process, which consists of several successive algorithms, constructs a 3D geometrical model that ultimately represents the neuron structure displayed in the image slices with high accuracy. The

model should possess the topological and metric properties of the neuron structure so as to enable further

morphological analysis.

The variety of neuron structures and the noise of microscopy images require, on the one hand, flexible

models, and on the other hand, models that exhibit robustness against noise. Due to these conflicting

requirements, a perfect reconstruction is impossible. Hence, the automatically reconstructed model can be corrected and enhanced through the use of the developed semi-automated fitting algorithms and an

implemented graphical user interface.

In this study, basic principles, the theory behind the applied algorithms, and the tools and libraries used are introduced. Furthermore, after introducing the central concept and the ideas behind each step of the reconstruction process, the implementation of each concept, with the aid of the introduced algorithms and libraries, is described in detail.

Finally, the results of the reconstruction process are illustrated, the effects of different parameters are

analyzed, and the advantages and limitations of relevant algorithms are discussed.


Contents

1 INTRODUCTION

2 BASICS
2.1 Neuron Morphology
2.2 Confocal Microscopy Image Stacks
2.3 3D Model of Neuron Structures
2.4 Image Processing
2.5 Algorithms
2.5.1 Support Vector Machine
2.5.1.1 Fitting Problems
2.5.1.2 Linearly Separable
2.5.1.3 Nonlinearly Separable
2.5.1.4 Soft Margin
2.5.1.5 Regression
2.5.2 Clustering
2.5.2.1 Clustering Techniques
2.5.2.2 Mean Shift Clustering
2.5.3 Image Registration
2.6 Tools and Libraries
2.6.1 Amira
2.6.2 ITK
2.6.3 LIBSVM

3 METHODOLOGY
3.1 Concept
3.1.1 Finding the Representative Spheres
3.1.2 Grouping the Spheres
3.1.3 Connections
3.1.4 Corrections
3.1.5 Data Structure
3.2 Implementation
3.2.1 Representative Sphere Extraction with the Aid of SVM
3.2.1.1 Toy Data
3.2.1.2 Image Scaling
3.2.1.3 Parameter Selection
3.2.2 Grouping the Representative Spheres Using Mean Shift Clustering
3.2.2.1 Significant Features
3.2.2.2 Parameters
3.2.2.3 Spatial Distance between Point and Line Nodes
3.2.3 Connecting and Enhancements
3.2.3.1 Line Node Connections
3.2.3.2 Line Node Enhancements
3.2.3.3 Finding Connection Nodes
3.2.3.4 Summary
3.2.4 Fine Adjustments with the Aid of Registration
3.2.4.1 Registration's Framework
3.2.4.1.1 Interpolator
3.2.4.1.2 Metric
3.2.4.1.3 Optimizer
3.2.4.1.4 Transform
3.2.4.2 Smoothness
3.2.4.3 Summary
3.2.5 User Interactions

4 RESULTS
4.1 Classification and Regression Results of SVM
4.1.1 Classification Errors
4.1.2 Regression Errors
4.1.3 Noise vs. Errors
4.1.4 Results of Toy and Real Data
4.2 Grouping Results of Mean Shift Clustering
4.2.1 Toy Data
4.2.2 Real Data
4.3 Results of Line Node Enhancements and Connection Nodes
4.3.1 Toy Data
4.3.2 Real Data
4.4 Results of Registration
4.4.1 Smoothness Parameters
4.4.2 Quantitative Performance
4.4.3 Real Data

5 CONCLUSION

6 REFERENCES


1 Introduction

The nervous system of almost all animals consists of nerve cells, known as neurons. Information propagation and processing occur as a result of signal processing within and among neurons. How neuronal

signals are processed depends, among other factors, on the neuron morphology. Thus, an accurate structural

analysis of neurons helps neurobiologists to better understand and analyze signal processing. In this context, full

reconstruction of neuron morphology (especially the neurites) is of fundamental interest for the analysis and the

understanding of neurons’ functional characteristics. There are many suggested algorithms for three-dimensional

reconstruction of neurons from fluorescence confocal microscopy data. The required input for these algorithms is

a set of successive two-dimensional image slices that have been acquired using a confocal scanning laser

microscope. The output is a three-dimensional geometrical model that is a representation of the neuronal

topology and is spatially aligned with the input image stack. A variety of topological, metric and statistical

analyses can be carried out using this reconstructed 3D model. For instance, features like volume, diameter,

length and position of the neurons’ dendrite branches can be calculated from the model.

Because of the complex morphology of neuronal cells, reconstruction of a single neuron with the aid of semi-

automatic systems is a time-consuming task. Furthermore, such methods are still subject to user error and are not

objectively quantifiable. In addition, due to the large variety of existing neuron types and characteristics, the goal

of automatic 3D reconstruction of neurons is hard to achieve and no general solution can be given. The adaptive

approaches outlined in this thesis are developed with the goal of enabling fully automated reconstructions of

neuronal morphologies from confocal microscopy data.

The algorithms for the reconstruction of neurons that have been proposed previously use different approaches.

On the one hand, voxel-based methods determine the voxels representing the neuron's skeleton (centerline model) or surface, using segmentation algorithms (such as watershed and thresholding) that consider the voxels' gray values (see [ZZL08], [SZM07], [HHC03], [KLZ02], [DeC02], [ESS05], [HFF05], [BST05] and [DSO02]). The

most challenging and critical step of reconstruction using these methods is segmentation. Thus, these methods

are primarily well suited for images with a minimum of noise or other acquisition artifacts. On the other hand,

the model-based methods assume a geometrical model (usually tubular or cylindrical) for the morphology of

neurons, based on prior knowledge of their shapes (see [ALS02], [REK03], [WHW04], [WRE05], [BSR04] and

[SRS01]). The limitation of these approaches is that, despite the highly irregular shapes of neuron structures, they assume a uniform model for all shapes of neurites. The use of a tracing algorithm is also an important step of

some of the proposed reconstruction methods. These methods follow the determined voxels, or segments,

successively, starting at the root and eventually generating a skeleton of the model (see [ART02], [ALS02], [CST99] and [SRS01]). The limitation of tracing algorithms is that they are not robust against the degree of

noise present in an image. This is because noise makes it increasingly difficult to follow a disconnected structure.

Moreover, in contrast to model-based methods, tracing cannot involve both local and global characterization

while modeling neurites.

In this study, we propose algorithms for automatic three-dimensional reconstruction of neurons that are imaged

by fluorescence confocal microscopy. User interaction, by means of semi-automatic algorithms, is also provided

for manual corrections and enhancements after performing the automatic reconstruction process. This helps the

user to ensure a high-quality reconstructed model.


The whole reconstruction process is divided into several steps that are invoked successively. Every step uses the

output of the previously performed algorithm as input. In this way, the task of each algorithm is clearly defined

and largely independent of other steps. Consequently, individual tasks can be modified and upgraded without

affecting other algorithms. Furthermore, by recognizing the limitations of each algorithm, we are able to improve

results significantly. This is accomplished in the following manner: each subsequent algorithm is developed so that the weaknesses of the previous algorithms are compensated for and their errors are eliminated.

We propose a model-based approach for the automatic reconstruction of neurons based on the use of a tubular

shape as the model assumption. The model is constructed by composing the partially recognized parts of

neurons’ structures through successive processing steps. This process leads to a systematically and hierarchical

reconstruction. Firstly, the voxels of the image, which represent the neuron’s structure, are recognized by the

support vector machine algorithm. This initiates the reconstruction process. Using the support vector machine as

a machine learning algorithm guarantees the adaptability of this initial step: its parameters can be adapted to the given image, depending on the represented neuron structure or the image's noise. Secondly, the resulting representative voxels are grouped together with the aid of a

two-step mean shift clustering algorithm. In each of the two steps, the output clusters compose the

previously recognized parts into more complex components. Thirdly, the clustered segments of the neurites are

connected in order to complete the model of the neuron's skeleton. In all aforementioned

recognition and composing algorithms, different geometrical and topological model assumptions of the neurites’

tubular shape are taken into account in varying degrees. Finally, in the last step, a registration algorithm, which

is regularized by a smoothness constraint, fits the reconstructed model onto the given image with high accuracy.

Thus, the resulting reconstructed model consists of many connected, fitted spherical components that can also

represent, to a large extent, the irregularities of the neurites’ structures.

2 Basics

2.1 Neuron Morphology

A neuron is a responsive cell in the nervous system of almost all animals. Neurons are responsible for transmitting information through chemically generated signals in the central and peripheral nervous systems. The neuron was

first recognized in 1899 through the work of Cajal1, a Spanish anatomist. There are different types and sizes of

neurons that are classified by their morphology and functional characteristics. Figure 2-1 shows several different

types of neurons. These representations of neurons were taken from Cajal’s drawings (1894-1904).

1 Santiago Ramón y Cajal (1852–1934) was a Spanish histologist, physician, and Nobel laureate. He is

considered to be one of the founders of neuroscience [Source: Wikipedia, Santiago Ramón y Cajal,

http://en.wikipedia.org/wiki/Santiago_Ram%C3%B3n_y_Cajal (as of Sep. 22, 2008, 12:07 GMT)].


Figure 2-1: Different neuronal cell types.

Generally, all types of mammals’ neurons consist of three main parts:

1. Soma is the cell body and has a spherical shape with a diameter varying from 4 µm to 100 µm. The main metabolism2 process happens in the soma.

2. Axon is a thin, long extension of the neuron that propagates the nerve signals away from the soma. Many

neurons have only one axon, which can branch out.

3. Dendrite is a branched projection of the neuron that propagates the nerve signals from other neurons towards the soma.

Figure 2-2: The structure of a typical neuron, with the dendrites, soma, nucleus, axon terminals, Schwann cells, nodes of Ranvier and myelin sheath labeled [Source: Wikipedia, Neuron, http://en.wikipedia.org/wiki/Neuron, (as of Sep. 20, 2008, 10:52 GMT)].

2 Metabolism is the set of chemical reactions that occur in living organisms in order to maintain life [See Wikipedia, Metabolism, http://en.wikipedia.org/wiki/Metabolism (as of Sep. 18, 2008, 14:01 GMT)].


The axons and dendrites compose the corpus of a neuron; together, they are referred to by the generic term neurite.

2.2 Confocal Microscopy Image Stacks

Using an ordinary light microscope requires that the tissue be cut into slices. This creates a disadvantage when

conducting neuroanatomical experiments. Furthermore, scans using ordinary light microscopes are very noisy

and have artifacts due to back-scattered light coming from objects lying outside the focused plane. Confocal

microscopy is an accurate optical imaging technique that can three dimensionally scan a stained specimen with

high resolution. In this technique back-scattered light is gathered confocally by using a spatial pinhole in an

optically conjugate plane in front of a detector. Through this process most of the photons coming from out of

focus planes are filtered and only the light within the focal plane can be detected. Thus, the quality of images

acquired by confocal microscopy is much better than images produced by using a conventional fluorescence

microscope. Confocal imaging is a raster scanning technique that illuminates only one point at a time and scans

over a regular 3D raster in order to record a 3D image. After scanning a two-dimensional layer of the specimen,

the laser beam focuses one layer deeper to scan the next 2D layer. In this way, the confocal microscope

constructs a 3D image of the specimen, which is referred to as an image stack. The accuracy of the acquired

image depends on the raster scan, i.e. the spatial steps between changes of focus. The shorter the steps, the higher the resolution of the recorded image, which is represented with the gray values of its voxels3. Each voxel of the image

represents a focused point in the scanned specimen. The steps of confocal focusing of three dimensional objects

are generally saved in such images, and are referred to as spacing of the image. The steps can be different along

x-, y- and z-axes. Thus, the spacing is indicated as a 3D vector. However, for images recorded using confocal microscopy, the spacing along the z-axis is usually bigger than along the x- and y-axes. This effect is called z-axis strain.

Figure 2-3: The scheme of a confocal microscope (Source: [Paw95]).

3 A voxel is a three-dimensional pixel of a 3D image.


The voxels in the image coordinate system can be easily transformed into the real world coordinate system by

multiplying each voxel grid position by the spacing value between voxels in each direction.
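As a minimal illustration, this transform amounts to a per-axis multiplication. The following Python sketch (a hypothetical helper, not part of the thesis software; the spacing values are taken from the example image of Figure 2-4) shows the idea:

    import numpy as np

    # Spacing from the example image of Figure 2-4: (x, y, z).
    spacing = np.array([0.39, 0.39, 0.75])

    def voxel_to_world(index, spacing):
        """Map an (x, y, z) voxel grid position to real-world coordinates
        by scaling each axis with its spacing."""
        return np.asarray(index, dtype=float) * spacing

    print(voxel_to_world((100, 50, 20), spacing))  # -> [39.   19.5  15. ]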

In images of neurons recorded with the aid of confocal microscopy, the structure of the neuron appears in bright gray, whereas the background voxels are darker and, except for noisy voxels, usually black. The voxel colors are stored as numbers, with a different range of values for each image: the bigger the voxel value, the brighter its gray color (see Figure 2-4).

Figure 2-4: The 3D confocal image of a neuron with relatively little noise. The bigger image illustrates the projection onto the three spatial planes and the small images show three different 2D layers of the image (the 50th, 59th and 74th layers). The spacing of the image is 0.39 along the x- and y-axes and 0.75 along the z-axis. The resolution of the image is 385×271×101 voxels (i.e. 101 slices along the z-axis) with gray values from the interval [1, 255]. The image is provided courtesy of Jan Felix Evers.


Figure 2-5: Sample layers of different noisy images of various neuron types acquired through confocal microscopy. The images are provided courtesy of Daniel Eicke (top left), Ulrich Bartsch (top right and bottom left) and Annette Schenck (bottom right).

2.3 3D Model of Neuron Structures

The goal of this study is to create an accurate three-dimensional model of a neuron from a confocal image stack.

The resulting 3D model will possess the neuron's major properties that are commonly used by neurobiologists for different types of analyses. Hence, in order to design an accurate model, it is

necessary to consider both the neurobiological requirements and the morphological attributes of the neuron.

Since a neuron mainly consists of dendrites and axons, the essential task in creating a viable model is to produce

an exact reconstruction of the neurites. Neurites have tubular shapes with different radii located at various

positions along their structure. Therefore, we represent them as a sequence of connected spheres with varying radii. Depending on the density (the number of spheres per spatial distance unit) and the radii of the spheres, we are able to represent any neurite with good approximation. A well-constructed model of a neurite consists of connected spheres located on the middle axis of the tubular neurite structure, with matching

radii. The graph formed by the connections between the spheres is called the skeleton of the neurite. Optimally,

the skeleton should be located in the exact middle of the structure. Since neurites have several branching points,

each sphere can share many connections with its neighbors. In this study, a segment indicates a part of the model

consisting of several spheres that are sequentially connected and do not include the neurite's branching connections (Figure 2-6).


(a)

(b)

(c)

(d)

Figure 2-6: (a), (b) and (c): A model segment with different densities and, as a result, different accuracies. (d) The skeleton of the segment illustrated in (c).
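One way to picture this sphere-and-connection model is as a plain graph data structure. The sketch below is purely illustrative; the names and fields are hypothetical, not the thesis's actual data structure (which is described in section 3.1.5):

    from dataclasses import dataclass, field

    @dataclass
    class Sphere:
        """A skeleton node: center position, radius, and its connections."""
        x: float
        y: float
        z: float
        radius: float
        neighbors: list = field(default_factory=list)  # indices of connected spheres

    # A short segment: three sequentially connected spheres with varying radii.
    segment = [Sphere(0.0, 0.0, 0.0, 1.2),
               Sphere(1.0, 0.1, 0.0, 1.0),
               Sphere(2.0, 0.3, 0.1, 0.8)]
    for i in range(len(segment) - 1):
        segment[i].neighbors.append(i + 1)   # connect sphere i to i+1 ...
        segment[i + 1].neighbors.append(i)   # ... and back

    # A branching point would simply be a sphere with more than two neighbors.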

Neuron models are suitable for different types of analyses and for information extraction, such as statistical reports, 3D visualization and exploration. The position and radius of the model at each point can easily be changed by adjusting the corresponding sphere with the aid of user interaction. By changing the density of the model, the accuracy of the model can be improved (Figure 2-7).

Figure 2-7: High-density and accurate model of a part of a neuron with variable radii along the neurites' structures.


The image of a neuron displays not only the neurites but sometimes also the soma. The soma is easily

modeled by a big sphere (Figure 2-8).

Figure 2-8: Model of a neuron structure including the soma from different points of view.

2.4 Image Processing

The neuron images used in this study are recorded through confocal microscopy. As described in the previous

section, they are three dimensional black-and-white image stacks. In these image stacks the gray color of each

voxel is represented by a number, called its intensity value (or just intensity). The range and type of the intensity

values vary from one image to another. Usually, the smallest value is represented by a black color and the largest intensity value by a white color.

Each 3D image can be considered as a 3D matrix, where each component of the matrix represents the intensity

value of a voxel. In this context, the image is a function I(x,y,z) that maps voxel coordinates onto their intensity

values. The first-order partial derivative of this function is called gradient of the image. The gradient image can

also be represented as a matrix with the components given by the derivatives towards three axes: x, y and z. This

means that each component of the gradient image is a vector, called gradient vector, which points towards the

largest intensity changes around the corresponding voxel. Thus, the length of each gradient vector corresponds to


the rate of change in that direction. The gradient of the 3D image I(x,y,z) with respect to the spatial vector (x,y,z)

is defined as:

\nabla I(x,y,z) = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}, \frac{\partial I}{\partial z} \right)^{T} . (Equation 2-1)

As illustrated in Figure 2-9, the longest gradient vectors belong to voxels located along the neurite’s borders. It

is precisely along these borders where the intensity values change the most. Since the intensity values in

background voxels do not change with respect to the neighboring voxels, their gradients are null vectors. Furthermore, because neighboring voxels have similar intensities, the gradient vectors of voxels located inside the neurite structure are also small. Hence, the gradient image provides the necessary information to determine the borders of the neurite's structure.

(a)

(b)

(c)

Figure 2-9: (a) A layer of a 3D image of a neurite structure. (b) The gradient vectors of the image (resolution

1:50). (c) The image of the 2D gradient values, where each pixel represents the sum of the gradients along both

x- and y-directions.
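A small numpy sketch of Equation 2-1 on a synthetic stack (a bright tube standing in for a neurite; finite differences approximate the partial derivatives; illustrative only, not the thesis pipeline):

    import numpy as np

    # Synthetic 3D stack: a bright tube running along the x-axis.
    z, y, x = np.indices((32, 32, 64))
    stack = 255.0 * np.exp(-((y - 16) ** 2 + (z - 16) ** 2) / 8.0)

    # np.gradient returns one finite-difference array per axis (here z, y, x).
    gz, gy, gx = np.gradient(stack)
    magnitude = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)

    # The largest gradient magnitudes lie on the tube's border, not on its centerline.
    print(magnitude[16, 16, 32])  # centerline: close to zero
    print(magnitude[16, 13, 32])  # near the border: large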

In this work, however, we aim to determine voxels located in the middle of the neurite’s tubular structures.

Therefore, we compute the second-order partial derivative of the image (intensity function), called the Hessian

matrix. The Hessian matrix of a 3D image I(x,y,z) with respect to its spatial components x, y and z is symmetric

and is defined as:


H = \begin{pmatrix} \frac{\partial^2 I}{\partial x^2} & \frac{\partial^2 I}{\partial x \partial y} & \frac{\partial^2 I}{\partial x \partial z} \\ \frac{\partial^2 I}{\partial x \partial y} & \frac{\partial^2 I}{\partial y^2} & \frac{\partial^2 I}{\partial y \partial z} \\ \frac{\partial^2 I}{\partial x \partial z} & \frac{\partial^2 I}{\partial y \partial z} & \frac{\partial^2 I}{\partial z^2} \end{pmatrix} . (Equation 2-2)

While the first-order derivative (gradient) at each voxel points towards the largest intensity change relative to its neighbors, the second-order derivative (Hessian) indicates the curvature of the neighboring voxels' intensities. Figure 2-10 illustrates this for a cylindrical shape. The first-order derivative curve reaches its most significant values at the points of the biggest intensity increase or decrease, which occur on the borders of the cylindrical structure. The second-order derivative has its most significant values in the middle of the structure, where the intensity curvature is greatest.

(a) (b)

Figure 2-10: (a) 2D image illustrating the intensity values of a cylindrical structure. (b) The intensity values of

the pixels, which are located along the red line in the image (a), are presented with the blue curve. The red curve

shows the first-order derivative and the green curve is the second-order derivative of the blue intensity function.

With the aid of eigenanalysis4 of the Hessian matrix, the behavior of the intensity curvature can be analyzed. Eigenanalysis is widely applied as a mathematical model in order to delineate different physical characteristics. In this study, we compute the eigenvectors5 and eigenvalues6 of the neuron image's Hessian matrix because of their considerable geometric meaning in tubular (cylindrical) structures such as neurites. Since the aforementioned Hessian matrix is a 3×3 quadratic and symmetric matrix, it can be decomposed and specified by its three eigenvectors and corresponding eigenvalues. The eigenvector corresponding to the smallest eigenvalue points towards the lowest intensity curvature, i.e. along the run of the neurite (its direction), where hardly any change in intensity values is perceptible. In contrast, the eigenvectors corresponding to the largest eigenvalues are aligned orthogonally to the neurite, across the neurite's border, where the intensity values change noticeably and reach the dark gray background values (see Figure 2-11).

4 Eigenanalysis denotes the process of computing the eigenvectors and eigenvalues (see footnotes 5 and 6) of a matrix.

5 The eigenvectors of a matrix are nonzero vectors which, when multiplied with the matrix, change in length but not in direction.

6 For each eigenvector of a matrix, there is a corresponding scalar value, called an eigenvalue, which determines the amount by which the eigenvector is scaled when multiplied with the matrix.

Figure 2-11: The three eigenvectors of the Hessian for a voxel located exactly in the middle of a spatial cylinder. The corresponding eigenvalues are e1, e2 and e3, where \( e_1 \le e_2 \le e_3 \).

Each of these eigenvectors has a corresponding eigenvalue: a real scalar value that shares the eigenvector's characteristics in terms of the neighboring voxels' intensities. In other words, the eigenvalues also indicate the significance of the curvature with regard to the neighboring voxels.

As we will see, the eigenanalysis of a neuron’s image provides significant and useful information when

determining the neuron’s structure. These values are used by different algorithms in this study (see sections 3.1.1

and 3.2.2.1).
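As an illustration of this eigenanalysis (a sketch only; the thesis pipeline itself builds on ITK, see section 2.6.2), the Hessian at a voxel can be estimated with repeated finite differences and decomposed with numpy:

    import numpy as np

    # Synthetic bright tube along the x-axis, as in the gradient sketch above.
    z, y, x = np.indices((32, 32, 64))
    stack = 255.0 * np.exp(-((y - 16) ** 2 + (z - 16) ** 2) / 8.0)

    # Second derivatives: gradients of the first-derivative volumes (axes z, y, x).
    gz, gy, gx = np.gradient(stack)
    gzz, gzy, gzx = np.gradient(gz)
    _, gyy, gyx = np.gradient(gy)
    _, _, gxx = np.gradient(gx)

    iz, iy, ix = 16, 16, 32  # a centerline voxel
    H = np.array([[gxx[iz, iy, ix], gyx[iz, iy, ix], gzx[iz, iy, ix]],
                  [gyx[iz, iy, ix], gyy[iz, iy, ix], gzy[iz, iy, ix]],
                  [gzx[iz, iy, ix], gzy[iz, iy, ix], gzz[iz, iy, ix]]])

    # eigh decomposes the symmetric Hessian; eigenvalues come back in ascending
    # order: two strongly negative ones (across the tube) and one near zero,
    # whose eigenvector points along the run of the tube.
    eigenvalues, eigenvectors = np.linalg.eigh(H)
    print(eigenvalues)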

2.5 Algorithms

2.5.1 Support Vector Machine

Support vector machine, in short SVM, is a classification and regression algorithm from the field of statistical learning theory7. This field began with Vapnik and Chervonenkis (1974) in [VpC74]. After further developments, SVMs in their own right can be said to have been started by Vapnik in [Vpn79] in 1979. The algorithm was developed and extended until the nineties, when it reached its current form [BGV92] and was extended to the case of regression [Vpn95].

2.5.1.1 Fitting Problems

The goal of training a classifier is to construct a separating hyperplane in the feature space in order to classify

input data points. The training process is a successive procedure that minimizes a predefined classification error

(also called risk), e.g. mean square error, after training with each data point. This process is referred to as

empirical risk minimization (ERM). The performance of a trained classifier can be tested by predicting the

7 The goal of statistical learning theory is to provide a statistical framework that studies how algorithms can learn

from data. This concept is a part of Vapnik-Chervonenkis theory (also known as VC theory).


class of new data points, which were not included in the training set. In other words, it should be measured how

well the classifier can be generalized, or applied, to other similar data sets. The corresponding error is called

generalization risk. Since an overly small empirical risk can lead to a large generalization error, the main challenge of

training and test processes is balancing the empirical and generalization risk minimization in regard to

classification and regression problems. The balancing problem is illustrated in Figure 2-12. Given some noisy

training data in feature space, one can construct a hyperplane that solves the classification problem perfectly with

either no error or a very small one. In this case, however, the hyperplane is fitted too closely to the training data

and leads to a poor performance with testing data. This problem is referred to as overfitting. In contrast, if the

training data set does not contain enough data points, or the hyperplane does not fit all training data, we have the

problem of underfitting.

(a)

(b)

(c)

Figure 2-12: (a) Underfitting and (b) overfitting classification problems, compared to (c) an example of a relatively well-classified case.

All statistical learning algorithms attempt to classify the data points with the aid of a class of functions (also

called set of models) such as polynomials of degree n, neural networks having n hidden layer neurons, a set of

splines with n nodes or n radial basis functions. Each of these models, also depending on n, has a specific

complexity. For instance, increasing the number of hidden units of a neural network increases its complexity as well. As the complexity increases, the training errors usually decrease, but the risk of overfitting the

data correspondingly increases as well [STC99] (see Figure 2-13).

The method of structural risk minimization8 (SRM) provides a trade-off between the complexity of the model

and the quality of fitting the training data (empirical error) [VpC74]. The complexity of a model is generally

given by the number of free parameters or VC dimension9. The SRM uses a set of models ordered in terms of

their complexities (VC dimension). Model selection by SRM corresponds to finding the model that is simplest in terms of order and best in terms of empirical error on the test data. In this way, the generalization risk

is minimized structurally.

8 The structural risk minimization principle was first set out in [VpC74] by Vapnik and Chervonenkis. SRM was originally applied to classification, but it is also applicable to any learning problem posed as an optimization problem.

9 The VC dimension (short for Vapnik-Chervonenkis dimension) is a measure of the capacity of a statistical classification algorithm: the maximum number of data points (considering all possible class labelings) in a feature space that can be correctly classified by the algorithm.


Figure 2-13: Scheme for the prediction error as a function of the model complexity. The bounds indicate the sum

of empirical risk and the model's capacity, which is proportional to the VC dimension.

2.5.1.2 Linearly Separable

SVMs are based on the principle of structural risk minimization. They were originally designed for a linearly

separable binary case of classification with a margin [VpL63]. In this context, the margin is the minimal distance

between the separating hyperplane and the closest data points of two classes. SVM looks for an optimal

separating hyperplane with the largest margin. Considering l labeled training data points {x_i, y_i}, i = 1, …, l, with y_i ∈ {-1, 1} and x_i ∈ R^d (a d-dimensional feature space), the points x that lie on the hyperplane satisfy the equation x·w + b = 0, where w is normal to the hyperplane (see Figure 2-14).

Figure 2-14: Separating hyperplane with margin in a linearly classifiable case (Source: [STC99], page 9, Figure

5).

The margin can be defined as two parallel hyperplanes, H1 and H2, on both sides of main separating hyperplane,

that are as far apart as possible while still separating the data of both classes. These hyperplanes are defined as x·w + b = ±1. Hence, all training data should satisfy the following constraint for a correct classification:


y_i (\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1 \quad \forall i . (Equation 2-3)

The distance between H1 and H2 (i.e. the margin thickness) is equal to \( 2 / \|\mathbf{w}\| \). Thus, the margin can be maximized by minimizing \( \|\mathbf{w}\| \), which is proportional to decreasing the VC dimension [Vpn95]. Thus, we have an optimization problem with a side condition:

\min_{\mathbf{w}, b} \; \frac{1}{2} \|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i (\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1 \;\; \forall i . (Equation 2-4)

With the aid of the Lagrange formulation of this problem, as defined in (Equation 2-5), the hyperplane can be specified:

L_P = \frac{1}{2} \|\mathbf{w}\|^2 - \sum_{i=1}^{p} \alpha_i \left[ y_i (\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \right] , (Equation 2-5)

where \( \alpha_i \ge 0 \) are the Lagrange multipliers and p is the number of data points.

As previously mentioned, the separating hyperplane with a margin can be specified with the aid of two parallel

hyperplanes H1 and H2, which define the limits of the margin. These hyperplanes can be specified with the aid of

data points located on them, i.e. closest to the separating hyperplane. These data points (i.e. vectors in feature

space) are called support vectors. The SVM classifies a data point as follows, with the aid of the support vectors and the values α and b calculated as the result of the solved optimization problem (Equation 2-4):

f(\mathbf{x}) = \operatorname{sgn} \left( \sum_{i \in SV} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b \right) , (Equation 2-6)

where SV is the set of support vectors.
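To make Equations 2-3 to 2-6 concrete, the following sketch trains a linear maximum-margin classifier on toy 2D data. It uses scikit-learn's SVC purely as a stand-in (the thesis itself works with LIBSVM, see section 2.6.3); a very large C approximates the hard-margin case:

    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable point clouds in a 2D feature space, labels -1 and +1.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)),
                   rng.normal(+2.0, 0.5, (20, 2))])
    y = np.array([-1] * 20 + [1] * 20)

    clf = SVC(kernel="linear", C=1e6).fit(X, y)  # huge C: (almost) hard margin

    # w and b define the separating hyperplane x.w + b = 0 (Equation 2-3);
    # the support vectors are the points lying on the margin hyperplanes H1, H2.
    print("w =", clf.coef_[0], " b =", clf.intercept_[0])
    print("support vectors:\n", clf.support_vectors_)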

2.5.1.3 Nonlinearly Separable

The linear SVM can be also extended to a nonlinear one. If the feature space is not linearly separable, all data

points can be transformed into a higher dimensional space to make them linearly separable (see Figure 2-15).

(a)

(b)

Figure 2-15: (a) Nonlinearly separable classification problem and (b) the same case after transforming into a

higher dimensional space that makes the problem linearly separable.


However, transforming data points into a new space and computing the dot product of them (Equation 2-6) can

be complex or, in some cases, impossible. Thus, for problems that cannot be linearly separated in the feature

space, the kernel trick, which was applied for the first time in [BGV92] and originally proposed by Aizerman et

al. in [ABR64], is used. A nonlinear problem can be mapped from its feature space to a new, corresponding higher-dimensional space by a nonlinear transformation. The problem can then be solved by using a linear model in the new feature space. This method is used in both classification and regression problems and is known as

the kernel trick. The kernel trick uses Mercer's theorem10, which states that any continuous, symmetric, positive

semi-definite weighting function11 K(x, y), can be replaced with a dot product in a higher-dimensional space. In

the aforementioned case of linear classification (Equation 2-6), the dot product of the input data x ∈ R^d is used to determine the class. In the case of nonlinear classification, each data point is transformed into a higher-dimensional space using the mapping function Φ. According to the kernel trick, we have:

K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) . (Equation 2-7)

Consequently, the dot product in (Equation 2-6) can be replaced with a kernel function K(.,.) instead of

calculating the dot product of transformed data points. In this way, no transformation is necessary and,

consequently, the decision function is defined as:

f(\mathbf{x}) = \operatorname{sgn} \left( \sum_{i \in SV} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right) . (Equation 2-8)

There are different kernel functions to use in this case: linear, polynomial and RBF kernel. The latter is used in

this study and is defined as follows:

K(\mathbf{x}_i, \mathbf{x}_j) = \exp \left( -\gamma \, \|\mathbf{x}_i - \mathbf{x}_j\|^2 \right) . (Equation 2-9)
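Written out directly, the RBF kernel is simply a Gaussian of the squared Euclidean distance; a two-line numpy version (the width parameter γ is a free parameter that must be chosen, e.g. by cross-validation):

    import numpy as np

    def rbf_kernel(x1, x2, gamma=0.5):
        """K(x1, x2) = exp(-gamma * ||x1 - x2||^2), cf. Equation 2-9."""
        diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
        return np.exp(-gamma * np.dot(diff, diff))

    print(rbf_kernel([0, 0], [0, 0]))  # 1.0 for identical points
    print(rbf_kernel([0, 0], [3, 4]))  # approaches 0 for distant points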

2.5.1.4 Soft Margin

The aforementioned case of classification using a hyperplane with a margin (Equation 2-3) cannot tolerate noise in the feature space. However, in almost all models of real systems, the data points are noisy and the problem is not separable (as in the case illustrated in Figure 2-12). In [CVp95], an idea is suggested to modify the margin so that the classifier allows for mislabeled data points. The so-called soft margin uses new variables ξ_i, which measure the degree of misclassification of each data point i, and attempts to separate the data as cleanly as possible. In this case, the optimization problem has the following form:

\min_{\mathbf{w}, b, \xi} \; \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{p} \xi_i \quad \text{subject to} \quad y_i (\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1 - \xi_i , \;\; \xi_i \ge 0 . (Equation 2-10)

10 Mercer's theorem was presented by James Mercer in 1909.

11 A weighting function in this context is also called a kernel function; such functions are used, e.g., in non-parametric estimation techniques such as kernel density estimators.


This algorithm, also called C-SVC, is used in this study for the classification problem. The effect of the parameter C on the classification results is discussed in section 3.2.1.3.
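The following sketch illustrates the role of C on noisy, overlapping toy data, again with scikit-learn's C-SVC as a stand-in for LIBSVM. A small C yields a wide, tolerant margin with many support vectors; a large C shrinks the margin and risks overfitting the noise:

    import numpy as np
    from sklearn.svm import SVC

    # Overlapping classes: not separable without the slack variables of Equation 2-10.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
                   rng.normal(+1.0, 1.0, (50, 2))])
    y = np.array([-1] * 50 + [1] * 50)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="rbf", C=C).fit(X, y)
        print(f"C={C}: {len(clf.support_vectors_)} support vectors, "
              f"training accuracy {clf.score(X, y):.2f}")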

2.5.1.5 Regression

The regression problem using SVM is solved in a similar fashion to the case of classification. In 1995, after the introduction of the soft margin by Cortes and Vapnik in [CVp95], the algorithm was extended to the case of regression by Vapnik in [Vpn95]. The regression version of SVM (also called SVR) maintains the main feature that characterizes the maximal margin algorithm in classification: a nonlinear function is learned by a linear learning machine using the kernel trick, similar to SVC.

An extended version of SVR, referred to as ε-SVR, was introduced in [SWB00]. It uses the soft margin with the parameter ε for controlling the tolerance of the regression algorithm against noise. The optimization problem of ε-SVR is defined as follows:

\min_{\mathbf{w}, b, \xi, \xi^*} \; \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{p} (\xi_i + \xi_i^*) \quad \text{subject to} \quad y_i - (\mathbf{w} \cdot \mathbf{x}_i + b) \le \varepsilon + \xi_i , \;\; (\mathbf{w} \cdot \mathbf{x}_i + b) - y_i \le \varepsilon + \xi_i^* , \;\; \xi_i, \xi_i^* \ge 0 , (Equation 2-11)

where

ε specifies the thickness of the regression margin, within which the cost (or error) is equal to zero, and

ξ and ξ* are the soft margin variables defined for the two sides of the estimating function.

ε-SVR is used in this study for the regression problem discussed in section 3.2.1.
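An illustrative ε-SVR fit on noisy 1D toy data (scikit-learn's SVR, which wraps LIBSVM's epsilon-SVR, used here as a stand-in): sample points falling inside the ε-tube around the estimated function incur zero cost.

    import numpy as np
    from sklearn.svm import SVR

    # Noisy samples of a smooth 1D function.
    rng = np.random.default_rng(2)
    X = np.linspace(0.0, 2.0 * np.pi, 80).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0.0, 0.1, 80)

    # epsilon sets the zero-cost tube thickness; C penalizes points outside it.
    model = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X, y)
    print(model.predict([[np.pi / 2]]))  # close to sin(pi/2) = 1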

2.5.2 Clustering

The human brain can easily group similar objects by finding regularities between them. This grouping process is

called clustering. Clustering is used in many fields such as data mining, machine learning, image analysis and

pattern recognition. From the machine learning perspective, clustering is an unsupervised learning algorithm that

does not need any prepared examples to learn how to group the input data. It divides a data set into subsets,

called clusters. These data subsets, or clusters, consist of data that are both similar to one another and dissimilar

to data points in other clusters. Each cluster can be represented by a data point, called a prototype (usually the

weighted average of points within a cluster). Representing several data points by a single prototype can also be

used as an approach to data compression in information theory. In data compression, by representing a group of

data through a single prototype, some fine details are lost. However, simplification is achieved and,

consequently, fewer bits are needed in order to store the information. A second type of clustering application arises when the type of a given data set (cluster) is predicted based upon its properties. This type of clustering is

also used in market research in order to partition consumers into market clusters. Consequently, the relationships

between different groups of consumers can be understood better. Generally, in all clustering applications, the

primary goal of the algorithm is to determine similarities between data.


2.5.2.1 Clustering Techniques

The general concept of clustering is grouping N data points in an I-dimensional feature space into K clusters with

the aid of a distance measure. A distance measure is a function, defined on the feature space, that computes similarities between data points based on their feature values. For example, one of the most common distance functions used in clustering algorithms is the Euclidean distance. The Euclidean distance in an n-dimensional space between the points a = (a_1, a_2, …, a_n) and b = (b_1, b_2, …, b_n) is defined as:

d(\mathbf{a}, \mathbf{b}) = \sqrt{ \sum_{i=1}^{n} (a_i - b_i)^2 } . (Equation 2-12)

Partitional clustering, also used in this study, is an optimization algorithm that reassigns N data points between K

clusters iteratively, in order to find the best assignments.

To start the clustering process, each of the K clusters is parameterized by a prototype m(k), e.g. initialized as a random vector in the feature space. A partitional clustering algorithm consists of two iterative steps (a minimal sketch of the K-means special case follows below):

I. Assignment: the n-th data point x(n) is assigned to a prototype m(k) by means of the distance measure.

II. Update: the prototypes are updated with respect to the data points they represent.

The algorithm iterates until the prototypes' movement in the update step no longer exceeds a predefined threshold.
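A compact numpy sketch of these two steps for the K-means special case (assignment by the Euclidean distance of Equation 2-12, update by the cluster mean; illustrative only, not the clustering method used in this thesis):

    import numpy as np

    def kmeans(X, K, iters=100, tol=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        prototypes = X[rng.choice(len(X), K, replace=False)]  # random initialization
        for _ in range(iters):
            # I. Assignment: nearest prototype by Euclidean distance (Equation 2-12).
            d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # II. Update: move each prototype to the mean of its assigned points.
            new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else prototypes[k] for k in range(K)])
            if np.linalg.norm(new - prototypes) < tol:  # movement below threshold
                return new, labels
            prototypes = new
        return prototypes, labels

    # Three well-separated 2D clouds; the prototypes settle near their centers.
    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(m, 0.3, (20, 2)) for m in (-2.0, 0.0, 2.0)])
    print(kmeans(X, K=3)[0])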

A K-means clustering12, performed on some data points in a 2D feature space, is illustrated in Figure 2-16.

Figure 2-16: The result of a K-means clustering process (left to right), applied to a data set of 40 data points with K = 4 prototypes (Source: [McK03], page 287, figure 20.4).

2.5.2.2 Mean Shift Clustering

One of the disadvantages of many clustering algorithms is that the number of clusters (k) is required to be set as

a parameter during the initializing step. However, in some applications, it is not possible to determine how many

clusters are needed. Therefore, it is necessary to use additional methods in order to set this parameter. Some non-

parametric clustering methods do not rely on embedded assumptions. Another restriction encountered in some

clustering algorithms is that they implicitly assume that there is an identical shape for all clusters. For example,

in the case of K-means clustering, the update step implies that the clustering process leads to a Voronoi diagram13 and that each cluster is one of the Voronoi cells14. This is very restrictive (see [Est96]). For our study,

12 K-means clustering is a type of partitional clustering algorithm that assigns a data point to the cluster whose center (the arithmetic mean over all its data points) is the nearest. 13 A Voronoi diagram in mathematics is a special kind of decomposition of a metric space, determined by

distances, to a specified discrete set of points in the space [Source: Wikipedia, Voronoi diagram,

http://en.wikipedia.org/wiki/Voronoi_diagram (as of Sep. 22, 2008, 12:08 GMT)].


we use a density estimation-based non-parametric clustering algorithm which does not require the

aforementioned prior knowledge. Thus, the feature space is regarded as an unknown probability density

function (p.d.f.) and the prototypes correspond to the modes15 of the unknown density. Estimation-based parametric

clustering algorithms assume that the shapes of the p.d.f. are known and that only the parameters have to be

estimated. To avoid such assumptions, non-parametric algorithms use additional methods to estimate the p.d.f., such as the Parzen window technique, known as the kernel density estimator (see [TPo89]), or scale-space analysis (see [WSp90]).

In this study, we use the mean shift clustering algorithm proposed by D. Comaniciu and P. Meer in [CoM02]

that uses the mean shift procedure to detect the modes. Mean shift is a non-parametric feature-space analysis

technique, proposed in 1975 by Fukunaga and Hostetler in [FuH75]. It is based on the kernel density estimator,

which is a non-parametric approach to estimate a continuous p.d.f. with the aid of several samples of the

population. Each sample of the population is weighted by a function, called kernel. By superposing the kernels’

outputs, the density is approximated. An example is illustrated in Figure 2-17.

Figure 2-17: Probability density function estimation (blue) by means of Gaussian-kernels (red dashed lines) with

the aid of six samples [Source: Wikipedia, Kernel density estimation, http://en.wikipedia.org/wiki/

Kernel_density_estimation (as of Sep. 22, 2008, 12:03 GMT) ].

Given n data points xi, i = 1, 2, …, n as i.i.d.16 samples from a d-dimensional space, the kernel density estimate, obtained with kernel K(.), is defined as:

$\hat{f}_{h,K}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$ (Equation 2-13)

where h is the bandwidth parameter, which affects the approximation quality: too small a bandwidth results in spiky density estimates, while values that are too large lead to over-smoothed estimates.

The most commonly used kernel functions are the Gaussian, Cauchy, and Picard kernels. We use a radially symmetric kernel, which satisfies the following profile:

$K(x) = c_{k,d}\, k\!\left(\|x\|^2\right)$ (Equation 2-14)

where $c_{k,d}$ is a normalization constant that makes K(x) integrate to one. We use the multivariate normal kernel, which already satisfies the profile and is defined as follows:

14 The group of all points closer to a specified point from a set than to any other ones in the set is the interior of

a convex polytope called a Voronoi cell [Source: Wikipedia, Voronoi diagram,

http://en.wikipedia.org/wiki/Voronoi_diagram (as of Sep. 22, 2008, 12:08 GMT)]. 15 Mode is the value that occurs most frequently in a data set or in a probability distribution [Source: Wikipedia,

Mode (statistics), http://en.wikipedia.org/wiki/Mode_(statistics) (as of Sep. 22, 2008, 12:10 GMT)]. 16 Independent and identically-distributed random variables


$K(x) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\|x\|^2\right)$ (Equation 2-15)

The next step is to find the modes of density estimation (Equation 2-13). The modes of a p.d.f. are the local

maxima of the distribution. In other words, they are located among the zeros of the gradient, $\nabla \hat{f}(x) = 0$. We define the function $g(s) = -k'(s)$ and calculate the gradient of the density estimator as follows:

$\nabla \hat{f}_{h,K}(x) = \frac{2\, c_{k,d}}{n h^{d+2}} \left[ \sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right) \right] \left[ \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x \right]$ (Equation 2-16)

The first term is proportional to a density estimate at x calculated with the kernel $G(x) = c_{g,d}\, g(\|x\|^2)$, which satisfies the aforementioned profile (Equation 2-14). The second term is the mean shift:

$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x$ (Equation 2-17)

The mean shift vector points towards a weighted mean computed with the kernel G. Here, x is the center of the kernel (window) and is updated by the mean shift vector.

Since the mean shift vector always points towards the direction of the maximum increase in the density, it leads

to a local maximum of the kernel density estimation, i.e. its mode. The whole procedure can be defined by the

following iterative steps:

1. Computation of the mean shift vector mh,G(x).

2. Translation of the kernel (window) G(x) by mh,G(x).

Figure 2-18: Example of a 2D feature space analysis, showing the mean shift procedure (Source: [CoM02], Page

7, Fig. 2).


Figure 2-18 illustrates a 2-dimensional feature space with the normalized density as a third dimension. The

black lines show the path of a mean shift iterative procedure used to reach the modes which are shown on the

diagram as red points.

The only parameter used in mean shift clustering is the bandwidth, denoted by h in the calculation of the mean shift weighted mean. For more details about the use of this algorithm in this study, see section 3.2.2.
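A minimal C++ sketch of one such iteration, assuming the multivariate normal kernel (for which g(s) = exp(-s/2) up to a constant), could look as follows; the function and variable names are our own illustration, not the implementation used in this work:

    #include <cmath>
    #include <vector>

    // One mean shift iteration (Equation 2-17): 'x' is the current window
    // center, 'h' the bandwidth; the function returns the new center
    // x + m_h,G(x), i.e. the weighted mean of the samples under kernel G.
    std::vector<double> meanShiftStep(const std::vector<std::vector<double> >& samples,
                                      const std::vector<double>& x, double h) {
        size_t d = x.size();
        std::vector<double> num(d, 0.0);
        double den = 0.0;
        for (size_t i = 0; i < samples.size(); ++i) {
            double s = 0.0;                        // s = ||(x - x_i)/h||^2
            for (size_t j = 0; j < d; ++j) {
                double t = (x[j] - samples[i][j]) / h;
                s += t * t;
            }
            double w = std::exp(-0.5 * s);         // g(s) for the normal kernel
            for (size_t j = 0; j < d; ++j) num[j] += w * samples[i][j];
            den += w;
        }
        for (size_t j = 0; j < d; ++j) num[j] /= den;  // weighted mean
        return num;                                    // new window center
    }

This step is repeated until the shift of the window center falls below a threshold; data points whose windows converge to the same mode are then grouped into one cluster.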

2.5.3 Image Registration

“Image registration is the process of overlaying two or more images of the same scene taken at different times,

from different viewpoints, and/or by different sensors” [ZiF03].

Different approaches to image registration are widely used in remote sensing, medical imaging, computer vision

etc. Generally, image registration can be defined as a mapping process between two images with respect to their

intensities. In a simple case, illustrated in Figure 2-19, the goal is to determine a transform T that is able to map a

point p from one image to the homologous point q in the other image.

Figure 2-19: Image registration task (Source: [ITK05], Page 315, Figure 8.1)

In the 2-dimensional case, we define the images as two matrices I1 and I2, containing the intensities as components. The mapping between these images can be expressed as:

$I_2(x, y) = I_1(f(x, y))$ (Equation 2-18)

where f(.,.) is called the mapping function.

In this case, the mapping function f is a 2D spatial transformation that should match the image I2 (referred to as the moving image) onto the image I1 (called the fixed image). Image registration techniques can be classified into two

categories: area-based and feature-based methods. In the case of area-based methods, the intensities of all of

the images’ pixels are considered in order to register them. The structures of the images are identified by means

of structural analysis, e.g. correlation metrics or Fourier frequency-domain properties. The identified structural objects are mapped together in order to find correlations between them. Feature-based methods, in contrast to

area-based methods, do not consider the overall structure of the images, but focus instead on correlations

between a few defining structural features such as lines, points, intersections etc.

Since registration methods are dependent upon applications and the type of images used in a particular study,

there is no universal method applicable to all registration tasks. The majority of registration methods, however,

consistently require the completion of the following four steps:


1. Feature extraction: This step is only performed in feature-based registration in order to automatically or manually determine pre-defined features of the images.

2. Feature adjustment: As the result of feature extraction in feature-based registration we can determine

important characteristics of the images that should be mapped together. When using the area-based

method, the correlations between the structures of the images are established by means of the

previously discussed structural analysis.

3. Transform estimation: Transformation is the mapping function that aligns a moving image with a fixed

image. The function is specified via the values of its parameters. These values are estimated by means

of established feature correlations, completed during the previous step.

4. Resampling and transforming: The moving image is transformed using calculated parameter values in

real coordinate space. The image intensities at non-integer coordinates are approximated by

interpolation17 techniques.

Since we use a type of area-based registration algorithm in this work, we will exclude the first step and attend to

the last three steps in the following discussion.

In order to estimate transformation parameters, one needs to use a similarity metric. This measurement allows

for quantification of the degree of correspondence between pixel intensities in both image spaces. Maximizing

the measured image similarity allows us to find proper transformation parameters that can be used for the

mapping function. Therefore, image registration is regarded as an optimization problem. The type of spatial

transformation used in a particular study is one of the determining characteristics of each registration technique.

Choosing the transformation model depends on typical differences between input images and the expected

registration accuracy of mapping. Transform models are classified into two main categories, global (rigid) and local (non-rigid). In the case of global transformations, the overall geometric relationships between points of

the image do not change. Translation, affine transform,18 and all linear maps (e.g. rotation and scaling) are global

transformations. A combination of translation, rotation, and scaling (called similarity transform) of a pixel at the

coordinate (x, y) in an image is defined as:

$\begin{pmatrix} x' \\ y' \end{pmatrix} = s \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$ (Equation 2-19)

where tx and ty are the translation parameters, s is the scaling factor, and θ is the rotation angle.

17 Interpolation is a method of constructing new data points within the range of a discrete set of known data

points [Source: Wikipedia, Interpolation, http://en.wikipedia.org/wiki/Interpolation (as of Sep. 22, 2008, 12:11

GMT)]. 18 Affine transform is a linear transform including scaling, shear mapping, rotation and translation.
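As a small worked example, the following C++ function applies Equation 2-19 to a single pixel coordinate; it is purely illustrative and uses our own names:

    #include <cmath>

    struct Point2D { double x, y; };

    // Applies the similarity transform of Equation 2-19: rotation by theta,
    // uniform scaling by s, then translation by (tx, ty).
    Point2D similarityTransform(Point2D p, double s, double theta,
                                double tx, double ty) {
        Point2D q;
        q.x = s * (std::cos(theta) * p.x - std::sin(theta) * p.y) + tx;
        q.y = s * (std::sin(theta) * p.x + std::cos(theta) * p.y) + ty;
        return q;
    }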


In contrast to global transformation, local transformations may cause deformities in an image. Therefore, local

transformation is also called elastic transformation. In this case, the image is divided into segments that are

transformed by using different transformations and parameters.

After choosing the transformation model and constructing a metric, including the similarity measure, the

transformation’s parameters can be estimated by solving the optimization problem (maximizing the images’ similarity). The initialized transformation is used to register the moving image onto the fixed image. Since the

transformation, as a continuous function, does not map the pixels exactly on the destination image grid, an

interpolation technique is needed in order to resample19 the image. It is important to note, however, that

transforming and resampling pixels from the moving image to the fixed image space does not guarantee that

every pixel in the transformed image receives a single intensity value. Resampling, in this way, frequently causes

holes or overlapping pixels. For example, after transformation, two pixel intensities may accidentally be

resampled on the same pixel on the fixed image grid and some pixels may remain unassigned. Problems arise

because we need the assurance that each pixel in the transformed image is assigned with a unique intensity value

in order to be able to measure the similarity between the output image and the fixed image. To solve these

problems, the inverse transformation is used in order to map the fixed image onto the moving image space.

Afterwards, the resampling algorithm goes through every pixel of the destination image grid and calculates the

intensity that should be assigned to each pixel after mapping the fixed image. In this fashion, each pixel in the

transformed image is assigned a single interpolated intensity value.
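The following C++ sketch illustrates this inverse-mapping strategy for a 2D gray-value image with bilinear interpolation. It assumes a caller-supplied functor invTransform that maps an output-grid pixel back into the moving image; all types and names are our own illustration, not the implementation of this work.

    #include <cmath>
    #include <vector>

    struct Image2D {
        int width, height;
        std::vector<double> data;                      // row-major intensities
        double at(int x, int y) const { return data[y * width + x]; }
    };

    // Bilinear interpolation at a non-integer coordinate of 'img'.
    double bilinear(const Image2D& img, double x, double y) {
        int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
        if (x0 < 0 || y0 < 0 || x0 + 1 >= img.width || y0 + 1 >= img.height)
            return 0.0;                                // outside: background
        double fx = x - x0, fy = y - y0;
        return (1 - fx) * (1 - fy) * img.at(x0, y0) + fx * (1 - fy) * img.at(x0 + 1, y0)
             + (1 - fx) * fy * img.at(x0, y0 + 1) + fx * fy * img.at(x0 + 1, y0 + 1);
    }

    // Walks every pixel of the output (fixed-image) grid, maps it back into
    // the moving image with the inverse transform and interpolates there.
    // Every output pixel thus receives exactly one intensity value; holes
    // and overlapping pixels cannot occur.
    template <typename InverseTransform>
    Image2D resample(const Image2D& moving, int outWidth, int outHeight,
                     InverseTransform invTransform) {
        Image2D out;
        out.width = outWidth; out.height = outHeight;
        out.data.resize(outWidth * outHeight);
        for (int y = 0; y < outHeight; ++y)
            for (int x = 0; x < outWidth; ++x) {
                double mx, my;
                invTransform(x, y, mx, my);            // back into moving image
                out.data[y * outWidth + x] = bilinear(moving, mx, my);
            }
        return out;
    }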


Figure 2-20: (a) A T1 MRI (fixed image) and (b) a proton density MRI (moving image) are provided as input to

the registration method. (c) Composition of fixed and moving images before registration. (d) Registered target

image. (e) Composition of fixed and moving images after registration (Source: [ITK05], Page 338, Figures 8.9

and 8.10).

Model based registration20 is a special type of registration that allows us to adjust a geometrical model onto a

fixed image. The model is constructed by means of geometrical spatial objects. The task of registration, in this

case, is to find the proper transformation parameters that can make the model a good representation of the

structure depicted in the fixed image (see Figure 2-21).

19 Resampling, in this case, is defined as assigning the intensities of each image’s pixel with respect to its

neighboring pixels, after transformation. 20 This terminology is used in [ITK05].



Figure 2-21: Basic concept of model based registration. A geometrical model (ellipse) is registered against an

anatomical structure (skull) (Source: [ITK05], Page 477, Figure 8.60).

To read more about the use of the registration algorithm in this work, see section 3.2.4.

2.6 Tools and Libraries

2.6.1 Amira

Amira21 is a modular 3D visualization and modeling software, developed by Mercury Computer Systems GmbH22. It allows the user to visualize 3D images as well as to create and visualize polygonal models. The developer

version of Amira is extendable with additional modules for data processing by C++ programming.

The GUI of the Amira software (see Figure 2-22) is split into three main windows: i) 3D view, ii) the pool, and

iii) properties window.

The 3D view window is used to display images and spatial objects. Interactive exploration of loaded data is

provided by different spatial control functions such as rotation, translation, zoom, etc.

As a modular and object-oriented software system, Amira consists of modules as basic components. Each of

these components can be loaded into the pool window. There are two kinds of modules available: i) display

modules, which represent data objects and visualize 3D images and models and ii) computational modules,

which perform different operations on loaded data. To execute tasks that need several dependent and interacting processes, a network can be created among the involved modules by connecting them together.

When a loaded module in the pool is selected, its settings and functions are shown in the properties window. The

user can then set associated parameters or call functions in order to produce new data objects (modules).

In this work, we used Amira 4.1.1 to visualize the 3D images of neurons and the reconstructed models as spatial

objects. By developing additional modules, we provide the user with comfortable interactions for creating, modifying, and fitting models of neurons.

21 Advanced 3D Visualization and Volume Modeling (http://amira.zib.de) 22 Mercury Visualization Sciences Group (http://www.tgs.com)


Figure 2-22: The GUI of Amira 4.1.1 displaying a 3D image of a neuron structure, scanned by confocal

microscopy

2.6.2 ITK

ITK23 is an open-source library used in performing registration and segmentation within a multi-dimensional

space. ITK is implemented in C++, using generic programming24 principles.

ITK provides a registration framework (see Figure 2-23) that consists of various interchangeable pluggable

components which perform the registration steps described in section 2.5.3. The inputs used are the previously

discussed fixed and moving images. The metric measures the similarity between the fixed image and the

interpolated pixels of the moving image within the fixed image’s space. The quantified fitness value, calculated

by the metric, is sent to the optimizer which maximizes the similarity by means of transform parameters. The

initialized transformation maps and resamples the moving image in order to provide the metric with intensity

values for calculating the correlations between the images. The loop continues until a predefined termination

condition is fulfilled. The termination condition can depend on the metric’s fitness value, which indicates whether the images are aligned well enough. It can also depend on the transform parameters, which indicate whether a significant transformation is still needed.

Each component of the image registration framework can be chosen from different algorithms. This allows the

framework to be very flexible. Depending on the following criteria, the appropriate algorithm is chosen for each

individual component of the framework to solve the optimization problem of registration:

1. Type of input data

23 National Library of Medicine Insight Segmentation and Registration Toolkit (ITK) (http://www.itk.org) 24 Generic programming is a method of software development to create re-usable software libraries. The functions are implemented generically, without specifying the types of data and structures. In C++ this style is realized by templates.


2. Structure of the input images

3. Type of registration algorithm

4. The expected registration accuracy

Figure 2-23: Image registration framework in ITK ([ITK05] Page 316, Figure 8.2). Fixed and moving images are

treated as the input data. Inside the framework, we have four autonomous but interactive components: i) Metric,

ii) Optimizer, iii) Transform and iv) Interpolator.
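A hedged sketch of how these four components can be wired together in classic ITK, following the general pattern of [ITK05], chapter 8, is given below. The concrete component choices (mean-squares metric, regular-step gradient-descent optimizer, translation transform, linear interpolator) are placeholders only; any compatible algorithm can be plugged into each slot.

    #include "itkImage.h"
    #include "itkImageRegistrationMethod.h"
    #include "itkLinearInterpolateImageFunction.h"
    #include "itkMeanSquaresImageToImageMetric.h"
    #include "itkRegularStepGradientDescentOptimizer.h"
    #include "itkTranslationTransform.h"

    typedef itk::Image<float, 2>                                     ImageType;
    typedef itk::TranslationTransform<double, 2>                     TransformType;
    typedef itk::RegularStepGradientDescentOptimizer                 OptimizerType;
    typedef itk::MeanSquaresImageToImageMetric<ImageType, ImageType> MetricType;
    typedef itk::LinearInterpolateImageFunction<ImageType, double>   InterpolatorType;
    typedef itk::ImageRegistrationMethod<ImageType, ImageType>       RegistrationType;

    void runRegistration(ImageType::Pointer fixed, ImageType::Pointer moving) {
        RegistrationType::Pointer registration = RegistrationType::New();
        registration->SetMetric(MetricType::New());           // similarity measure
        registration->SetOptimizer(OptimizerType::New());     // parameter search
        registration->SetTransform(TransformType::New());     // mapping function
        registration->SetInterpolator(InterpolatorType::New());
        registration->SetFixedImage(fixed);
        registration->SetMovingImage(moving);
        registration->SetFixedImageRegion(fixed->GetBufferedRegion());

        // Initial transform parameters: no translation.
        RegistrationType::ParametersType initial(2);
        initial.Fill(0.0);
        registration->SetInitialTransformParameters(initial);

        // Runs the metric/optimizer loop of Figure 2-23 (older ITK versions
        // use StartRegistration() instead of Update()).
        registration->Update();
    }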

In this study, we use the model based registration algorithm in order to map a geometrical model onto an image.

For our purposes, the geometrical model is our reconstructed 3D model of a neuron and the image is the confocal

microscopy image stack of the neuron. During the registration process, the model is specified with parameters

that can be changed and applied by the Transform module in order to make the model a good representation of the structure shown in the image of a neuron. Thus, we have two inputs: the image as fixed input and the model as moving input. ITK

provides various kinds of spatial objects that allow us to reproduce the reconstructed 3D model of the neuron as a

spatial geometrical model with the necessary parameters. This type of model is suitable to use as an input for the

ITK registration framework. See section 3.2.4 for more detailed information about this framework and its use in

this study.

2.6.3 LIBSVM

LIBSVM25 is an open-source library for support vector classification and regression (see [CCL01]). This library

also supports different SVM approaches such as C-SVM, ν-SVM and probability estimation, all of which are used in this study. For more information about support vector machines and their approaches, see section 2.5.1.

3 Methodology

3.1 Concept

Given an image stack of a real neuron (illustrated in section 2.2), a 3D model of the neuron structure (introduced

in section 3.4) can be reconstructed. This chapter begins with a discussion of the reconstruction of a neuron structure as a systematic process. The reconstruction process is divided into partial problems, each with a

25 Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software

available at http://www.csie.ntu.edu.tw/~cjlin/libsvm


specific goal. To reach each of these goals and solve the corresponding problems, we attempt to extract

informative data from the image stack. Then, regarding the extracted information from the image stack as the

inputs of each partial problem, we look for an appropriate algorithm that can reach the goal using these inputs.

Finally, we describe the task and the expected result of the chosen algorithm. Section 3.2 will discuss the detailed

formulation and implementation of these algorithms.

3.1.1 Finding the Representative Spheres

The first step of modeling a neuron from its image is to determine which voxels represent the middle axis of the

neuron structure. As described in section 2.1, neurons consist mostly of neurites. Neurites are tubular

components that can be represented by many spheres with different radii. These spheres are optimally located on

the centerline of the tubular structure and have the same radius as the tubes. Therefore, the main goal of the

whole reconstruction process is to build spheres at proper locations with matched radii.

To find the proper location for each sphere, we should be able to distinguish the voxels representing the points

located at the axis, or center points, from the others. In confocal microscopy image stacks, the voxels

representing the neuron structure are indicated by brighter gray values than the background voxels. However,

distinguishing the center points based on the voxels’ gray values (intensities) alone is impossible. On one hand,

an image stack recorded with confocal microscopy is usually noisy. Hence, there could be some bright gray

voxels, which are not part of the neuron. On the other hand, depending on the thickness of the neurite, the neurite

structure can be shown in the image as several voxels with the same gray values. Furthermore, considering a voxel’s intensity alone, without taking into account the configuration of its neighboring voxels, provides only a limited worm’s-eye view of the neuron. Therefore, it is necessary to consider the local context in order to make a better

decision regarding the neuron structure. In this context, we define features for image voxels in regard to

neighboring voxels, and attempt to determine which voxels represent the center points with the aid of their

features’ values. Since we consider a wide range of the surrounding voxels, the context provided by the neurite

structure influences the decision-making. As described in section 2.4, the eigenvalues of the image’s Hessian

matrix are helpful characteristics in determining the tubular structures of the neurites. These eigenvalues are

calculated for each voxel with respect to the intensities of its neighbors. Thus, they fulfill the mentioned criteria

for significant features. Making a decision based on quantitative features is the task of classification algorithms.

In this work, we use 3D image stacks. Thus, we consider three Hessian eigenvalues for each voxel, in order to

make a yes-or-no (binary) decision for it.
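As an illustration of this feature computation, the following sketch uses ITK’s Hessian filter to obtain the three eigenvalues per voxel; the sigma value and the surrounding function are our own assumptions for illustration, not the exact code of this work.

    #include "itkHessianRecursiveGaussianImageFilter.h"
    #include "itkImage.h"
    #include "itkImageRegionConstIterator.h"

    typedef itk::Image<float, 3>                                ImageType;
    typedef itk::HessianRecursiveGaussianImageFilter<ImageType> HessianFilterType;
    typedef HessianFilterType::OutputImageType                  HessianImageType;

    void computeEigenvalueFeatures(ImageType::Pointer image) {
        // Convolve the image with Gaussian second-derivative kernels; the
        // result is one symmetric second-rank tensor (Hessian) per voxel.
        HessianFilterType::Pointer hessian = HessianFilterType::New();
        hessian->SetInput(image);
        hessian->SetSigma(1.0);                 // scale of the derivatives
        hessian->Update();

        itk::ImageRegionConstIterator<HessianImageType>
            it(hessian->GetOutput(), hessian->GetOutput()->GetBufferedRegion());
        for (it.GoToBegin(); !it.IsAtEnd(); ++it) {
            HessianImageType::PixelType::EigenValuesArrayType ev;
            it.Get().ComputeEigenValues(ev);    // three eigenvalues per voxel
            // ev[0], ev[1], ev[2] form the feature vector of this voxel.
        }
    }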

After determining which voxels are located at the axes of the tubular neurites, or center voxels as centers of

representative spheres, their radii should be identified as well. The radius at a certain point of the axis is

proportional to the number of bright voxels located inside the neurites’ tubular structure and perpendicular to the

axes. These structural voxels represent the neighbors of the center voxels. Thus, the neighbors should be

considered when calculating the radii for each sphere. This is done when the eigenvalues are calculated. In this

way, neighboring voxels can be used to determine the radii of the spheres. Determining the radii as continuous

real values, dependent on some features, is the task of regression algorithms.

In machine learning, there are many classification and regression algorithms, such as neural networks, radial basis functions, support vector machines, etc. As described in section 2.5.1, support vector machines (SVMs) are able to minimize the generalization error following the principle of structural risk minimization.


This yields robust classification and regression not only on training data, but also on test data. The SVM has

established itself as one of the most reliable supervised learning algorithms. For more details about the use of

SVM in this work, refer to section 3.2.1.

Since SVM is a supervised learning algorithm, it needs some training data to define the classification boundary,

or the regression function, in the feature space. Therefore, we need to use some examples as training dataset, i.e.

some neuron images with given center points and radii of the neurite structure. Accordingly, we first create

spatial 3D models similar to a real neuron structure and then create a 3D image stack from these models. We

then have an image and its model, from which the exact position of the center points and their radii can be

identified. These data, called toy data, can be used for training. After training, the SVM has the necessary

parameters for classification and regression and is able to classify voxels of neuron images in order to find the

locations of the center points. In almost the same manner, the regression method of SVM can assign each

classified center point to a matching radius. Of course, the quality of the results depends on the similarity

between the toy data and the tested image. In this context, it is important to create images from toy data, which

are similar to the neuron image stacks recorded with confocal microscopy. It is important to note, however, that

the SVM, due to its generalization ability, can tolerate some variation.

3.1.2 Grouping the Spheres

SVM as a classification and regression algorithm can be used to determine the many spheres representing the

neurite’s structure. Although all voxels located at the tubular neurites’ axes should be classified and represented

by spheres with fitting radii, not all of them are essential in creating an adequate reconstruction. The presence of

many spheres makes the reconstruction process slow, inexact, and complex. Thus, it is necessary to decrease the

number of spheres. Therefore, we replace the spheres located in close proximity, and sharing the same properties,

with a single sphere. The new sphere, as an adequate substitute, should own the average properties of the old

ones. The use of these substitute spheres enables us to deal with a less complex model and still provides enough

spheres for further modifications and exact modeling.

Note that, by means of this substitution process, we can solve the problem of complexity but not that

of noise in the neuron’s image. Moreover, the SVM, due to the generalization error and image noise, is expected

to classify some voxels near the target points as center points. The noise in an image stack causes both the creation of spheres at incorrect positions and/or ones with imprecise radii. The new substitute sphere has the average location and radius of all spheres it represents. Therefore, its properties are still affected by the incorrect spheres, which are omitted. Refining a simple model with fewer spheres allows for a faster and more

effective reconstruction process than a redundant and complex one. Thus, the newly substituted spheres will

become the atomic part of the reconstructed model.

It is important to determine which spheres can be grouped together and replaced with a new substitute sphere. In

some situations, grouping spheres according to their location and radii can cause mistakes and inaccuracies. For

example, when two neurites almost cross each other, a junction could accidentally be created by grouping the

nearby spheres together (see Figure 3-1, a and b). Similarly, two parallel neurites can mistakenly be

grouped into a single neurite, located between the old ones. For this reason, determining the group of similar

spheres does not depend only on their locations and radii, but also on other influencing features like the neurite’s

direction.



Figure 3-1: Special cases of grouping. (a) Found representative spheres of two neurites nearly crossing each

other before grouping. (b) The model neurites are joined after grouping with respect to radii and location of

spheres. (c) Two parallel neurites and their found representative spheres. (d) The probable result of grouping,

considering location and radii, is a single neurite model located between the old ones.

Generally, grouping data with regard to defined features is also a kind of classification. In machine learning,

partitioning data in the feature space is called clustering. There are different types of clustering algorithms (see section 2.5.2.1 for more details). The mean shift clustering algorithm will be used in this study. Because we

are using this type of clustering, there is no need to assign the number of prototypes (new substituted spheres)

and there are not many parameters to set. The mean shift clustering algorithm constructs clusters containing

similar objects (spheres) in terms of the computed features. It also finds a prototype for each cluster. Since each

prototype possesses the average feature values of all the included spheres in the cluster, it becomes a good

substitute for the clustered spheres. For more details about the defined features, and the use of the mean shift

clustering algorithm in this study, see section 3.2.2.

3.1.3 Connections

With the aid of mean shift clustering, the similar spheres are grouped together and are represented by a new

sphere as a prototype. However, these new, separate spheres alone cannot represent the neuron structure completely. A

neuron consists of several neurites and, hence, the next step of reconstruction is to find the connections between

spheres representing tubular neurite structures. Considering neuron morphology (see section 2.1), there are

different types of connections to be found. Treating the whole neurite structure as a tree, the thickest neurites are

grown from the soma and branch out into thinner ones. There are conceivably two different types of connections:

1) connections between spheres along a neurite’s segment and 2) connections between two neurite segments. We

try to find these connections systematically. First, we find simple connections between spheres in order to


construct neurite segments separately from their branching points. Then, we connect the constructed segments

together.

To find the connection between spheres, representing a neurite segment, we need to consider different factors.

Although sequentially located spheres with similar radii most probably compose a neurite segment, radii and location alone are insufficient for reconstruction, and situations could arise where other features, like the direction of the spheres, play a decisive role in determining how spheres are connected (see section 3.2.2). We can apply mean shift clustering to all of these features. Here, it is applied in order to cluster the substituted

spheres, which belong to a single segment, together. In this case, each cluster represents a group of spheres that

can be connected together. Therefore, the prototypes of clusters are not important for us, but rather the clusters’

members themselves are taken into consideration. After clustering, the spheres inside each cluster should be

connected together sequentially. This allows us to produce a reconstructed neurite segment for each cluster.

The next task is to find the connections between neurite segments. In this case, the location of the neurite

segments becomes more important than their radii or direction. This is because a neurite can branch out into two neurites of different thickness and direction. Therefore, given well-constructed segments, the segments ending near each other can, in all probability, be connected together.

Figure 3-2: Connections of spheres. The blue dashed lines represent the connection between spheres and the

green dotted lines represent the connections between segments.

For more details about connections and their algorithms, see section 3.2.3.

3.1.4 Corrections

By connecting the spheres together, we create an initial model of the neuron. However, this model is not

sufficiently accurate. The following factors affect the resulting model and make it only an approximate

reconstruction of the neuron:


1. Noise in the neuron images causes modeling errors. For example, spheres can be created with wrong

radii or at wrong positions. In addition, the feature values calculated from the neuron’s image for the

SVM and clustering algorithms can have small errors.

2. The SVM minimizes the general risk but is never perfect. Some voxels can be identified incorrectly as

center points because of classification errors. Furthermore, the regression errors cause inexact radius

estimation.

3. The mean shift clustering algorithm groups the representative spheres and replaces them with a new

sphere. This leads to an inexact approximation of the neurite structure. Obviously, the more numerous

the spheres in the model are, the more exactly the details of the neurite’s structure can be reconstructed.

4. The new spheres after clustering have the average properties of their components. The average is

calculated by a mean shift algorithm without considering the neuron’s image. Therefore, the prototype

spheres can be located at positions that are not exactly inside the neuron structure. In addition, the radii

may not fit the neurite.

We solve the above problems in the following way. First, we add new spheres to fill in the gaps in order to have

enough spheres for detailed corrections. Then we fit all the spheres, by translating and scaling, with respect to the

neuron’s image. This is the task of the registration algorithm, which is described in section 2.6.2. For more

details and a discussion of the use of registration algorithm in this study, see section 3.2.4.

3.1.5 Data Structure

The reconstructed 3D model of a neuron consists of spheres and the connections between these spheres. To save such a model as a data object, one needs to know the positions of the spheres, their radii, and their connections. On the one

hand, it is important to have an adequate data structure to easily save, access, and modify the model data. On the

other hand, the model should be compatible with the reconstruction process carried out by different sequentially

executed algorithms. Each algorithm uses the result of the previous one as input:

1. First, we have the representative spheres as output of the SVM.

2. The mean shift algorithm clusters these representative spheres and replaces them with the prototypes.

3. Using the prototype spheres, the simple connections between spheres will be constructed to model the

neurite segments.

4. Finally, we construct the connections between these segments in order to accomplish the 3D model.

We use a tree-form graph as the model’s data structure. Each node of the graph stands for a part of the model,

such as a sphere, a segment or connected segments. The edges between the nodes in the graph represent the

connection between these parts. Since the results of the SVM (the representative spheres) are not relevant for the

model representation, we take the prototype spheres from the mean shift clustering algorithm as the atomic

components of our data structure. These are represented by so-called point nodes in the graph. We construct the

graph in a hierarchical way, so that each algorithm can build nodes of the next level based on the result of the

previous level. The first level of the hierarchical model is made up of point nodes. The next level represents

segments including the connections between spheres. The so-called line nodes have edges with the point nodes,

which are connected together to represent a segment. The highest level consists of connection nodes, which


represent the connections between segments. A connection node connects point nodes and, consequently, their

line nodes together. Figure 3-3 illustrates the data structure of a model.

Figure 3-3: A model and its data structure as a tree-form graph.
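The following C++ sketch illustrates one possible realization of this hierarchical data structure; the class and member names are our own and do not necessarily mirror the actual implementation.

    #include <vector>

    struct PointNode {                 // atomic component: one prototype sphere
        double x, y, z;                // center position
        double radius;
    };

    struct LineNode {                  // one neurite segment
        std::vector<PointNode*> points;  // edges to the point nodes it connects
    };

    struct ConnectionNode {            // connection between two segments
        LineNode*  first;
        LineNode*  second;
        PointNode* firstEnd;           // point node of 'first' at the junction
        PointNode* secondEnd;          // point node of 'second' at the junction
    };

    struct NeuronModel {               // the complete hierarchical model
        std::vector<PointNode>      pointNodes;
        std::vector<LineNode>       lineNodes;
        std::vector<ConnectionNode> connectionNodes;
    };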

3.2 Implementation

In Figure 3-4 the pipeline of the main algorithms is illustrated. The neuron’s image as a 3D image stack is the main input. The first processing module is the eigenanalysis of the Hessian, which calculates the eigenvalues and eigenvectors of the Hessian matrix for the voxels of the image. These values will be used by the SVM and mean

shift clustering algorithms as features. The SVM first needs some toy data for training and for setting the

classification and regression parameters. With the aid of these parameters, and based on the features of the

neuron’s image, the SVM can extract the representative spheres of the neuron structure. The mean shift

clustering takes these spheres and tries to cluster them in order to create point nodes (the prototypes of the first

mean shift clustering) and line nodes (the clusters of the second mean shift clustering). The connector takes this

data and creates connections between point nodes inside a line node. Furthermore, the connector module creates

connection nodes to accomplish the model. Finally, the registrator fits the model onto the neuron’s image to refine

the 3D reconstruction.


Figure 3-4: Pipeline of algorithms.

In the following sections, we describe each module in detail.

3.2.1 Representative Sphere Extraction with the Aid of SVM

In Figure 3-5, the SVM module is illustrated in detail. The task of this module is to extract representative spheres

of neurite structures from the neuron’s image with the aid of support vector machines. This module consists of

four submodules. The main submodule is the SVM classifier which performs the classification and regression

processes. In this study, the SVM is implemented by means of the LIBSVM library. We use the C-SVC26 approach for classification (determining the location of spheres) and ν-SVR27 for regression (assigning the radii). They can

both tolerate noise in the feature space. The radial basis function (RBF)28 is used as the kernel for both approaches.

26 C-SVC is the implementation of the C-SVM classification approach in the LIBSVM library. 27 ν-SVR is the implementation of the ν-SVM regression approach in the LIBSVM library. 28 A radial basis function is a real-valued function whose value depends only on the distance from a point c, called a center, so that $\phi(x) = \phi(\|x - c\|)$. The commonly used RBFs are Gaussian, multiquadric, polyharmonic

spline and thin plate spline [Source: Wikipedia, Radial basis function, http://en.wikipedia.org/wiki/

Radial_basis_function (as of Sep. 22, 2008, 12:15 GMT)].
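As an illustration, the following sketch configures the two LIBSVM approaches via the svm_parameter structure; the numeric values are placeholders only (the actual parameter selection is discussed in section 3.2.1.3):

    #include "svm.h"   // LIBSVM header

    // C-SVC with an RBF kernel for classification.
    svm_parameter makeClassifierParams() {
        svm_parameter param = svm_parameter();
        param.svm_type    = C_SVC;      // C-SVC classification
        param.kernel_type = RBF;        // radial basis function kernel
        param.C           = 10.0;       // noise-tolerance parameter C (placeholder)
        param.gamma       = 1.0 / 3.0;  // e.g. 1 / (number of features)
        param.cache_size  = 100;        // MB
        param.eps         = 1e-3;       // stopping tolerance
        param.probability = 1;          // enable probability estimates
        return param;
    }

    // nu-SVR with an RBF kernel for regression.
    svm_parameter makeRegressionParams() {
        svm_parameter param = makeClassifierParams();
        param.svm_type    = NU_SVR;     // nu-SVR regression
        param.nu          = 0.5;        // noise-tolerance parameter nu (placeholder)
        param.probability = 0;
        return param;
    }

    // Training then reduces to: svm_model* model = svm_train(&problem, &param);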


Figure 3-5: The pipeline of the SVM module.

SVM, as a supervised learning algorithm, needs to be trained using examples. Besides the SVM classifier, the

remaining submodules are responsible for creating toy data. This toy data is used to create classification and

regression examples for the SVM. The toy data consists of spheres with known locations and radii. The toy data, used as

input in this module, is a manually constructed simple 3D model. The module image sampler samples an image

stack from this model. By adding noise and blurring the image, the resulting image more accurately resembles a

real neuron’s image. In this way, the 3D model becomes a perfect representation of the neuron structure shown

in the sampled image. In contrast to the real neuron’s image, the sampled image of the toy data along with the 3D

model allows us to determine whether a voxel is a center point or not. This information can be used to create

classification examples.

The task of the SVM is to determine the center points based on the voxels’ features. Therefore, each voxel in the

image is a data point in the SVM’s feature space. However, by using the sampled image of the toy data models,

we can provide each voxel with classification and regression target values (called labels). For instance, in the

case of classification, voxels located along the middle axis of the tubular toy data model are labeled as positive and all other voxels as negative data points. The target values for regression are the correct radii of the determined center points. These values can be provided by the toy data as well. By means of such labeled data


sets, the SVM trainer calculates the necessary parameters for the SVM classifier. With the aid of these

parameters, the SVM classifier should be able to find spheres at correct locations (classification) with

appropriate radii (regression) based on their features. For each voxel, we use the eigenvalues of the Hessian

matrix as features.

3.2.1.1 Toy Data

The performance of the SVM (classification and regression) depends, among other things, on the toy data. The

toy data consists of a 3D model and its sampled image. The model should cover all common cases of neuron

structures. It should represent neurite structures with a large range of radii and segments with different curves.

The sampled image should resemble a confocal microscopy image stack and include the relevant noise of

confocal microscopy. Furthermore, it should be blurred in order to simulate the neuron images approximately (see Figure 3-8).

The sampling process of toy data can be outlined as follows. Given a constructed 3D model, we sample an image

stack from the model by setting the voxels’ intensities. Voxels located inside the neurite structures of the model are assigned the highest intensity (i.e. the brightest gray value) and all other voxels, located in the background, are set to the lowest gray value (black). The resulting borders between the bright and dark voxels are excessively sharp and need to be smoothed. Blurring is an approach used in image processing to smooth an

image in order to remove noise and fine-scale structures. One of the common blurring methods is achieved using

convolution29 operators. Convolution operators change the gray values by means of a predefined matrix, known as a kernel30. There are diverse linear and nonlinear kernels that define different functions for gray value

combination. For the purpose of our study we use the Gaussian kernel. The 2D Gaussian distribution has the

following form (shown in Figure 3-6):

$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ (Equation 3-1)

where σ (sigma) is the standard deviation.

In this work, we use the Gaussian filter of the ITK library, which implements the Gaussian kernel described by T. Lindeberg in discrete scale-space theory (see [Lin91]).
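A minimal sketch of this blurring step, assuming ITK’s itk::DiscreteGaussianImageFilter and a variance of one (i.e. σ = 1, as used for the toy data in Figure 3-8), could look as follows:

    #include "itkDiscreteGaussianImageFilter.h"
    #include "itkImage.h"

    typedef itk::Image<float, 3> ImageType;
    typedef itk::DiscreteGaussianImageFilter<ImageType, ImageType> BlurFilterType;

    // Smooths the sampled toy data image with the discrete Gaussian kernel.
    ImageType::Pointer blurSampledImage(ImageType::Pointer sampled) {
        BlurFilterType::Pointer blur = BlurFilterType::New();
        blur->SetInput(sampled);
        blur->SetVariance(1.0);   // variance 1.0 corresponds to sigma = 1
        blur->Update();
        return blur->GetOutput();
    }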

Another step of the image sampling process is adding noise. Examining a typical image stack recorded with confocal microscopy revealed that the intensities, with some exceptions, contain mild noise levels. Usually, noise in imaging systems is either additive or multiplicative. In this case, we use zero-mean additive noise to

calculate the image intensities. The noise for each voxel is independent and identically distributed. Common

additive noise models are Gaussian, Laplacian and Uniform models.

29 Convolution is a mathematical operator that takes two matrices as input. It uses one of the input matrices as an operator to modify the other one in order to produce a third matrix as output. 30 A kernel in the image processing field is a small matrix which is used as the operator in convolution processes. It defines the way of combining the neighbors’ gray values to compute the value of the target voxel.


Figure 3-6: Graph of a 2D Gaussian distribution (σ = 1).

In Figure 3-7, the probability density functions for these noise models are illustrated. In the case of the Uniform

noise model, all noise values are generated with the same probability. Thus, this model is not suitable for this study.

In the case of Gaussian and Laplacian models, small noise values are generated more frequently than the bigger

ones. Therefore, these models better resemble confocal microscopy noise. However, the Laplacian model, in comparison to the Gaussian model, generates large noise values with higher probability and can simulate the outliers more effectively.

Figure 3-7: Comparing three zero-mean additive noise models (p.d.f.s) with standard deviation equal to one.

The Laplacian probability distribution can be thought of as two exponential distributions spliced together back-

to-back and is defined as:

$p(x) = \frac{1}{\sqrt{2}\,\sigma} \exp\!\left(-\frac{\sqrt{2}\,|x - \mu|}{\sigma}\right)$

where µ is the mean and σ indicates the standard deviation.
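A small sketch for drawing such zero-mean Laplacian noise by inverse-CDF sampling is given below; the helper name and the use of rand() are our own simplifications:

    #include <cmath>
    #include <cstdlib>

    // Draws one zero-mean Laplacian sample with standard deviation sigma.
    // The scale of the Laplace distribution is b = sigma / sqrt(2), so that
    // the generated values have standard deviation sigma.
    double laplacianNoise(double sigma) {
        double b = sigma / std::sqrt(2.0);
        // uniform u in (-0.5, 0.5), endpoints excluded
        double u = (std::rand() + 1.0) / (RAND_MAX + 2.0) - 0.5;
        double sign = (u < 0.0) ? -1.0 : 1.0;
        return -b * sign * std::log(1.0 - 2.0 * std::fabs(u));
    }

    // Adding the noise then amounts to: intensity += laplacianNoise(0.1),
    // matching the sigma = 0.1 used for the toy data image in Figure 3-8.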


Figure 3-8: Comparing a neuron’s image with the image of a toy data. (a) The toy data as a 3D model. (b) The sampled image from the model. (c) Noise added to the image (σ = 0.1 and µ = 0). (d) The blurred version (σ = 1). (e) A sample layer of a neuron image for comparison. (f) The model of toy data with the projected constructed image onto the xy-, xz-, and yz-planes.

3.2.1.2 Image Scaling

We take the eigenvalues of the image’s Hessian matrix as input features for the SVM. As described in section

2.4, the Hessian matrix represents the second-order partial derivatives of the image’s matrix in all three

directions. By means of these features, we aim to find the center point of the neurite’s structure. However, only

the center points of thin neurites can be distinguished in this way. This is because thick neurites have more than

one voxel with the brightest gray value in the middle of the structure. Therefore, the partial derivatives do not

change along the width of such neurites and are equal to zero (see Figure 3-9). This means that the eigenvalues

of such center points, calculated from the Hessian matrix, are not significant features in differentiating the center

points from the others. Consequently, voxels, located in the middle of thick neurites, cannot be distinguished

from each other and the SVM determines several voxels as center points instead of one.


Figure 3-9: (a) Different sizes of tubular shapes. (b) Gray values and derivatives of voxels along the x direction of the

image (marked with red line in (a)). The blue curves indicate the gray values; the red ones are the first

derivatives and the green ones show the second derivatives.

However, the voxels at the center of thin structures can easily be classified by the SVM. Ideally, only one voxel at

the center possesses the brightest gray value along the neurite. Nevertheless, by scaling down the image and

making the neurites thinner, thick neurites can also provide the SVM with significant features. Depending on the

scale rate, neurites of different thicknesses can be classified by the SVM. Typically, the neurites in an image stack

have a radius in the range of 1 to 5 voxels. In order to classify center points in all common structures, it is

necessary to resample the image with three different scales: 0.5, 0.25 and 0.125. After the resampling process,

depending on the corresponding scale rate, the image’s resolution is reduced. Consequently, each image has

fewer voxels representing the neurite, but with correspondingly larger spacing values in the world coordinate system. By raising the spacing values, the resampled image remains concordant with the original image.
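One way to produce these three resamples, sketched here with ITK’s itk::ShrinkImageFilter under the assumption that the integer shrink factors 2, 4 and 8 realize the scales 0.5, 0.25 and 0.125, is the following; the filter raises the voxel spacing by the same factor, which keeps the resamples concordant with the original image in world coordinates:

    #include "itkImage.h"
    #include "itkShrinkImageFilter.h"

    typedef itk::Image<float, 3> ImageType;
    typedef itk::ShrinkImageFilter<ImageType, ImageType> ShrinkFilterType;

    // Reduces the resolution by an integer factor in x, y and z; the output
    // spacing is increased by the same factor.
    ImageType::Pointer shrinkImage(ImageType::Pointer image, unsigned int factor) {
        ShrinkFilterType::Pointer shrink = ShrinkFilterType::New();
        shrink->SetInput(image);
        shrink->SetShrinkFactors(factor);   // same factor in all dimensions
        shrink->Update();
        return shrink->GetOutput();
    }

    // Usage: the four SVM inputs are image, shrinkImage(image, 2),
    // shrinkImage(image, 4) and shrinkImage(image, 8).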

Using a smaller scale may cause the thin neurites to disappear completely. At the same time, thick neurites will be

represented with fewer voxels, so that their features are more significant and the center points can be identified

more precisely. For example, using the scale rate of 0.125, the width of a neurite with a radius of eight voxels

will be represented by two voxels instead of 16 but with 8-times greater spacing between each voxel. As a

result, we have four images as input for the SVM: one original image and three scaled resamples. Through this

process, a specific range of radii will be correctly determined for each image. Consequently, we have four

SVMs and each one is trained with a certain image scale. In order to assess this approach, simple tubular neurite

models with different radii are created and classified in two settings: i) with the original image stack as input and one SVM and ii) with the additional three scaled resamples and, overall, four specialized SVMs. The statistics shown in Table 3-1 clarify the problem: in the case of one image, by raising the radius of the neurites from 1 to 3,


the number of correctly identified center points (correct hits31) is reduced and more voxels are incorrectly

identified as center points (false alarms32). When considering thicker neurites, the center voxels, which have

positive labels in the training data set, have similar features to the voxels located near them with negative labels.

Therefore, the feature space is so noisy that the SVM is no longer able to classify them. Thus, the neurites with a radius bigger than 4 are not classifiable with just one image as input.

When using the three resamples, four specialized SVMs are used for different scales. In contrast to the cases

with one original image stack as input and one SVM, here, the results are considerably improved. In this tubular

neurite model, all of the center points are identified successfully by the appropriate SVM and the false alarms are

greatly reduced. However, the results generated from models with radii of 1, 2 and 9 need to be considered in

detail. When using these models, the false alarms are almost as numerous as the correct hits. In the first case

(radius of one), we use the original image. Thus, the neurite is a tube with the thickness of two voxels, each

possessing similar features. In the next case (radius of two), the image is scaled down by 0.5 and, consequently,

the thickness of the tube is reduced to two voxels at each cross-section. In these cases, the SVM cannot

distinguish the voxels from one another. This is because only one of each pair of similar voxels is labeled as a

positive example for training. Hence, one point, due to its similarity to the other, is incorrectly classified as a

center point. The same problem is seen in very thick structures where the radius is bigger than eight voxels.

These classification errors are corrected by means of the mean shift clustering algorithm, described in section 3.2.2.

             1 image (not resampled)                   4 images (resampled)
    Radius   Correct hits         False alarms         Correct hits       False alarms         Scale
    1        100% (31 voxels)     0.06% (32 voxels)    100% (31 voxels)   0.06% (32 voxels)    1
    2        96.77% (30 voxels)   0.05% (29 voxels)    100% (16 voxels)   0.22% (15 voxels)    0.5
    3        93.55% (29 voxels)   0.12% (76 voxels)    100% (8 voxels)    0%                   0.25
    4        -                    -                    100% (8 voxels)    0%                   0.25
    5        -                    -                    100% (8 voxels)    0.21% (2 voxels)     0.25
    6        -                    -                    100% (5 voxels)    0%                   0.125
    7        -                    -                    100% (5 voxels)    0.81% (1 voxel)      0.125
    8        -                    -                    100% (5 voxels)    0.81% (1 voxel)      0.125
    9        0%                   0%                   100% (5 voxels)    8.13% (10 voxels)    0.125

Table 3-1: Comparing the SVM classification errors for toy data of different thicknesses in the cases of using one or four image scales. The sampled image of each model consists of many voxels labeled as positive examples (hits) or negative examples (potential false alarms). The number of negative examples is much greater than the positive ones. The percentage ratios are calculated with respect to the corresponding toy data.

31 Correct hits are data points in a given data set which are correctly classified as positive data points by a binary classification algorithm. 32 False alarms are data points in a given data set which actually belong to the negative class but are classified as positive data points by a binary classification algorithm.


Since image resolution is reduced by resampling to a smaller scale, fewer voxels are available to be identified as

center points. However, in view of the fact that the SVM aims to recognize the skeleton of the neuron, the

number of the determined center points does not affect the accuracy of reconstruction. Furthermore, the

registration algorithm compensates for the lack of center points by interpolation (see section 3.2.4).

Each specialized SVM works on a specified scale of the image and is responsible for a specified range of radii.

Therefore, each SVM must be provided with examples which are recognizable at that scale. For example, a neurite structure with a radius of 4 will provide no positively labeled data for an SVM specialized for thinner neurites.

The radius, however, is a continuous real number, generated from the assumed interval [0.5, 8]. To determine the

appropriate interval of radii for each SVM, a toy data model, containing neurites with radii of 0.5 to 8, is created.

Since the two largest eigenvalues of the image’s Hessian matrix are the deciding values for classification and

regression, we compare the average of these eigenvalues produced by different radii. As shown in Figure 3-10,

for each scale of the image, neurites of a specific radius interval have the most significant feature values.

Therefore, the whole interval is divided into four overlapping intervals and neurites are trained, depending on

their thickness, with the corresponding SVM for classification and regression.

Figure 3-10: The behavior of the two largest eigenvalues as a function of the radius for the four scale-specialized SVMs.

In neuron images, the neuron structure consists of neurites of different thicknesses. These neurites will be classified by all four SVMs. Each SVM can identify the center points of the related radii in the related image. After determining the center points for each scale, these points are transformed into the world coordinate system, which takes into

account the spacing of the resampled image.

[Diagram of Figure 3-10: the average of the two largest eigenvalues is plotted against the radius for each image scale. The resulting radius intervals covered by the scale-specialized SVMs are: SVM for scale 1 (original image): r < 2; SVM for scale 0.5 (resampled by 0.5): r ∈ ]1.5, 3[; SVM for scale 0.25 (resampled by 0.25): r ∈ ]2.5, 5.5[; SVM for scale 0.125 (resampled by 0.125): r > 4.5.]


Since the regression process uses the same features as the classification, the radii of the determined center points are also assigned by the specialized SVMs for regression. However, because of the overlapping radii intervals and the error of the SVMs, some voxels may be identified by more than one SVM as the center point of a representative sphere. Consequently, these spheres may be assigned different radii by two different SVMs. We solve this problem by using the probability estimator SVM (see section 2.5.1). This kind of SVM allows us to assign each data point to a class with a specific probability. The class with a probability larger than 50% is taken by default as the result. In the case of doubly classified voxels, the radius assigned by the SVM with the larger classification probability is taken.
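As a minimal sketch of this tie-break (assuming each SVM reports, per candidate voxel, a classification probability and a regressed radius; the dictionary layout is our own illustration):

```python
# Resolve doubly classified voxels: when several scale-specialized SVMs mark
# the same voxel as a center point, keep the radius predicted by the SVM with
# the larger classification probability.
def merge_center_points(per_svm_results):
    """per_svm_results: list of dicts mapping voxel -> (probability, radius)."""
    merged = {}
    for result in per_svm_results:
        for voxel, (prob, radius) in result.items():
            if voxel not in merged or prob > merged[voxel][0]:
                merged[voxel] = (prob, radius)
    # Strip the probabilities; only the winning radius is kept per voxel.
    return {voxel: radius for voxel, (prob, radius) in merged.items()}

svm_a = {(10, 4, 7): (0.92, 1.8)}
svm_b = {(10, 4, 7): (0.71, 2.6), (11, 4, 7): (0.88, 2.4)}
print(merge_center_points([svm_a, svm_b]))
# {(10, 4, 7): 1.8, (11, 4, 7): 2.4}
```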

3.2.1.3 Parameter Selection

Each of the regression and classification SVM approaches (C-SVC and ε-SVR) has a parameter through which the noise tolerance can be set. In this section, we show the effect of these parameters on the results. Figure 3-11 shows the effect of the parameter C on the C-SVC classification algorithm and of the parameter ε on the regression errors of ε-SVR. If both parameters are big enough, the SVMs achieve their maximum performance.

Figure 3-11: The effect of the parameter C in C-SVC on the classification results, and of ε in ε-SVR on the regression results of the four specialized SVMs on a sample toy data set.

3.2.2 Grouping the Representative Spheres Using Mean Shift Clustering

As a result of the SVM module, we have representative spheres located on the middle axis of the tubular neurite

structure with a fitting radius. As described in section 3.1.2, we group these spheres together to create the point

nodes of our data structure by means of mean shift clustering. Furthermore, point nodes located on a segment

compose a line node. The line nodes are also determined by mean shift clustering.

Clustering algorithms group data based on their features. Hence, each given representative sphere is a data

point in the clustering feature space. On the one hand, we have locations and radii computed by the SVM module

as features. On the other hand, we have the Hessian’s eigenvectors of the neuron’s image at the center point

position.


This module consists of two submodules. The submodule Point nodes clustering receives the features and

outputs the prototypes of data points through mean shift clustering. The prototypes represent the point nodes of

the data structure. The next submodule, Line node clustering, receives the point nodes as input and clusters them

in order to determine the line nodes of the data structure. Both submodules emit their result as output of the mean

shift clustering module.

Figure 3-12: The pipeline of the Clustering module.

3.2.2.1 Significant Features

The mean shift clustering algorithm groups similar data points. The similarity of these data points is defined by

the distance function in feature space. Hence, using significant features that can indicate essential properties of

representative spheres is an important part of the clustering process. The major criterion of grouping is that only

the representative spheres, belonging to a single segment, can be grouped together. Consequently, the location of

the spheres should be considered when clustering. Moreover, points of a segment must have similar radii.

Therefore, the radius of the spheres also becomes a deciding value. As described in section 3.1.2, there are some

cases where these two features (radius and location) cannot distinguish two neurites from one another (see

Figure 3-1 a). For this reason, we need to take the direction of the sphere into consideration. The direction of a

sphere can be defined with respect to the neurite segment on whose axis it is located. As described in section 2.4, the eigenvector corresponding to the smallest eigenvalue of the image's Hessian matrix at the center of a tubular shape indicates the tube's direction. Thus, we also use this eigenvector as a feature in order to take the neurite's direction into account. As a result, the clustering feature space has seven components:

I. Spatial location of the representative sphere that consists of three components: x, y and z.

Hessian’s Eigenvectors of

neuron’s image

Representative spheres

Locations &

Radii

Point nodes clustering Line nodes clustering

Point

nodes

Features of representative

spheres

Point nodes

& line nodes

Input

Output

Module

Page 54: Algorithms of Automatic Reconstruction of Neurons from the …asadeghi/pub/... · 2011. 12. 18. · values (see [ZZL08], [SZM07],[HHC03], [KLZ02], [DeC02], [ESS05], [HFF05], [BST05]

54

II. Radius of the representative sphere determined by the SVM regression.

III. The Hessian’s eigenvector corresponding to the smallest eigenvalue of the neuron’s image, at the position

of the center point. This 3D eigenvector also has three components.

3.2.2.2 Parameters

One of the advantages of mean shift clustering is that the algorithm does not require the setting of many

parameters. The only parameter is h which represents the bandwidth of the kernel. However, the features used in

this case are not balanced. The seven mentioned features are all in a spatial domain but are defined by different

scales and natures. By assigning the differing bandwidth parameters for each group of similar features, we

attempt to balance them. Through this process, the kernel is changed into a product of more radially symmetric

kernels:

$K(x) = C \cdot k\!\left(\left\|\frac{x_s}{h_s}\right\|^2\right) k\!\left(\left(\frac{x_r}{h_r}\right)^2\right) k\!\left(\left\|\frac{x_e}{h_e}\right\|^2\right)$ (eq. 3-2)

where
x_r is the radius of the sphere and h_r is the bandwidth of its kernel,
x_s is the spatial Euclidean distance between the spheres and h_s stands for the bandwidth of the kernel for its three components, and
x_e is proportional to the sine of the angle between the eigenvectors corresponding to the smallest eigenvalue of the image's Hessian at the centers of the spheres. It reaches the minimum distance in feature space when the eigenvectors point in the same or in opposite directions. h_e refers to the bandwidth of the kernel for the three eigenvector components.

However, these features are not sufficient for optimal clustering. Considering the grouping problem illustrated in Figure 3-1 b, we need to examine more properties of the structure. In the aforementioned case, when two parallel neurites of similar thickness are located close enough to each other, their points may mistakenly be clustered together. Since these parallel neurites are also aligned with each other, the eigenvector feature cannot prevent them from being clustered together. Consequently, they are grouped as a single neurite located in-between them. In order to avoid such cases, we extend the kernel (eq. 3-2) with a new term. In so doing, we define a new type of spatial distance, which exposes the difference between parallel neurites. The new kernel is defined as follows:

$K(x) = C \cdot k\!\left(\left\|\frac{x_s}{h_s}\right\|^2\right) k\!\left(\left(\frac{x_r}{h_r}\right)^2\right) k\!\left(\left\|\frac{x_e}{h_e}\right\|^2\right) k\!\left(\left(\frac{x_p}{h_p}\right)^2\right)$ (Equation 3-3)

The new distance is defined as x_p = P, where P is a scalar coefficient computed as shown in Figure 3-13. P reaches its maximum for parallel spheres and its minimum in situations where two spheres are located on the same branch. In this way, both the similarity between spheres belonging to the same neurite segment and the dissimilarity of spheres located in two different parallel neurites are increased. The parameter h_p indicates the bandwidth of the newly defined parallel distance.


For the red center point r we have $|\cos\alpha| = \frac{\|e_r \otimes d\|}{\|e_r\|\,\|d\|}$, and for the blue center point b we have $|\cos\beta| = \frac{\|e_b \otimes d\|}{\|e_b\|\,\|d\|}$, where $\otimes$ is the cross product and d is the vector connecting the two sphere centers. P is defined as:

$P = |\cos\alpha| + |\cos\beta|$.

Figure 3-13: Calculating a new term for the kernel in order to make the spheres of parallel-located segments dissimilar from each other in the feature space. P reaches its maximum, equal to 2, when α=β=0 or α=β=π (i.e. the branches are parallel). On the other hand, P=0 when α=β=π/2 or α=β=3π/2 (i.e. the spheres are located on the same branch).

As a result, in the computation of the mean shift vector m_{h,G}(x) (Equation 2-17), we calculate the weights of the data points as follows:

$w(a,b) = k\!\left(\left(\frac{x_r}{h_r}\right)^2\right) k\!\left(\left(\frac{x_s}{h_s}\right)^2\right) k\!\left(\left(\frac{x_e}{h_e}\right)^2\right) k\!\left(\left(\frac{x_p}{h_p}\right)^2\right)$ (Equation 3-4)

where for two data points a = (r_a, s_a, e_a) and b = (r_b, s_b, e_b) we have:
x_r = r_a − r_b,
x_s = ‖s_a − s_b‖,
x_e = sin(θ), where θ is the angle between e_a and e_b, and
x_p = P, where P is defined in Figure 3-13.
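The following Python sketch computes the weight of Equation 3-4 for one pair of representative spheres. It assumes a Gaussian profile k(u) = exp(-u/2) for every factor and uses the reconstruction of P from Figure 3-13; the profile choice and the data layout are illustrative assumptions.

```python
import numpy as np

def pair_weight(a, b, h):
    """a, b: dicts with 'r' (radius), 's' (center, 3-vector), 'e' (unit eigenvector).
    h: dict of bandwidths for the four feature groups."""
    k = lambda u: np.exp(-0.5 * u)                   # assumed Gaussian profile
    d = b["s"] - a["s"]                              # connecting vector
    x_r = a["r"] - b["r"]                            # radius difference
    x_s = np.linalg.norm(d)                          # spatial Euclidean distance
    x_e = np.linalg.norm(np.cross(a["e"], b["e"]))   # sine of angle between eigenvectors
    if x_s > 0:
        d_hat = d / x_s
        # P = |cos(alpha)| + |cos(beta)| as reconstructed in Figure 3-13
        x_p = (np.linalg.norm(np.cross(a["e"], d_hat))
               + np.linalg.norm(np.cross(b["e"], d_hat)))
    else:
        x_p = 0.0
    return (k((x_r / h["r"]) ** 2) * k((x_s / h["s"]) ** 2)
            * k((x_e / h["e"]) ** 2) * k((x_p / h["p"]) ** 2))

a = {"r": 1.0, "s": np.array([0.0, 0.0, 0.0]), "e": np.array([1.0, 0.0, 0.0])}
b = {"r": 1.1, "s": np.array([2.0, 0.0, 0.0]), "e": np.array([1.0, 0.0, 0.0])}
h = {"r": 0.5, "s": 3.0, "e": 0.5, "p": 1.0}
print(pair_weight(a, b, h))  # ~0.79: same branch, similar radius -> high similarity
```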

The parameter h = {h_r, h_s, h_e, h_p} must be set in order to determine the neuron structure. This parameter indicates the bandwidth and affects the size of the density estimation window for each data point. The bigger the value of h, the more data points are considered for the estimation. When more points are considered, the estimation is smoother. Thus, h is also called the smoothing parameter (Figure 3-14).


Figure 3-14: The effect of the bandwidth parameter h on the kernel density estimation of 100 i.i.d. samples [Source: Wikipedia, Kernel density estimation, http://en.wikipedia.org/wiki/Kernel_density_estimation (as of Sep. 22, 2008, 12:03 GMT)].

The parameter h is also used to balance data from different natures in a 7-dimensional feature space. Therefore,

the values of each component should match the domain its data values are coming from. In addition, the

component values depend on the use and the goal of the mean shift algorithm. As previously discussed, a mean

shift clustering algorithm is used in two submodules: Point node clustering and Line node clustering. The

initialization of the parameter h differs in each submodule and depends on the goal of the clustering process. If the algorithm aims to create a point node, for instance, the radii of the grouped points should not differ greatly from each other. For line node clustering, however, such differences are less crucial, because a line node can consist of point nodes with a larger range of radii, depending on the neurite's thickness at different positions. Furthermore, the spatial distances between representative spheres inside a point node may be smaller than the distances between point nodes in the creation of line nodes. In section 4.2, the results of the clustering algorithms with different parameter initializations are illustrated.

3.2.2.3 Spatial Distance between Point and Line Nodes

In addition to input data and parameter initialization, there is another difference between point node and line

node clustering submodules: calculating the spatial distances. In point node clustering, each data point is a

representative sphere of the neuron structure. The spatial distance between spheres is calculated by determining

the Euclidean distance between their centers (Figure 3-15 a). Since a point node represents many sequentially

located spheres, it can be considered as an ellipse (illustrated in Figure 3-15 b). In this case, the distance between

point nodes is calculated as the Euclidean distance between the nearest spheres located inside the clusters. Thus, the actual elliptical shape of the point node is taken into consideration, and it can be determined more easily whether or not two point nodes are located near enough to each other to be clustered in a line node.
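A minimal sketch of the two distance computations, representing a point node simply as the list of its member sphere centers:

```python
import numpy as np

def sphere_distance(c1, c2):
    """Euclidean distance between two representative sphere centers."""
    return np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))

def point_node_distance(node_a, node_b):
    """Distance between two point nodes: the distance of their nearest member spheres."""
    return min(sphere_distance(c1, c2) for c1 in node_a for c2 in node_b)

node_1 = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
node_2 = [(5, 0, 0), (6, 0, 0)]
print(point_node_distance(node_1, node_2))  # 3.0, between (2,0,0) and (5,0,0)
```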


(a) (b)

Figure 3-15: (a) The input model for the point node clustering submodule. Each circle is a representative sphere. The spatial distance between two spheres is denoted by x_s. (b) The same model after the point node clustering process. The spheres are clustered into point nodes; spheres which are clustered together are marked with the same border color and enclosed in an ellipse. The two marked distances are the defined Euclidean spatial distance between point nodes 2 and 3 and the Euclidean spatial distance between point nodes 1 and 2.

The line node clustering submodule allows us to show point nodes inside a line node as a cluster. However, these

point nodes are still not connected. In the next section, we will try to connect point nodes in each cluster together

in order to create neurite segments. In addition, in order to accomplish the skeleton reconstruction of the 3D

model, further connections need to be established.

3.2.3 Connecting and Enhancements

A model of a neuron consists of many connected point nodes found in our data structure. The goal of the

connector module is to find relevant connections between the point nodes in order to create line and connection

nodes. The connecting process is carried out in different hierarchical phases. We begin by connecting the point

nodes of the recognizable short segments and continue the process by determining new connections between the

previously created segments.

3.2.3.1 Line Node Connections

The clustering module allows us to determine the point nodes and incomplete line nodes as groups of

unconnected point nodes. The first step of the connecting module is to try to connect the point nodes inside each

of the recognized line nodes together. In order to find these connections we need to consider each line node

cluster as a directed, weighted and complete graph G=(V,E). Each vertex represents a point node and each

directed edge represents a connection between individual point nodes (see Figure 3-16). The edges

representing relevant connections of the line node must be identified. An assessment function which assesses the

edges of the graph is defined. The probability that two points inside a line node are connected depends on two criteria: i) how far they are located from each other, and ii) how different their directions are. The radius is

not important in this case, since there is no correlation between the sequencing of the connected point nodes of a

segment and their thickness. The assessment function assigns the edge weights based on the two aforementioned


criteria. The assessment function a(.,.) for the directed edge from point node p1=(r1,s1,e1) to p2=(r2,s2,e2) is defined as follows:

$a(p_1, p_2) = \frac{\cos(\alpha)}{\|d\|}$ (Equation 3-5)

where
e1 is the eigenvector corresponding to the smallest eigenvalue of p1,
d is the spatial distance vector from s1 to s2 and
α is the angle between e1 and the vector d, with α ∈ [0, π].

The weights can be negative or positive, depending on the sign of cos(α). For angles bigger than π/2 the weights are negative, and for angles smaller than π/2 the weights are positive. The angle is defined by the distance vector d and the eigenvector e1 corresponding to the smallest eigenvalue. Since such an eigenvector of each point node is an independent constant, the direction of the distance vector d determines the sign of the weights of the edges between two point nodes, and d depends on the choice of the point nodes as source or destination. Actually, there is only one conceivable connection between two point nodes of a model. However, in the represented graph we have two directed edges indicating the same connection between the related vertices. The absolute values of the weights of these two edges are the same, because they differ only in the direction of d: one has the inverse direction of the other. Hence, they differ only in their signs. The graph is defined as a complete graph, whereby all possible directed edges are available. As a result, each vertex also has a loop edge, which is weighted by the defined assessment function. Since, for a single point node, the spatial distance vector d is a null vector, all loops are weighted by zero (see Figure 3-16).

The absolute values of the weights reflect the deviation of the point node's direction from the direction of the connection (the angle between e1 and d) and are inversely proportional to the length of the spatial distance vector. The more aligned the directions are, and the smaller the distance between the point nodes, the bigger the absolute value of the weight of the edge between them will be. When considering the weights of the edges in the graph

G=(V,E), the set of all relevant connections of a point node (represented as the vertex v in the graph), fulfilling the mentioned connecting criteria, is defined as follows:

$C_v = \Big\{ \arg\max_{E_v} w(E_v),\ \arg\min_{E_v} w(E_v) \Big\} \setminus \{ E_v : w(E_v) = 0 \}$ (Equation 3-6)

where
E_v is an edge beginning at the vertex v and
w(E_v) is the weight of the edge E_v.


As a result of Equation 3-6, each vertex v has a maximum of two edges in C_v, with almost opposite directions. Vertices with two edges represent the inner point nodes of the line node and are connected with two other neighbors. However, vertices having only positive or only negative edges possess a single edge in the connection set. These vertices indicate the corner point nodes located at the ends of a line node and are connected with a single neighbor point node. Considering Equation 3-6, such point nodes have either a maximum edge weight equal to 0 or a minimum edge weight equal to 0 (the loop edge), i.e. all other point nodes are located on the same side and there is no outgoing edge from these point nodes in the opposite direction. In this way, the directionality of the graph is used to identify the corner points of a line node.

(a)

(b)

Figure 3-16: (a) Four point nodes clustered in a line node and their connections. (b) A graph representing all possible connections of the four point nodes in (a), weighted by the defined assessment function (Equation 3-5). The blue edges represent the relevant connections with the minimum (most negative) weights, and the red ones the connections with the maximum (most positive) weights of each point node.

After determining the set of relevant edges, each pair of directed edges connecting two vertices is replaced with a

single connection between the related point nodes. Consequently, all point nodes clustered in a line node are

connected together sequentially in order to represent a segment of a neurite.
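The following sketch implements the edge assessment of Equation 3-5 and the selection rule of Equation 3-6 for a small chain of point nodes; the node representation is our simplification, and the assessment follows the reconstruction a(p1, p2) = cos(α)/‖d‖ given above.

```python
import numpy as np

def assess(p1, p2):
    """Directed edge weight from p1 to p2: cos(angle(e1, d)) / |d|; loops weigh 0."""
    d = p2["s"] - p1["s"]
    dist = np.linalg.norm(d)
    if dist == 0.0:
        return 0.0
    return np.dot(p1["e"], d) / dist / dist        # cos(alpha) / |d| for unit e1

def relevant_edges(nodes):
    """Per vertex keep at most two edges: the max positive and the min negative weight."""
    connections = []
    for i, p in enumerate(nodes):
        weights = [assess(p, q) for q in nodes]
        best_pos, best_neg = int(np.argmax(weights)), int(np.argmin(weights))
        if weights[best_pos] > 0:      # corner point nodes lack one of the two
            connections.append((i, best_pos))
        if weights[best_neg] < 0:      # sides; their loop edge (weight 0) wins
            connections.append((i, best_neg))
    return connections

nodes = [{"s": np.array([float(x), 0.0, 0.0]), "e": np.array([1.0, 0.0, 0.0])}
         for x in (0, 1, 2, 3)]
print(relevant_edges(nodes))
# [(0, 1), (1, 2), (1, 0), (2, 3), (2, 1), (3, 2)]
```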

3.2.3.2 Line Node Enhancements

Because of the first clustering algorithm, which groups the representative spheres into point nodes, some fine details may be lost. For example, the point nodes located at the end of a line node consist of a group of representative spheres which are replaced by a single point node located in the middle of them. This type of point node is the corner point of a line node and cannot be a good representation of the end of a segment, since the actual end is located at the position of the farthest representative sphere. Therefore, the point node should be

moved into the position of the representative sphere, located at the end of the segment (see Figure 3-17). The

question is: which representative sphere from the cluster best represents the sought-after end of the neurite? We

define a new assessment function s(p, p′), which assesses each possible movement of the corner point node p to the location of a representative sphere p′ chosen from the same cluster:

$s(p, p') = \|d_p\| \cdot \cos(\alpha)$ (Equation 3-7)


where
d_p is the spatial distance vector from the current point node p to the location of p′, and
α is the angle between d (the vector from the connected neighbor of p to p) and d_p, with α ∈ [0, π].

The value of the assessment function is, on the one hand, proportional to the distance between the new and old

locations. Hence, the sphere located farther from the old location has priority over the one located nearer to it.

On the other hand, the value of the assessment function depends on cos(α), which reaches its maximum at α=0, i.e. when d and d_p have the same direction. In other words, the maximum is reached by moving towards the location of a sphere which lies along the run of the neurite in the same direction. As illustrated in Figure

3-17, the result of relocating better represents the tubular structure of a neurite’s segment.
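A minimal sketch of the relocation assessment (using the reconstruction s(p, p′) = ‖d_p‖·cos(α) of Equation 3-7; the node representation is ours):

```python
import numpy as np

def relocation_score(p, neighbor, candidate):
    """s(p, p') = |d_p| * cos(angle(d, d_p)); d runs from p's connected neighbor to p."""
    d = p - neighbor                  # running direction of the segment at the corner
    d_p = candidate - p               # proposed movement of the corner point node
    norm_d, norm_dp = np.linalg.norm(d), np.linalg.norm(d_p)
    if norm_d == 0.0 or norm_dp == 0.0:
        return 0.0
    return norm_dp * np.dot(d, d_p) / (norm_d * norm_dp)

p = np.array([2.0, 0.0, 0.0])          # current corner point node
neighbor = np.array([1.0, 0.0, 0.0])   # its connected neighbor
spheres = [np.array([2.5, 0.0, 0.0]),  # candidates from the corner cluster
           np.array([3.0, 0.0, 0.0]),
           np.array([1.5, 0.5, 0.0])]
best = max(spheres, key=lambda c: relocation_score(p, neighbor, c))
print(best)  # [3. 0. 0.]: the farthest sphere along the segment direction wins
```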

(b1) (a) (b2) (c)

Figure 3-17: Comparing the connecting process with and without relocating, marked with red and blue arrows.

(a) The result of clustering module, including the representative spheres as empty circles, point nodes as filled

circles, and line node clusters as dashed ellipses. (b1) The connected point nodes as a line node without

relocating the corner point nodes. (b2) Relocating the corner point nodes to the position of a sphere located

further away. (c) The resulting connected point nodes after relocating.

Another important case of enhancement is the enhancement of a line node consisting of only one point node, so that the line node has a spherical shape. Normally, in a real neuron, a segment has a tubular shape. However, line nodes consisting of a single point node occur because short neurite segments consist of only a few representative spheres, all of which may be clustered into a single point node. The number of clustered point nodes depends on their features and on the clustering parameters. Usually, the parameters are set with respect to a typical neurite structure and do not reflect the aforementioned type of case. Therefore, such cases must be examined and accounted for. Because a single point node can be considered a corner point node, this problem


can be fixed with the aid of a similar relocating method. Singular point nodes should be replaced with two new ones in order to give the line node a tubular shape (as illustrated in Figure 3-18). All connections between representative spheres of the point node are assessed using an assessment function similar to Equation 3-7. However, in this case, α indicates the angle between the eigenvector corresponding to the smallest eigenvalue of p and the distance vector between the two candidates. Consequently, choosing two representative spheres as end point nodes whose connecting direction is close to the neurite's direction and whose mutual distance is largest yields the biggest assessment. Finally, we replace the old line node with a new one consisting of point nodes at the positions of the representative spheres with the highest assessment values.

(a) (b)

Figure 3-18: Replacing a single point node with two distantly located ones.

Through mean shift clustering, point nodes are clustered in order to compose line nodes. However, image stacks are normally noisy, and some structures are so complex that a single line node cannot represent a long neurite segment completely. In our data structure, a line node is not only defined as an absolute equivalent of the model of a segment; it can also represent it partially. Therefore, long and complex segments can be reconstructed systematically using an iterative connecting process which connects neurite segments together into longer segments. Hence, we try to find line nodes which can be grouped together and replace them with a single one. In this way, the data structure is simplified and can be processed faster. At the same time, this process fixes the modeling errors caused by noise.

The mean shift algorithm considers three features for line node clustering: radius, location and eigenvector. Noise in the images can easily change the lengths and the directions of the eigenvectors. In addition, the radius can be determined incorrectly due to the regression error. Some representative spheres may also be missed, so that the continuous run of a long segment is not identifiable. Therefore, we emphasize location as the most decisive feature in combining line nodes. Line nodes whose ending points are in direct contact are assumed to represent a single segment. In other words, if the distance between the corner point nodes of two line nodes is less than or equal to the sum of their radii, both point nodes are connected together and a new line node is created.
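A minimal sketch of this merging criterion, representing a line node simply as the ordered list of its point nodes, each a (center, radius) pair:

```python
import numpy as np

def corners_touch(corner_a, corner_b):
    """True if the two corner point nodes are in direct contact."""
    (c_a, r_a), (c_b, r_b) = corner_a, corner_b
    gap = np.linalg.norm(np.asarray(c_a, dtype=float) - np.asarray(c_b, dtype=float))
    return gap <= r_a + r_b            # center distance <= sum of the radii

def should_merge(line_a, line_b):
    """Check all four corner pairings of two line nodes."""
    return any(corners_touch(ca, cb)
               for ca in (line_a[0], line_a[-1])
               for cb in (line_b[0], line_b[-1]))

line_1 = [((0, 0, 0), 1.0), ((3, 0, 0), 1.0)]
line_2 = [((4.5, 0, 0), 1.0), ((8, 0, 0), 1.0)]
print(should_merge(line_1, line_2))  # True: gap of 1.5 <= 1 + 1
```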

3.2.3.3 Finding Connection Nodes

The high-level components of the data structure defined in section 3.1.5 are the connection nodes, which generally act as roots in the tree-shaped representation. They model the connections between line nodes at different


positions in order to represent the branching points of neurite structures. A neuron consists mainly of neurites

with various radii that grow in a dendriform way. A neurite can also branch out into many thinner neurites. The

aim of connection nodes is to model such branching points by connecting two point nodes of different line nodes

together. Since connected point nodes at a branching point can have extremely different radii, their location and

direction become the deciding features used to determine their connections. We define two thresholds, for the spatial distance and for the direction deviation, in order to decide whether a connection between an inner point node of one line node and a corner point node of another is relevant enough. A decision function is defined that extracts the connections which satisfy the predefined thresholds. Such

connections are modeled with the aid of connection nodes. The decision function m(.,.) for connection between

two point nodes p1=(r1,s1,e1) and p2=(r2,s2,e2) is defined as follows:

$m(p_1, p_2) = \begin{cases} 1, & \text{if } \|d\| \le T_s \cdot (r_1 + r_2) \ \text{and}\ |\cos(\alpha)| \le T_d \\ 0, & \text{otherwise} \end{cases}$ (Equation 3-8)

where
d is the spatial distance between s1 and s2,
α is the angle between e1 and e2,
T_d is the direction threshold (set as a parameter) and
T_s · (r1 + r2) is the spatial distance threshold, whereas T_s is a factor to be set as a parameter.

The decision function returns 1 if a connection between p1 and p2 is found. Since point nodes in thick structures have big radii and are located farther from each other than small point nodes, the spatial distance threshold is defined depending on the radii of both point nodes. Consequently, in the case of thick structures, the decision function tolerates longer spatial distances. Furthermore, in order to avoid connecting parallel branches together, we only connect point nodes whose difference in direction exceeds a preset threshold. For example, by setting the parameter T_d to 0.8, only point nodes whose directions differ by more than 36.86 degrees are connected together.
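A minimal sketch of the decision function as reconstructed in Equation 3-8 (the concrete threshold values, in particular T_s = 1.2, are illustrative assumptions):

```python
import numpy as np

def connect(p1, p2, T_s=1.2, T_d=0.8):
    """p = (radius, center 3-vector, unit direction); returns 1 if a connection
    node should be created between the two point nodes, otherwise 0."""
    r1, s1, e1 = p1
    r2, s2, e2 = p2
    close = np.linalg.norm(s1 - s2) <= T_s * (r1 + r2)  # radius-dependent threshold
    bent = abs(np.dot(e1, e2)) <= T_d                   # |cos| <= 0.8 -> angle >= ~36.9 deg
    return int(close and bent)

p_inside = (2.0, np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
p_corner = (1.0, np.array([2.5, 1.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(connect(p_inside, p_corner))  # 1: close enough and directions differ enough
```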

3.2.3.4 Summary

The connector module described in this section (illustrated in Figure 3-19) consists of three submodules that

connect the point nodes in a hierarchical manner in order to construct a tree-shaped data structure. Thus, the

order in which the submodules are performed is important. The received point nodes are the building blocks of

our data structure and are located at the lowest level. In addition, the incomplete line nodes are groups of point

nodes. The Line node connector submodule connects point nodes inside each line node together in order to

accomplish the next level of our data structure, the line nodes. Because the resulting line nodes are still not a

good representative of the neurites, further enhancements must be performed by the next module, the Line node

extender in the following order:

1. The corner point nodes are relocated to the end of the identified tubular structure.

2. Singular point nodes are replaced with line nodes, consisting of two point nodes.


3. Line nodes are connected together at proper positions and are replaced with a new one, in order to

simplify the data structure.

After these improvements, we construct the next level of our data structure, connection nodes, by means of the

submodule Connection node finder.

Figure 3-19: The pipeline of the connector module.

3.2.4 Fine Adjustments with the Aid of Registration

As previously explained in section 3.1.4, we use the registration method in order to perform fine adjustments on

our reconstructed model. We use a model-based registration algorithm from the ITK library that can fit our reconstructed 3D model onto the input image stack (see section 2.6.2).

(a)

(b)

Figure 3-20: (a) The model of a neurite structure before and (b) after interpolation.

The registration module receives the neuron’s image as a fixed image and the model as a moving spatial object

instead of moving image. The reconstructed model consists of several connected point nodes, depicted as

spherical spatial objects, that can be translated or scaled in order to fit the neurite structure in the image. The 3D


model as input is an approximation consisting of point nodes. The number of point nodes determines the flexibility of the model: the more point nodes represent a neurite, the more accurately the neurite and its fine details can be modeled. Therefore, before starting the registration process, we need to interpolate each connection

between two point nodes by adding additional spheres. In this way, we guarantee that the registration algorithm

has enough spheres to make the model, after registration, a precise representation of the neuron structure (see

Figure 3-20).

The pipeline illustrated in Figure 3-21 shows the model-based registration framework used in this study, which is designed based on ITK's registration framework. It consists of four submodules which are invoked in successive order. Similar to the standard registration framework described earlier, the input model is updated in each loop by the transform submodule until a predefined termination condition is reached.

Figure 3-21: The pipeline of a model-based registration framework.

3.2.4.1 Registration’s Framework

In this section, we are going to describe the task, input, output and the algorithm used in each submodule.

3.2.4.1.1 Interpolator

The goal of the interpolator submodule is to provide the metric with quantified data, calculated with respect to

the neuron’s model and image, in order to make the correspondence between the inputs measurable. The

structure of a neuron in the confocal microscopy images is shown by bright voxels against a dark background. If the model is a good representation of a neuron, it will cover as many bright voxels as possible and not contain many dark voxels, which are normally part of the background. Calculating the intensities of the voxels located inside and outside the model is the goal of the interpolation. In this way, the metric is provided with intensities as


comparable quantified data. Our 3D model consists of many spherical connected point nodes which, together,

represent tubular neurite structures. In the first step, we reconstruct the model by using geometrical spatial

objects that are readable as components of the ITK’s registration framework. Although spheres, as geometrical

objects, can represent point nodes perfectly, reconstructing each point node as a small tube is more

advantageous. It both allows our model to cover more of the voxels it represents in the tubular neurite, and takes into account that each point node has a direction oriented along the axis of the tube (see Figure 3-22, step 1). There are predefined spatial objects in the ITK library which are used to reconstruct our point nodes in the form of tubular spatial data (called TubeSpatialObject). A tube is defined by its location, height and the

radius of the circular slice-plane. The goal of the interpolator module is to make it possible for the metric module

to measure and compare these parameters with respect to the fixed image intensities. In this case, the height of

the tube is not significant and can be set to a fixed value. The radius of the tube’s slice reflects the thickness of

the tubular structure at that position and, consequently, the radius of the corresponding point node it is

representing. Furthermore, the position and direction of the tube indicate the position and direction of the point

node. Therefore, radius, position, and direction are the properties which need to be considered in the

interpolation process.

The interpolator resamples the tubes from continuous real space onto the image space, which is restricted to a

discrete grid. The interpolator should resample the tube so that it is possible to determine changes of the

aforementioned significant properties of the tube from the computed image intensities. Therefore, we choose and

interpolate some spatial points inside, outside, and on the tube's surface. The points are located in circles centered on the axis of the tube (see Figure 3-22, step 2). In this way, the location of the point node can be determined by the position of the group of chosen points and its radius can be determined by the circles' radii. The directional orientation of the tube is defined by the alignment of the circles in space. The number of circles and the number of chosen points in each circle determine the approximation accuracy and are adjustable as parameters.

Figure 3-22: Spatial preprocessing for the interpolation step. In the first step, a tube is created as a representation of a point node. In the second step, some spatial points are extracted.

There are different types of interpolations that can be used to estimate the intensities of the extracted spatial

points in respect to those of the fixed image. The four available interpolators in the registration framework are

defined as follows:

1. Nearest neighbor interpolator: This interpolator simply takes the intensity of the nearest voxel for a

non-grid position.


2. Linear interpolator: This interpolator assumes that the intensity between the grids changes linearly and

estimates the intensity of a non-grid position in respect to its neighbor voxels, depending on their

distances.

3. B-Spline interpolator: This interpolator uses the B-Spline33 basis function for estimating intensities of

spatial points.

4. Windowed Sinc Interpolator: This interpolator interpolates intensities of spatial points based on Fourier

analysis considerations.

For our purposes, we use the simple linear interpolator (Figure 3-23). As a result, the interpolated intensity of a non-grid position x is the weighted sum of the eight surrounding neighbors in the 3D image stack:

$I(x) = \sum_{i=1}^{8} w_i I_i, \qquad w_i = \prod_{k=1}^{3} \left(1 - |d_{i,k}|\right)$ (Equation 3-9)

where d_{i,k} is the distance from x to the i-th neighbor voxel along the k-th axis (in voxel units) and I_i is the intensity of the i-th voxel.

Figure 3-23: The 2-dimensional linear interpolation of a spatial point at a non-grid position. The intensity is estimated with respect to the distances of the point from the neighboring voxels in the fixed image (dash-dotted lines).
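A minimal Python sketch of this trilinear interpolation on a 3D image stack (Equation 3-9):

```python
import numpy as np

def trilinear(image, x, y, z):
    """Interpolate the intensity at a continuous position from the 8 neighbor voxels."""
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    fx, fy, fz = x - x0, y - y0, z - z0            # fractional offsets in [0, 1)
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # the weight shrinks with the distance along each axis
                w = ((fx if dx else 1 - fx)
                     * (fy if dy else 1 - fy)
                     * (fz if dz else 1 - fz))
                value += w * image[x0 + dx, y0 + dy, z0 + dz]
    return value

stack = np.zeros((4, 4, 4))
stack[1, 1, 1] = 100.0
print(trilinear(stack, 1.25, 1.0, 1.0))  # 75.0: weight 0.75 on voxel (1,1,1)
```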

3.2.4.1.2 Metric

The metric is the most critical submodule in this framework. It receives the interpolated intensities of the 3D

model and the intensities of the image as input and calculates the fitness values that indicate how well the model

and the image are matched. A model fits an image well if, on the one hand, the interpolated intensities of spatial points located inside and on the surface have large values; in this case, the spatial points are located inside the neurite structure, which is shown by the bright voxels in the neuron's image. On the other hand, if the interpolated intensities of the outside spatial points possess small values, which show that they are located at the dark background voxels, then there is also a good fit between the model and the image. The metric uses a cost function that calculates the difference of the interpolated intensities from the desired values. These desired values

33 A spline is a differentiable function that is defined piecewise by polynomials and is used for solving interpolation problems. A spline function can be represented as a linear combination of B-splines as basis functions.


are called target intensities and are given to the metric submodule as the highest and lowest available intensities

within the neuron’s image. We use the mean square error as a cost function for the metric34 in order to calculate

the differences between the n interpolated intensities I = {I1, I2, …, In} and the target values for each of them as It

= {It,1, It,2, …, It,n}:

$E = \frac{1}{n} \sum_{i=1}^{n} \left(I_i - I_{t,i}\right)^2$ (Equation 3-10)

The metric provides the optimizer with the necessary values (the fitness values), so that it can decrease the costs

(mean square error) in order to solve the optimization problem. The fitness values are described in the next

section.

3.2.4.1.3 Optimizer

The optimizer solves the optimization problem defined by the cost function of the metric. The optimizer

minimizes the costs with respect to the transform parameters. As a result, it emits new parameters for the

transform submodule that should match the model better on the neuron’s image. There are several types of

optimizers to use in this framework35. We achieved the best result with the L-BFGS-B36

method which is

actually a software package to solve nonlinear optimization problems [ZBN97]. This optimizer is derived from

the BFGS37 method and is enhanced by two extensions: i) It sets simple lower and upper bounds on the

variables, and ii) employs a limited-memory quasi-Newton38 approximation that does not require much storage

or computation. Since the quasi-Newton approximation is used in this case, there is no need to compute the

Hessian matrix of second derivatives of the cost function. However, the first derivative (gradient) of the cost

function E (Equation 3-10) with respect to the transform parameter T is necessary. In order to calculate the

derivative , we use the chain rule and rewrite it using X as a spatial movement vector in a spatial space:

. (Equation 3-11)

The term indicates the changes of intensities with respect to the spatial movement that is equal to the

gradient of the image. The term points towards the spatial movement with respect to the transform

parameter that is calculated as transform’s Jacobian matrix. In this context, the Jacobian is a matrix whose

34 There are also other metrics available for this framework, such as normalized correlation, mean reciprocal squared difference, normalized mutual information, the Kullback-Leibler distance metric, mean squares histogram, the gradient difference metric, the Kappa statistics metric, etc.
35 Available optimizers: Nelder-Mead downhill simplex method, Conjugate Gradient, Gradient Descent, Regular Step Gradient Descent, LBFGS, L-BFGS-B, One Plus One Evolutionary, Powell optimizer, SPSA Optimizer and Levenberg-Marquardt.
36 Limited-memory Broyden-Fletcher-Goldfarb-Shanno minimization with simple bounds [BLN95].
37 The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is derived from Newton's method in optimization to solve an unconstrained nonlinear optimization problem.
38 The quasi-Newton method is an iterative algorithm for finding roots of equations that can also be used to find the local optima of a function. It assumes that the function can be locally approximated as a quadratic in the region around the optimum. In order to find the optima, this method needs the first derivatives (gradient) of the function. The second derivatives (Hessian) are approximated successively in each iteration.


elements are the partial derivatives of the output point’s spatial coordinates with respect to the parameters that

define the transform. The Jacobian of a point x = {x1, x2, x3} in 3D space regarding n parameters p = {p1, …, pn}

of a transform model is defined as follows:

$J = \begin{pmatrix} \frac{\partial x_1}{\partial p_1} & \frac{\partial x_1}{\partial p_2} & \cdots & \frac{\partial x_1}{\partial p_n} \\ \frac{\partial x_2}{\partial p_1} & \frac{\partial x_2}{\partial p_2} & \cdots & \frac{\partial x_2}{\partial p_n} \\ \frac{\partial x_3}{\partial p_1} & \frac{\partial x_3}{\partial p_2} & \cdots & \frac{\partial x_3}{\partial p_n} \end{pmatrix}$ (Equation 3-12)

Therefore, the optimizer, above all, needs the following values (indicated as fitness values) from the metric in

order to minimize the cost function:

1. Gradient of the image: the image's gradient is computed in the metric submodule by means of the

neuron’s image.

2. The cost: The output of the cost function is calculated in the metric submodule.

3. Transform gradient Jacobian: This matrix is provided by the transform submodule for the metric.

As a result, the optimizer calculates new transform parameters that lead to a better fit between the transformed model and the neuron's image.
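To illustrate the optimizer's role, the following sketch drives a toy cost function with SciPy's L-BFGS-B implementation (the thesis uses the ITK implementation; the quadratic cost merely stands in for the registration metric, and the bound values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def cost(params):
    """Toy stand-in for the registration metric: squared distance to a 'true' transform."""
    target = np.array([1.5, -0.5, 0.25, 0.0, 0.0, 0.0, 0.1, 0.1, 0.1])
    return float(np.sum((params - target) ** 2))

x0 = np.zeros(9)  # 6 versor-rigid parameters + 3 logarithmic scale parameters
bounds = [(-5.0, 5.0)] * 6 + [(np.log(0.5), np.log(2.0))] * 3  # simple box bounds
result = minimize(cost, x0, method="L-BFGS-B", bounds=bounds)
print(result.x.round(3))  # recovers the 'true' parameters within the bounds
```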

3.2.4.1.4 Transform

The transform submodule contains transform models, which are responsible for mapping each point node of the

model using the given parameters generated by the optimizer. In this case, each point node (represented as a

small tube) can be translated, rotated or scaled in order to find the fitting location, size and alignment regarding

the neuron structure in the image. The translation parameters are calculated based on fitness values, which

depend on the interpolated intensities. As described in section 3.2.4.1.1, the interpolated intensities are calculated

with the aid of spatial points located on circular layers, representing a point node. These points are located inside

and outside the point node. However, the outside points are not relevant for translation or rotation, because

fitting the inside points to the neuron structure sufficiently leads to finding the proper parameters. In contrast, the

outside points are necessary for scaling, because fitted inside points do not result in the fitted radius of the point

node. The radius is fitted to the neuron structure, when all inside points are located on the voxels representing the

neuron structure, and all outside points are located on the background voxels. Therefore, we use two different transform models in the transform submodule. The versor rigid 3D transform is used in this framework for

translation and rotation, and the scale logarithmic transform is used to scale the point nodes. The versor rigid

3D transform represents a rigid rotation in 3D space. That is, a rotation followed by a 3D translation. The

rotation is specified by three angles, representing rotations, to be applied around the x-, y- and z-axis one after

another. The translation is represented by a 3D vector, specified by its three components as parameters. In this

case, the scale logarithmic transform represents a scale in 3D space and, therefore, it is specified by three

parameters along three axes. The scaling transform is performed on a spatial tube, whose radius should be fitted.


Therefore, in order to determine the final scale factor, we need to average the scale factors along the directions orthogonal to the run of the neurite.

Another property of the scale logarithmic transform is that the parameters are passed as logarithms. In this way,

the multiplicative variations in the scale become additive variations in the logarithm of the scaling factors.

Additive variation is an essential property for the optimizer submodule, because the optimizer manages the

parameter space as a vector space where addition is the basic operation. Consequently, the effect of an additive

increment on a scale factor by the optimizer does not decrease as the factor grows. Otherwise, for example, a

scale factor variation of 1+ε would be different from a scale variation of 5+ε.

There are overall nine parameters to initialize in the transform submodule for both transform models: versor

rigid 3D (six parameters) and scale logarithmic (three parameters). The initialized transform models translate,

rotate and scale the corresponding point node for a better fit. In this way, an iteration of the registration process is accomplished.

3.2.4.2 Smoothness

When performing the model to image registration algorithm, each point node of the model is considered and

fitted separately on the image. However, the point nodes are connected together in order to represent the

structure of the neurite. Hence, the position and size of neighboring points need to be considered, in order to

construct homogeneous and smoothed neurites. In addition, the noise of confocal image stacks is local and

affects the voxels, which are located in close proximity to each other. Therefore, fitting the size and position of

each point node with respect to its neighbors, make the model robust against the local noise of image intensities.

For this purpose, an algorithm within the same framework as the registration algorithm (Figure 3-21) is developed, which is called smoothness. The smoothness algorithm is included in the metric submodule of the framework and is therefore invoked concurrently during the registration process. The metric of smoothness calculates the scale and translation costs for each point node with regard to its neighboring point nodes.

Figure 3-24: The smoothness translation cost vector T for the point node p with regard to its neighbors p1 and p2. The point m is located exactly in the middle of the dash-dotted line which connects p1 to p2.

The translation cost is defined as the vector from the location of the point node to the center of the line which connects its two neighbors (see Figure 3-24). The cost of the scale for smoothness is the difference between the point node's radius and the average radius of its two neighbors. These costs are multiplied by a factor from the interval [0,1] and combined with the cost of registration, computed by the mean square error in Equation 3-10. As a result, the cost of smoothness affects the fitness values, which are also sent to the optimizer. Consequently, the transform module fits each point node according to both the neuron's image intensities and the neighboring point nodes.
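A minimal sketch of the two smoothness costs for a single point node (the node representation is our simplification; how the costs are weighted into the metric, a factor from [0, 1], is a parameter):

```python
import numpy as np

def smoothness_costs(p, p1, p2):
    """p, p1, p2: (center 3-vector, radius); p1 and p2 are the neighbors of p."""
    (c, r), (c1, r1), (c2, r2) = p, p1, p2
    midpoint = (np.asarray(c1, dtype=float) + np.asarray(c2, dtype=float)) / 2.0
    translation_cost = midpoint - np.asarray(c, dtype=float)  # vector T of Figure 3-24
    scale_cost = (r1 + r2) / 2.0 - r                          # deviation from mean radius
    return translation_cost, scale_cost

p  = ((1.0, 1.0, 0.0), 2.0)
p1 = ((0.0, 0.0, 0.0), 1.0)
p2 = ((2.0, 0.0, 0.0), 1.5)
t, s = smoothness_costs(p, p1, p2)
print(t, s)  # [ 0. -1.  0.] -0.75: pull p toward the midpoint and shrink its radius
```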


The results of the smoothness algorithm with different parameter initializations are illustrated in section 4.4.1, Figure 4-20.

3.2.4.3 Summary

Registration is an optimization problem which, in this study, is solved by an iterative process. The neuron’s

image and its approximated interpolated model are given as inputs. The first submodule is the interpolator that

computes the intensities of the neuron’s image, which are represented by each point node of the 3D model. With

the aid of these interpolated intensities, the metric submodule can compare the model with the image. The metric

measures the fitness of the model by means of two cost functions: i) for registration (mean square error) and ii)

for smoothness (translation vector and scale factor). By minimizing the calculated cost, the model can fit the

neuron’s image more accurately. Solving this optimization problem, with respect to transformation parameter, is

the task of the optimizer submodule. The optimizer needs some information to minimize the cost function. These

data, called fitness values, are provided by the metric. The metric gathers these data as inputs or calculates them

by itself. The optimizer calculates the transform parameters, which allow for a better fit with the model. Finally,

the transform submodule translates, rotates and resizes the point nodes of the model. This loop continues until

one of the termination criteria is met. The termination criteria in this framework are: i) when the number of

iterations reaches the maximum iteration value, set as a parameter, or ii) when the distances between the newly

calculated transform parameters and the old ones are less than a predefined value. An alternative to the last

termination criterion is setting a lower limit for the mean square error (the value of the cost function). On the one hand, the advantage of this cost-based criterion is that it can be set by a single scalar value, whereas nine transform parameters would each have to be checked by the parameter-based criterion. On the other hand, the values of the transform parameters are meaningful and are therefore easier to set than a limit value for the cost function.

3.2.5 User Interactions

The developed algorithms perform a fully automated reconstruction process on a neuron’s image. The quality

and accuracy of the reconstructed 3D model depend on different criteria:

1. The noise of the neuron’s image causes incorrect modeling.

2. Each algorithm needs parameters to be set. The result of reconstruction depends on the parameter

initialization.

3. There are various types of neurons with different structures. The complexity of the neuron structure

affects the model’s quality.

4. Clustering and connection algorithms are decisive partial processes and their errors, in some cases,

cause incorrect modeling (the restrictions are discussed in chapter 4).

Therefore, the reconstructed 3D model cannot be expected to be a perfect representation of the neuron structure. In the aforementioned cases, the model should still be corrected with regard to the neuron's image or to general criteria derived from a typical neuron structure. Hence, user interaction is necessary for additional

adjustments. As described in section 2.6.1, we use Amira software for visualization and user interaction. The 3D


model is represented as a data module in a data pool and can be explored in the view window. The following

actions are developed to allow the user to modify the reconstructed model:

1. Adding, deleting or moving spheres (point nodes)

2. Connecting spheres together or deleting existing connections

3. Changing the radius of each sphere

4. Changing the interpolation step (the number of spheres that are automatically added along connections)

These actions can be triggered directly in the view window or by pressing buttons in the properties window. As

illustrated in Figure 3-25, further parameter settings are also available in this window.

Figure 3-25: The properties of the skeleton data module in Amira.

The following parameters are set for the registration process:

1. Position and radius flexibilities are the parameters for the cost function of the smoothness algorithm,

discussed in section 3.2.4.2. The flexibility value corresponds to one minus the factor by which the smoothness cost is multiplied.

2. Scale bias is the factor used to merge the fitness values of the interpolated points located inside and outside of the tubular structure, described in section 3.2.4.1.1. The bigger the value, the larger the point nodes become. The reason for this is that, by increasing this factor, the outside points gain more effect in comparison to the points located inside.

3. Z-axis strain factor specifies the axial strain phenomenon of confocal microscopy in the current image (see section 2.2). This factor is taken into account when creating the spatial points in the interpolation step, discussed in section 3.2.4.1.1, in order to locate the points at the fitting positions with respect to the image voxels.


4. Max. translation, Min. and Max scales are parameters of the optimizer module described in section

3.2.4.1.3. They specify the lower and upper bounds of the L-BFGS-B method. The lower bound of

translation is set to zero by default.

4 Results

4.1 Classification and Regression Results of SVM

In this section, we discuss the results generated by the SVM algorithm, which performs classification and regression on both toy and real neuron images. Since the target values of real neurons are not available, the quantitative SVM error can be measured with the aid of toy data. In order to test different cases, five toy data sets with distinctive characteristics are created (see Table 4-1).

| ID | Particular characteristics |
|----|----------------------------|
| 1  | Wide range of radii (0.5 to 8). |
| 2  | Multiple branch-offs. Thin neurites. |
| 3  | Curving neurite. |
| 4  | Branchings with different angles. |
| 5  | Parallel and nearly crossing neurites. Thick neurites. |

Table 4-1: Toy data for testing the classification and regression algorithms. (The original table also shows, for each model, a 3D rendering and a sample layer image; these are not reproduced here.)


4.1.1 Classification Errors

The classification errors are measured using two criteria, the number of correct hits and the number of false

alarms. Correct hits are the voxels located on the axis of the neurite structure and correctly classified as center

voxels. False alarms are voxels, which are not located on the axis, but are, nevertheless, incorrectly classified as

center points. These errors are calculated for each of the four SVMs which are responsible for the original image

and its three resampled images. Table 4-2 shows the classification errors of the aforementioned toy data.

| ID | SVM1 (original image) CH | SVM1 FA | SVM2 (scale 0.5) CH | SVM2 FA | SVM3 (scale 0.25) CH | SVM3 FA | SVM4 (scale 0.125) CH | SVM4 FA |
|----|--------------------------|---------|---------------------|---------|----------------------|---------|-----------------------|---------|
| 1 | 66.67% (12) | 2.4·10⁻⁵% (4) | 88.89% (8) | 0.03% (6) | 100% (9) | 0.85% (20) | 100% (5) | 0.91% (2) |
| 2 | 95.28% (62) | 0.16% (133) | 94.44% (17) | 0.31% (31) | 100% (1) | 0.58% (7) | | |
| 3 | 100% (29) | 0.2% (24) | 100% (30) | 0.64% (9) | | | | |
| 4 | 82.35% (14) | 0.02% (23) | 92.5% (37) | 0.34% (57) | 93.75% (15) | 0.43% (8) | | |
| 5 | 85.37% (35) | 0.02% (29) | 55.56% (5) | 0.01% (2) | 100% (16) | 0.84% (17) | 100% (2) | 0.93% (2) |

Table 4-2: Classification errors (CH stands for correct hits and FA for false alarms) of five toy data by four

SVMs for different scales (parameter C=100). The errors are calculated as percentaged numbers relative to the

whole number of positive or negative labeled voxels in the corresponding scale. The numbers in parentheses

indicate the absolute number of voxels. Cells with no entry belong to scales of the models, which do not possess

neurites with the associated radii (see range of associated radii with four SVMs in Figure 3-10).

In the following, we discuss the three main causes of the classification errors observed for the toy data in Table 4-2:

1. Very thin neurites: Neurite structures with radii smaller than 0.5 cannot be detected. This error occurs, e.g., in the classification of the first model (ID=1) by SVM1 using the original image. In this case, there are 12 correct hits and 6 missing hits39, all of which are located at the tip of the neurite with a radius of 0.5 (see Figure 4-2a). Figure 4-1 illustrates the feature space in this case. The missing hits (marked with black filled circles) lie among many negatively labeled data points (red dots). Because of the noisy feature space, these positive data points cannot be distinguished from the negative ones and are classified as negative. Missing hits of this kind can also be found for model 2, classified by SVM1. It should be pointed out that balancing the numbers of positive and negative data points does not affect the results in this case. In the illustrated feature space (Figure 4-1), the red dots represent 10% of all negative data, selected randomly. Heuristically, choosing more or fewer negatively labeled data points does not affect the result of the classification algorithm.

39 Missing hits in this context are voxels located on the middle axis of the neurite structure which are, however, not detected by the SVM.


Figure 4-1: The feature space of the model with ID=1 for the classification of the original image by SVM1; the axes are the first, second and third eigenvalues. Red dots represent negatively labeled data points and filled circles represent positively labeled data points (i.e. voxels located on the axis). The black circles indicate center voxels with a radius of 0.5, and the blue circles represent center voxels with radii in the interval ]0.5, 2[.

2. Correct hits of one SVM are false alarms of another SVM: Some of the false alarms are positively classified voxels located correctly on the axis of the neurite model. However, these voxels lie outside the radii interval associated with the corresponding SVM and are therefore counted as false alarms. The reason is that the image of the model is an approximate sampled image, and the continuous change of radii cannot be represented exactly by voxels, which are restricted to the image grid. Therefore, the boundary between two radii intervals cannot be distinguished based only upon the features of neighboring voxels which are both located on the axis of the neurite. An example is provided by the classification of the original image of the model with ID=1, where a single voxel is classified correctly as a center point but is located outside the associated radii interval and is therefore considered a false alarm (see Figure 4-2d). Obviously, this kind of false alarm does not cause a problem. The cases containing such false alarms are model 1 by SVM1 and SVM3, model 3 by SVM1, and model 4 by SVM3.

3. False alarms close to correct hits: With this type of false alarm, one frequently observes an incorrectly positively classified voxel (false alarm) located on a voxel neighboring the model's axis. Figure 4-2d shows this kind of false alarm for three voxels (located in close proximity) in the original image of the first model, produced by SVM1. This classification error occurs because these false alarms possess features similar to those of their neighbors, the correct hits. In most cases, the feature similarity is due to inexact labeling: the positively labeled data points (voxels located on the axis) are determined by interpolating the model of the toy data on its sampled image. Consequently, the voxel located closest to the axis of the model (the model's skeleton) is labeled as a positive data point, although the axis of the model does not run along the exact middle of a voxel.


Under these conditions, only one of two neighboring voxels with similar features is labeled as a positive data point and the other as a negative data point. Thus, the SVM classifier cannot distinguish them from each other based on their features, and one of them is determined as a correct hit and the other as a false alarm. This kind of false alarm error can be found in almost all cases produced by the different models.

Figure 4-2: The classification results of SVM1 on the original image of the model with ID=1 (marked with small spheres): (a) voxels not determined as center voxels (missing hits), (b) voxels determined as center voxels (correct hits + false alarms), (c) correct hits only, and (d) false alarms only.

The classification results show that all false alarms are voxels located inside the artificial neurite structure and that no background voxel is incorrectly determined as a center voxel. This kind of false alarm (voxels located inside the structure) is easy to fix, because such voxels can be clustered together by the next algorithm (mean shift clustering); this causes only a minor deviation in the position of the cluster prototypes. These small deviations can be tolerated by the connection algorithms and depend, to some extent, on the parameter initialization. Finally, the registration algorithm can fix these errors.

4.1.2 Regression Errors

The regression error is calculated relative to the target radius and is defined as follows:

$E = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{r_{s,i}-r_{t,i}}{r_{t,i}}\right)^{2}$ (Equation 4-1)

where
$n$ is the number of center voxels determined by the classification algorithm,
$r_{s,i}$ is the radius of the i-th voxel, computed by the regression algorithm, and
$r_{t,i}$ is the target radius of the i-th voxel, specified by the model.
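For illustration, a minimal sketch of Equation 4-1 on assumed arrays of estimated and target radii:

    import numpy as np

    def regression_error(r_estimated, r_target):
        """Normalized mean square error of the radii (Equation 4-1)."""
        r_estimated, r_target = np.asarray(r_estimated), np.asarray(r_target)
        relative = (r_estimated - r_target) / r_target  # error relative to the target
        return np.mean(relative ** 2)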

The regression algorithm is performed on all center voxels of the constructed toy data (Table 4-1) that were determined by the classification algorithm. However, only the correct hits have a target radius, not the false alarms. Therefore, the regression errors are measured on the correct hits, as listed in Table 4-3.

ID   SVM1: Original image   SVM2: Scale 0.5   SVM3: Scale 0.25   SVM4: Scale 0.125
1    0.082                  0.017             0.083              0.019
2    0.113                  0.099             2.3×10⁻⁶           -
3    0.187                  0.352             -                  -
4    0.093                  0.056             0.097              -
5    0.119                  0.047             0.038              4.84×10⁻⁴

Table 4-3: The regression errors (normalized mean square error) of the five toy data by the four SVMs for different scales (regression parameter set to 0.55). Cells with no entry belong to scales of the models which do not possess neurites with the associated radii (see the ranges of radii associated with the four SVMs in Figure 3-10).

The outlier with the largest regression errors is the curving model (ID=3) at all its scales (SVM1 and SVM2). This case indicates that the SVM results for curved neurites are less accurate than those for straight neurites.

A detailed comparison of the regression errors reveals common behaviors shared between the different scales and models, which we discuss in this section. As described in section 3.2.1.2, regression is performed by four different SVMs, each responsible for an interval of radius values (see Figure 3-10). These intervals are assigned depending on the significance of the eigenvalues for different radii. This relationship also affects the performance of the regression algorithm for different target radii. The regression algorithm shows its best performance (smallest errors) on correct hits whose target radii have the most significant eigenvalues. In contrast, significant errors occur for correct hits whose target radii do not possess significant eigenvalues for the corresponding SVM. The detailed regression result of model 1 is illustrated in Figure 4-3. A negative error indicates that the estimated radius is smaller than the target radius (underestimation). In fact, the more a target radius deviates from the radius with the most significant eigenvalues (the maximum is marked with a red dashed line), the bigger the regression error becomes (with an exception at the scale of 0.125). However, because the radii intervals overlap, some of the inexactly estimated radii near the interval boundaries can be estimated more precisely by another SVM. For example, significant regression errors of larger radii with SVM1 and SVM3 are rectified by SVM2 and SVM4 for the same radii. For more about the radius estimation of correct hits that are determined by more than one SVM, refer to section 3.2.1.2.


Figure 4-3: The regression result for each of the correct hits of model 1 by all four SVMs (panels: SVM1, scale 1; SVM2, scale 0.5; SVM3, scale 0.25; SVM4, scale 0.125). The red dashed lines mark the target radius with the most significant eigenvalues for each SVM.

Another notable property of the regression performance is that the errors for radii whose eigenvalues are smaller than the most significant value are mostly positive, i.e. the SVM overestimates these radii.

Figure 4-4: Comparison of the signs of the regression errors of different radii in model 2, relative to the radius with the most significant eigenvalues (at this scale the value 1.5, marked with the red dashed line).

In addition, radii with eigenvalues larger than the most significant one are normally underestimated (see Figure 4-4).


The regression errors affect the results of both clustering algorithms. In the case of the first mean shift clustering algorithm, these errors result in prototypes whose radii deviate from their target radii (the average of the target radii of all spheres inside the cluster). This type of error can be fixed by the registration algorithm. In addition, regression errors can affect the result of the second mean shift clustering algorithm: due to the radii differences, not all relevant spheres are clustered together. In order to recover from this error, the radii differences should also be taken into consideration during the clustering's parameter initialization. By setting the parameter hr to a sufficiently high value, the radius difference between two spheres (the clustering's input data points) is weighted less, and consequently the clustering algorithm can tolerate larger radius differences. In this case, the location feature should play the dominant role in the grouping process.

4.1.3 Noise vs. Errors

In this section, we analyze the effect of noise on the results of classification and regression. The standard deviation (σ) of the Laplacian noise model indicates the occurrence probability of highly noisy voxels. In order to compare the errors of different noisy images, the SVM classifier is trained using a default noisy image, i.e. σ = 0.1.
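As an illustration of this noise model, the following sketch corrupts an intensity image with Laplacian noise of a given standard deviation σ (numpy's laplace() is parameterized by the scale b, with σ = √2·b; the intensity range [0, 1] is an assumption):

    import numpy as np

    def add_laplacian_noise(image, sigma, seed=0):
        """Add zero-mean Laplacian noise with standard deviation sigma."""
        rng = np.random.default_rng(seed)
        b = sigma / np.sqrt(2.0)   # Laplace scale parameter: std = sqrt(2) * b
        noisy = image + rng.laplace(loc=0.0, scale=b, size=image.shape)
        return np.clip(noisy, 0.0, 1.0)  # keep intensities in the assumed range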

The classifier is then tested on model 1 (see Table 4-1), however with different noise parameter initializations. The results are shown in Table 4-4.

σ        0.01            0.05            0.1 (default)   0.2              0.4              0.8            1.6

(The sample layer images for each noise level are omitted here.)

Classification (CH / FA):
SVM1     0.667 / ≈0      0.667 / ≈0      0.667 / ≈0      0.611 / 0.0001   0.611 / 0.0001   0.056 / ≈0     0.0 / 0.0001
SVM2     0.889 / 0.0003  0.889 / 0.0003  0.889 / 0.0003  0.889 / 0.0005   1.0 / 0.0007     0.889 / 0.001  0.333 / 0.004
SVM3     1.0 / 0.009     1.0 / 0.009     1.0 / 0.008     1.0 / 0.008      1.0 / 0.008      1.0 / 0.007    0.889 / 0.005
SVM4     1.0 / 0.0091    1.0 / 0.009     1.0 / 0.009     1.0 / 0.009      1.0 / 0.009      1.0 / 0.014    1.0 / 0.009

Regression:
SVM1     0.076           0.063           0.082           0.132            0.237            0.290          not invoked
SVM2     0.019           0.018           0.017           0.029            0.065            0.278          0.475
SVM3     0.091           0.083           0.083           0.080            0.102            0.208          0.190
SVM4     0.022           0.020           0.019           0.026            0.027            0.042          0.077

Table 4-4: Classification and regression errors for differently noised images of the model with ID = 1. The SVM classifier is trained using an image with the default Laplacian noise parameter (σ = 0.1). CH stands for correct hits and FA for false alarms. For σ = 1.6, SVM1 determined no correct hits, so the regression algorithm was not invoked there.

The classification algorithm works robustly even on very noisy images. The scaled resampled images show that the noise level is efficiently reduced by scaling down the image (see Figure 4-5). Therefore, increasing the noise degrades the quality of the classification results of SVM1 more strongly than those of the resampled images (SVM2-4).
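The following sketch illustrates this effect on an assumed pure-noise volume: resampling it to the four scales used here reduces the measured noise standard deviation step by step (SciPy's zoom is used as a stand-in for the resampling of this work):

    import numpy as np
    from scipy.ndimage import zoom

    rng = np.random.default_rng(0)
    # Laplacian noise with sigma = 1.6 (scale b = sigma / sqrt(2))
    volume = rng.laplace(scale=1.6 / np.sqrt(2.0), size=(64, 64, 64))
    for factor in (1.0, 0.5, 0.25, 0.125):
        scaled = zoom(volume, factor, order=1)  # trilinear resampling
        print(f"scale {factor}: noise std = {scaled.std():.3f}")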

Figure 4-5: Different scales (1, 0.5, 0.25 and 0.125) of a noisy toy image (σ = 1.6).

In contrast to classification, the regression algorithm is very sensitive to noise. The errors increase proportionally to the standard deviation of the noise model (see Figure 4-6). It should be pointed out that the results also depend on the toy data used to train the SVM classifier. The best results are obtained with toy data having the same noise parameter (σ) as the training model, and the results for noisier toy images are clearly less exact than those produced with less noisy toy data. Hence, training the SVM classifier with an image whose noise level is similar to that of a real neuron's image results in better regression performance.


Figure 4-6: Regression error versus the standard deviation, log(σ). The red dashed line indicates the default value of σ = 0.1, which is also used for training.

4.1.4 Results of Toy and Real Data

Table 4-5 illustrates the target values and the results of the SVM's classification and regression algorithms (i.e. the determined representative spheres) for the toy data.

Table 4-5: The target representative spheres and the representative spheres resulting from the SVM for the five toy models (renderings omitted here).

The performance of the SVM algorithms is tested on the images of three neurons with different structures and noise levels.

ID = 1: image size 385 × 271 × 101 voxels; sample size 79 × 70 × 32 voxels; spacing 0.39 × 0.39 × 0.75.
ID = 2: image size 672 × 581 × 76 voxels; sample size 146 × 144 × 50 voxels; spacing 0.82 × 0.82 × 1.0.
ID = 3: image size 512 × 512 × 78 voxels; sample size 133 × 129 × 69 voxels; spacing 0.636 × 0.636 × 1.0.

Figure 4-7: Images of the real neurons and the tested cutouts, marked with squares.


The determined representative spheres of the neuron images illustrate a common problem: thin neurites cannot be detected. This problem was also recognizable in the toy data.

Figure 4-8: The result of the SVM performed on the cutout of the neuron with ID = 1.

The result of the neuron’s image with ID=2 has an curious property: there are too many determined

representative spheres along z-axes of the image. The reason for this is that the image is very noisy in the z-

direction and therefore the tubular structure of the neuron is extra expanded in this direction. Consequently, there

is more than one voxel in the middle of the structure, which are located on a plane across the tubular neurite and

have the same features.

Figure 4-9: The result of SVM performed on the cutout of the neuron with ID = 2.

Figure 4-10: The result of the SVM performed on the cutout of the neuron with ID = 3.


In the following sections, the aforementioned SVM results are used as input for the further algorithms. The successive processing of each model shows how each algorithm can compensate for the errors of the previous one.

4.2 Grouping Results of Mean Shift Clustering

In order to assess the results of the clustering algorithms, we use the toy data (Table 4-1), which represent the different critical situations for both clustering problems: determining point node and line node clusters. However, clustering, as an unsupervised learning algorithm, does not specify any target values, so no quantitative error can be calculated. Nevertheless, by visualizing the results of clustering, one can recognize whether the clustering algorithm has achieved the intended quality. The resulting clusters also depend on the parameters. Thus, the effect of the parameter initialization on the results of toy and real data is discussed in this section as well.

4.2.1 Toy data

The point node clustering algorithm uses the results generated by the SVM, the representative spheres, as input. Therefore, its result also contains the errors associated with the SVM. To avoid unnecessary confusion and to visualize only the result of the clustering algorithm, we use the target representative spheres of the toy data (i.e. the target values of classification and regression) as input. The line node clustering algorithm receives the point node clusters. Hence, the quality of its result depends, among other criteria, on the result of the previous clustering. The determined point and line nodes of the toy data test set are illustrated in Table 4-6.

Table 4-6: The results of the point node and line node clustering algorithms for the five toy models, using the target spheres of the SVM as input (renderings of the target spheres, point node clusters and line node clusters omitted here).

In the cases of the models with IDs 1 and 3, the representative spheres are located sequentially near each other. However, their clustering features are so dissimilar that densely packed point node clusters are formed. In the first model (and also in one branch of the second model) the radii differ; in the third model the spheres have varied directions due to the curve of the model. Fine details, such as curves and radius variation, can be modeled more accurately using additional point nodes. In contrast, in the other models the clustered point nodes result from many similar representative spheres. Therefore, the toy data are represented with the minimum number of point nodes. These point nodes nevertheless already capture all important points of the toy data and thus provide a good model for further enhancements.

For models such as 1 and 5, the line node clustering algorithm determines all connections between the point nodes. In the other cases (2, 3 and 4) the connected results only partially match the original toy data. Further connections can be found using the connecting algorithms discussed in section 4.3.

The results illustrated in Table 4-6 were achieved using the following parameter initializations for h = {hr, hs, he, hp} (for more details about each parameter refer to section 3.2.2.2):

1. Point node clustering: h = {0.5, 1, 0.15, 1}

2. Line node clustering: h = {4, 5, 0.3, 2}

Large parameter values lead to a reduced weighting of the corresponding feature (i.e. radius, spatial distance, direction or parallel distance). Consequently, the clustering algorithm can tolerate more dissimilarity in that feature between two data points when grouping them together into a single cluster. For example, the results of point node clustering with different parameter values for the first model are illustrated in Figure 4-11. In this simple case, all four components of h are assigned equal values in order to show the changing tolerance of the clustering over uniformly increasing parameters. The result depends especially on the parameters hr and hs, which specify the weights of the spatial distances and radii. The parameter hp also affects the result, as a special type of spatial distance which takes the direction into consideration. The parameter he is actually not very important for this toy data, because the directions (i.e. the eigenvectors corresponding to the smallest eigenvalue of the Hessian matrix) have more or less the same alignment. Obviously, by increasing the parameter values, and thereby the tolerance of the clustering, the number of clusters (point nodes) representing the data points (representative spheres) is reduced.
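As a sketch of this weighting (the exact feature definitions are given in section 3.2.2.2; the distance form below is an assumption for illustration only), each squared feature difference can be divided by its bandwidth component, so that a large component makes the corresponding feature matter less:

    import numpy as np

    def feature_distance(sphere_a, sphere_b, h):
        """sphere_*: dicts with 'radius' (float), 'position' (np.ndarray, (3,)),
        'direction' (unit np.ndarray, (3,)). h = (hr, hs, he, hp)."""
        hr, hs, he, hp = h
        d_radius = ((sphere_a["radius"] - sphere_b["radius"]) / hr) ** 2
        offset = sphere_a["position"] - sphere_b["position"]
        d_spatial = (offset @ offset) / hs ** 2
        # direction dissimilarity: 0 for parallel unit vectors, grows with the angle
        d_direction = ((1.0 - abs(sphere_a["direction"] @ sphere_b["direction"])) / he) ** 2
        # parallel distance: the offset component along the local direction
        d_parallel = ((offset @ sphere_a["direction"]) / hp) ** 2
        return d_radius + d_spatial + d_direction + d_parallel

Two spheres are then grouped when such a distance stays small; a large hr, for example, shrinks the radius term, which is exactly the tolerance effect described above.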


h = {1,1,1,1}

h = {3,3,3,3}

h = {5,5,5,5}

h = {7,7,7,7}

h = {9,9,9,9}

Figure 4-11: Result of point node clustering for the model with ID=1 with different initializations of h.

One of the most critical cases for point node clustering can be observed in the last toy data (ID=5), which contains both parallel and nearly crossing neurites. With improper parameter values, the parallel branches can be grouped together as a single branch. This phenomenon was already discussed in section 3.1.2.

h = {0.5, 1, 0.15, 1}

h = {1, 3, 0.15, 3}

Figure 4-12: Parallel branches can be clustered together with improper parameter assignment for point node

clustering.

Similarly to point node clustering, the results of line node clustering are affected by the parameter initialization. Figure 4-13 illustrates the line nodes resulting from the third toy data with different parameter values. It should be pointed out that the connections between the point nodes inside each line node are determined with the aid of the connection algorithm described in section 3.2.3.1.


h = {1,1,1,1}

h = {3,3,3,3}

h = {5,5,5,5}

h = {7,7,7,7}

Figure 4-13: Result of line node clustering for the model with ID=3 with different initializations of h (the parameter initialization of the previously performed point node clustering was h = {0.5, 1, 0.15, 1}). The point nodes are scaled down in order to show the connections clearly.

As the tolerance of the clustering algorithm toward feature dissimilarity increases, more and more point nodes are grouped together into a line node, and thus more connections are established between the points. One of the important components of h in this case is he, which is a critical parameter for determining connections and for representing the curves of a neurite.

4.2.2 Real data

In this section, we visualize and discuss the results of the clustering algorithms performed on the real neuron images shown in Figure 4-7. Depending on their structure, average thickness, spacing and image noise, these neurons need different parameter initializations to achieve adequate results.

Figure 4-14: The result of clustering algorithms performed on the image of the neuron with ID = 1. Point node

parameter h = {3, 2, 0.3, 2} and line node parameter h = {6, 8, 0.3, 4}.


The clustering result for the second neuron (Figure 4-15) shows that the clustering algorithm is able to solve the problem of the z-axis strain: the determined neurite consists of multiple layers of spheres that are grouped together, and the result shows a few sequentially located point nodes along the neurite.

One feature that leads to inaccurate results in noisy images is the Hessian eigenvector corresponding to the smallest eigenvalue, which is used here to determine the direction of the point nodes and representative spheres. Especially for thin neurites, every noisy voxel around the centre of a sphere affects the direction of the eigenvectors.
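For illustration, a sketch of extracting such a direction at a voxel (a finite-difference Hessian on a raw intensity array is an assumption; the feature computation of this work may differ, e.g. by smoothing first). Along a bright tube, the eigenvalue smallest in magnitude belongs to the eigenvector pointing along the tube:

    import numpy as np

    def neurite_direction(image, voxel):
        """Direction at 'voxel' (tuple of 3 ints) as the Hessian eigenvector
        whose eigenvalue is smallest in magnitude."""
        first = np.gradient(image.astype(float))    # three first-derivative volumes
        hessian = np.empty((3, 3))
        for i in range(3):
            second = np.gradient(first[i])          # derivatives of the i-th gradient
            for j in range(3):
                hessian[i, j] = second[j][voxel]
        eigvals, eigvecs = np.linalg.eigh((hessian + hessian.T) / 2.0)  # symmetrize
        return eigvecs[:, np.argmin(np.abs(eigvals))]

The sketch also makes the noise sensitivity plausible: each Hessian entry is a second difference of a handful of voxels, so for thin neurites a single noisy voxel can rotate the returned eigenvector considerably.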

Figure 4-15: The result of clustering algorithms performed on the image of the neuron with ID = 2. Point node

parameter h = {0.5, 2, 0.3, 2} and line node parameter h = {1, 2, 0.5, 2}.

The resulting line nodes still cannot represent the neurites completely. Because of the SVM errors and the noise level of the image, the clustered point nodes do not fit the structure and are consequently not similar enough, in terms of their clustering features, to be grouped together into a single line node. This problem can be solved by the connecting and enhancement algorithms; results generated by these algorithms are illustrated in the next section.

Figure 4-16: The result of clustering algorithms performed on the image of the neuron with ID = 3. Point node

parameter h = {2, 3, 0.2, 3} and line node parameter h = {4, 5, 0.4, 4}.

4.3 Results of Line Node Enhancements and Connection Nodes

Line node enhancements perform two actions: extending line nodes by moving their end point nodes, and replacing lone point nodes with a line node consisting of two point nodes. After these changes, the connection algorithms determine the final connections (as connection nodes) between the point nodes.


4.3.1 Toy data

Table 4-7 illustrates the resulting toy data models after extending the line nodes (enhanced models column) and determining the connection nodes (connections column).

The results of the line node enhancements for some of the toy data, such as numbers 2, 4 and 5, are clearer and more pronounced than for the others. In particular, the end point node relocation in the second toy data represents the branch-off point node more precisely. Homogeneous branches with constant radii, e.g. the toy data with IDs 4 and 5, are not long enough after the clustering algorithms to fit the structure of the toy data in the image. The reason is that all clustering features except the spatial location are similar; consequently, more data points are grouped together and the number of resulting point nodes is reduced. Line node enhancement solves this problem by extending these branches as far as possible so that they cover all determined representative spheres. In this way, the branches fit the structure accurately.

The connections complete the skeleton of the model. All connection results are satisfactory, with the exception of the fourth model, where the branching points are not modeled by any connection node. This occurs because the main neurite branch consists of only two point nodes. After relocating these points, the spatial distance from any of the branches to these point nodes exceeds the predefined threshold of the connection algorithm. Interpolating before determining connection nodes can solve this problem. Alternatively, one can add new end point nodes to the model instead of relocating the old ones. This gives homogeneous branches enough point nodes to be connected with other branches.
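A minimal sketch of such a distance-threshold test between end point nodes (the data layout and function are hypothetical; the actual connection algorithm is described in section 3.2.3.1):

    import numpy as np
    from itertools import combinations

    def connect_endpoints(endpoints, threshold):
        """endpoints: list of (line_node_id, position) pairs, position as
        np.ndarray of shape (3,)."""
        connections = []
        for (id_a, pos_a), (id_b, pos_b) in combinations(endpoints, 2):
            # only connect end points of different line nodes that lie close enough
            if id_a != id_b and np.linalg.norm(pos_a - pos_b) < threshold:
                connections.append((id_a, id_b))
        return connections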

Table 4-7: Results of the line node enhancements and the determined connections of the toy data (renderings of the line node clusters, enhanced models and connections omitted here). The point nodes of the connections column are scaled down in order to show the connections clearly.

4.3.2 Real data

The following figures illustrate the results of the line node enhancements and the determined connection nodes of the real neurons' models. Compared with the clustering results, the first model fits the determined structure very well.

Figure 4-17: The result of enhancements and connection algorithms performed on the image of the neuron with

ID = 1.

The second and third models clarify the effects of the line node enhancements. Many locally determined segments are extended to include their most distant representative spheres. Consequently, their end points come to lie near the next determined segment of the same neurite and can easily be connected to it. In addition, lone point nodes are split into two connected point nodes, which represent a small segment of the neuron. Hence, the connections between the segments are determined more precisely.


Figure 4-18: The result of enhancements and connection algorithms performed on the image of the neuron with

ID = 2.

Figure 4-19: The result of enhancements and connection algorithms performed on the image of the neuron with

ID = 3.

Although the models of the real images at this step represent only an approximation of the structure with few point nodes, this estimate is above all a good initialization for the registration algorithm.

4.4 Results of Registration

With the aid of the mean square error function in the metric submodule (Equation 3-10), the alignment of the model with respect to the image can be measured. This function calculates the cost of the optimization problem, which is minimized by the optimizer submodule. By visualizing the results of the registration algorithm performed on toy and real data, one can assess how well the registered model fits its image. The result also depends on the parameter initialization, which leads to different model transformations, e.g. setting the flexibility of the model (smoothness parameters) regarding curves or radius changes. The smoothness parameters are also discussed in this section. The optimization process and the produced performance values are illustrated during the execution of the registration algorithm. Finally, the results for the real data are presented.
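The exact form of Equation 3-10 is defined in section 3.2.4; as an illustration only, a mean square metric of this kind can be sketched as sampling the image at the points placed by the current transform and penalizing deviation from the intensities those points should have (the expected-intensity formulation is an assumption):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def mean_square_metric(image, points, expected):
        """image: 3D array; points: (N, 3) voxel coordinates of the transformed
        model points; expected: (N,) intensities such points should have."""
        sampled = map_coordinates(image, np.asarray(points).T, order=1)  # trilinear
        return np.mean((sampled - np.asarray(expected)) ** 2)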


4.4.1 Smoothness Parameters

As described in section 3.2.4.2, the smoothness algorithm has two parameters which take the neighboring point nodes into account in order to control the effect of the smoothness transformation. In the following, the effect of these parameters on the registration algorithm, performed on an image of a real neuron, is illustrated and discussed.

Figure 4-20: Modeling of a neurite with different initializations of the smoothness parameters. (a) A layer of an image of a neurite, (b) a manually constructed model as initialization, and (c)-(f) the registered models with different flexibilities (flx.): (c) scale flx. = 0.0, translation flx. = 0.0; (d) scale flx. = 0.4, translation flx. = 0.6; (e) scale flx. = 0.6, translation flx. = 0.4; (f) scale flx. = 1.0, translation flx. = 1.0.

The smaller the translation flexibility, the more the fine details and sharp curves of the neuron structure are ignored. Furthermore, a small scale flexibility leads to more consistent radii along the run of the neurite. By increasing the flexibility values, the accuracy with which the model covers all fine details of the neuron structure (with respect to the image intensities) increases as well. However, noise in the image also leads to irrelevant irregularities, which can be disregarded by decreasing these flexibility parameters. Finding the best initialization depends on different criteria such as the image noise, the neuron's structure and the desired accuracy or smoothness of the model. Heuristically adequate values are 0.4 for the scale and 0.6 for the translation flexibility (Figure 4-20). Usually, the scale flexibility is assigned a smaller value than the translation flexibility. The reason is that, on the one hand, the radius of a point node is more strongly affected by a single noisy voxel than its


position; the scale transform is therefore more sensitive to noise than the translation transform. On the other hand, the radius of a typical neurite changes slowly and continuously, so it can be modeled effectively with a small scale flexibility.

4.4.2 Quantitative Performance

The registration process is a convergent approach: as the mean square error is reduced, the suggested transform parameters converge toward their limits. Changes of the transform parameters are specified by the optimizer submodule, which emits new transform parameters in each iteration depending on the error calculated by the metric. This error (Equation 3-10) is related to the intensities of the image voxels that compose the 3D space in which the point nodes are transformed. The course of the emitted translation parameters reflects the course of the error minimization in the image's 3D space. Figure 4-21 shows an example of simple toy data where the model does not yet fit its image. When registering this model onto the image, we consider the course of the translation parameters for the three marked and numbered point nodes, illustrated in Figure 4-22.

Figure 4-21: A simple toy data set, used to analyze the translation convergence of the three marked point nodes when registering the model onto the image.

The L-BFGS-B optimizer used in the registration framework minimizes the error while converging the parameters along a specific direction. After some iterations, when the change becomes small enough, a new direction is chosen (e.g. orthogonal to the old direction in cases 1 and 3) in order to converge toward the minimum error.

(Panels of Figure 4-22: point node 1, point node 2 and point node 3.)

Figure 4-22: The paths of the 3D translation parameters of the three point nodes marked in Figure 4-21 over the course of the registration process. Each small circle indicates the translation parameters emitted after one iteration. The first and last iterations are labeled, and the series of parameter changes at the end of the process is emphasized in order to make the convergence clear.

The trend of translation’s parameter is also observable in the course of mean square error value while registering

the three aforementioned point nodes (see Figure 4-23).

Figure 4-23: The minimization of the mean square error, calculated by the metric of the registration framework, during the registration of each of the three point nodes marked in Figure 4-21 (one panel per point node).

4.4.3 Real data

After the final algorithms, including the enhancements and connections, have been performed on the real data, the estimated model fits well enough to be registered onto the image of the neuron. Before the registration algorithm is performed on our three real data sets, the models are interpolated with a step size of three voxels, in order to have enough point nodes to model the fine details and curves of the image.
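A minimal sketch of such an interpolation (linear resampling of assumed node positions and radii at a fixed arc-length step; the interpolation used in this work may differ):

    import numpy as np

    def interpolate_nodes(positions, radii, step):
        """positions: (N, 3) point node centers along one branch; radii: (N,)."""
        positions = np.asarray(positions, dtype=float)
        seg = np.linalg.norm(np.diff(positions, axis=0), axis=1)
        arc = np.concatenate([[0.0], np.cumsum(seg)])      # arc length per node
        samples = np.arange(0.0, arc[-1] + 1e-9, step)     # fixed step along the branch
        new_pos = np.stack([np.interp(samples, arc, positions[:, k])
                            for k in range(3)], axis=1)
        new_rad = np.interp(samples, arc, radii)
        return new_pos, new_rad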


Figure 4-24: The result of interpolation and registration, performed on the image of the neuron with ID = 1.

The result of the registration depends on the initialized parameters, although in almost all cases the parameters can be assigned their default values, with the exception of the smoothness parameters (the flexibilities). In the case of the second real data set, the noise and strain of the image along the z-axis are large, so with the default settings the point nodes can easily slip up and down and produce an uneven neurite. Hence, the translation flexibility is set to 0.2 to fix the point nodes onto the run of the neurite structure.

Figure 4-25: The result of interpolation and registration, performed on the image of the neuron with ID = 2.

Figure 4-26: The result of interpolation and registration, performed on the image of the neuron with ID = 3.

After the registration process, all models are interpolated with a step size of 0.5 in order to represent the tubular structure more accurately. It should be pointed out that performing such a fine interpolation before invoking the registration algorithm results in a model that is too noisy, because the model becomes so flexible that all noise, especially the z-axis strain, can affect it.

5 Conclusion

The algorithms proposed in this study have advantages and limitations. The most critical part of the reconstruction process is the classification performed by the support vector machine. Some of the missing hits of the SVM are voxels representing thin neurites, which are unrecognizable by this algorithm. As described in section 4.1.1, the feature space of the SVM is so noisy that the classification boundary cannot discriminate voxels representing thin neurites from the negatively labeled background voxels. Extending the feature space by including the voxels' gray values and gradients could solve this problem. However, it also increases the number of false alarms, because noisy voxels are then frequently determined as representative spheres by mistake. In this case, further work is needed to filter out the noisy spheres, or to use other features, in order to improve the quality of the decision-making and raise the accuracy of the classification. In addition, the results generated by the SVM serve as an initialization model for the next reconstruction steps. Therefore, the remaining classification and regression errors (i.e. false alarms and missing hits among correct hits) are small enough to be ignored, because these types of errors can be corrected by the subsequent algorithms.

After the mean shift clustering algorithm has been performed, false alarms lead to location deviations and regression errors to radius deviations of the clustered prototype spheres. Such small deviations can easily be corrected by the registration algorithm. For further fine corrections, the implemented graphical user interaction provides the user with sufficient semi-automatic functionality to modify the model while visualizing the image stack of the neuron. By adding missing spheres, restoring missing connections and interpolating newly added segments, the model can easily be enhanced and eventually used to initiate another repetition of the registration.

The reconstruction results are already robust against the image noise in many cases: disconnected segments of a branch are, for the most part, correctly recognized as a single neurite, and isolated bright noisy voxels are not falsely determined as voxels representing a neuron structure.

With adequate parameter values, the clustering, connecting and regression algorithms yield accurate results. A crucial step in the reconstruction process outlined in this study is therefore the parameter initialization. Each algorithm requires its own parameter settings, which makes initialization more complex and optimal performance difficult to achieve. However, many of the parameters are easy to set, because they depend neither on the quality of the image nor on the structure of the neuron. The best values suggested for these parameters are proposed in this study and are preset in the user interaction by default. In contrast, some of the parameters for clustering, registration and connecting depend both on the properties of the neurons and images and on the user's desired model representation. In this case, the default value of a parameter is definitely not the best initialization. It is therefore recommended to apply the algorithms to a smaller but significant part of the input neuron image in order to find the best initialization with respect to the neuron's structure and the acquisition noise of the image. The whole neuron can then be reconstructed using the parameter values that performed optimally on the sample part.

The results produced by the SVM classifier module depend on the training data set. In this study, all shown results of real neuron images were reconstructed using a single SVM classifier, which had been trained on a toy data set. In general, however, neuron structures are too varied to train a universal classifier that would guarantee the best performance for all input data. Thus, if the trained SVM cannot provide the clustering with sufficiently fine representative spheres, a small part of the image can be reconstructed using the developed semi-automatic algorithms, i.e. with the aid of user interaction and registration. This reconstruction can then be used as a training data set for the SVM which reflects the particular neuron structure and image quality. The trained SVM can also be saved and reused for similar neuron images.


6 References

[ABR64] M. Aizerman, E. Braverman, and L. Rozonoer: Theoretical foundations of the potential

function method in pattern recognition learning. Automation and Remote Control, Vol. 25,

pp. 821–837, 1964.

[ALS02] K. Al-Kofahi, S. Lasek, D. Szarowski, C. Pace and G. Nagy: Rapid automated three-dimensional tracing of neurons from confocal image stacks. IEEE Transactions on Information Technology in Biomedicine, vol. 6, pp. 171–187, 2002.

[ART02] K. Al-Kofahi, B. Roysam, J.N. Turner: Method and apparatus for automatically tracing line-structure images, United States Patent 7072515, 2006.

[Ber02] P. Berkhin: Survey of Clustering Data Mining Techniques. Accrue Software, 2002.

[BGV92] B.E. Boser, I.M. Guyon and V.N. Vapnik: A training algorithm for optimal margin classifiers, COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. New York, NY, USA, ACM Press, pp. 144–152, 1992.

[BLN95] R. H. Byrd, P. Lu, J. Nocedal: A Limited Memory Algorithm for Bound Constrained

Optimization, SIAM Journal on Scientific and Statistical Computing, Vol. 16, Issue 5, pp.

1190-1208, 1995.

[BSR04] P.J. Broser, R. Schulte, A. Roth, F. Helmchen, S. Lang, G. Wittum and B. Sakmann: Nonlinear anisotropic diffusion filtering of three-dimensional image data from two-photon microscopy, J. Biomedical Optics, vol. 9, no. 6, pp. 1253–1264, 2004.

[BST05] S. Bouix, K. Siddiqi and A. Tannenbaum: Flux driven automatic centerline extraction,

Medical Image Analysis, vol. 9, no. 3, pp. 209–221, 2005.

[CCL01] Chih-Chung Chang and Chih-Jen Lin: LIBSVM: a library for support vector machines,

2001.

[CoM02] D. Comaniciu, P. Meer: Mean Shift: A Robust Approach toward Feature Space Analysis,

IEEE Trans. Pattern Analysis Machine Intell., Vol. 24, Issue 5, 603-619, 2002.

[CST99] A. Can, H. Shen, J.N. Turner, H.L. Tanenbaum and B. Roysam: Rapid automated tracing

and feature extraction from retinal fundus images using direct exploratory algorithms,

IEEE Trans. Inform. Technol. Biomed., vol. 3, pp. 125–138, 1999.

[CVp95] C. Cortes and V.N. Vapnik: Support-Vector Networks, Machine Learning, Springer

Netherlands, vol. 20, pp. 273-297, 1995.

[DeC02] T. Deschamps and L.D. Cohen: Fast extraction of tubular and tree 3D surfaces with front propagation methods, in Proc. International Conference on Pattern Recognition, Quebec, Canada, vol. 1, pp. 731–734, 2002.

[DSO02] A. Dima, M. Scholz, and K. Obermayer: Automatic segmentation and skeletonization of

neurons from confocal microscopy images based on the 3D wavelet transform, IEEE Trans.

Image Processing, vol. 119, pp. 790–801, 2002.

[ESS05] J.F. Evers, S. Schmitt, M. Sibila, and C. Duch: Progress in functional neuroanatomy:

precise automatic geometric reconstruction of neuronal morphology from confocal image

stacks, Journal of Neurophysiology, vol. 93, pp. 2331–2342, 2005.

[Est96] Ester, M., Kriegel, H.P., Sander, J., and Xu, X.: A density-based algorithm for discovering

clusters in large spatial databases with noise. Proceedings of the 2nd International

Conference on Knowledge Discovery and Data Mining, Portland, Oregon, USA: AAAI

Press, pp. 226–231, 1996.


[FuH75] K. Fukunaga and L. Hostetler: The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Transactions on Information Theory, vol. 21, pp. 32–40, 1975.

[HBV96] M. Herbin, N. Bonnet and P. Vautrot: A Clustering Method Based On The Estimation Of

The Probability Density Function And On The Skeleton By Influence Zones, Pattern

Recognition Letters, vol. 17, pp. 1141-1150, 1996.

[HFF05] M.S. Hassouna, A.A. Farag and R. Falk: Differential fly-throughs (DFT): A general

framework for computing flight paths, in Proc. Medical Image Computing and Computer

Assisted Intervention, Palm Springs, USA, vol. 1, pp. 654–661, 2005.

[HHC03] W. He, T.A. Hamilton, A.R. Cohen, T.J. Holmes, C. Pace, D.H. Szarowski, J.N. Turner and

B. Roysam: Automated Three-Dimensional Tracing of Neurons in Confocal and Brightfield

Images. Microscopy and Microanalysis, vol. 9 , pp 296-310, 2003.

[ITK05] L. Ibáñez, W. Schroeder, L. Ng, J. Cates: The ITK Software Guide, Second Edition, updated for ITK version 2.4, Insight Software Consortium, 2005.

[KLZ02] I.Y.Y. Koh, W.B. Lindquist, K. Zito, E. A. Nimchinsky, and K. Svoboda: An image analysis

algorithm for dendritic spines, Neural Computation, vol. 14, pp. 1283–1310, 2002.

[LGB92] L. G. Brown: A Survey of Image Registration Techniques, ACM Computing Surveys, vol.

24, Issue 4, pp. 325–376, 1992.

[Lin91] T. Lindeberg: Discrete Scale-Space Theory and the Scale-Space Primal Sketch.

Dissertation. Royal Institute of Technology, Stockholm, Sweden. May 1991.

[McK03] David J.C. MacKay: Information Theory, Inference, and Learning Algorithms, Cambridge

University Press, 2003.

[Paw95] J.B. Pawley: Handbook of Biological Confocal Microscopy, Plenum Press, vol. 25, pp.

228-229, 1995.

[REK03] A. Rodriguez, D. Ehlenberger, K. Kelliher, M. Einstein, S.C. Henderson, J.H. Morrison,

P.R. Hof and S.L. Wearne: Automated reconstruction of three-dimensional neuronal

morphology from laser scanning microscopy images, Methods, vol. 30, no. 1, pp. 94–105,

2003.

[SRS01] H. Shen, B. Roysam, C.V. Stewart, J.N. Turner and H.L. Tanenbaum: Optimal scheduling

of tracing computations for real-time vascular landmark extraction from retinal fundus

images, IEEE Transactions on Information Technology in Biomedicine, vol. 5(1), pp. 77–

91, 2001.

[STC99] J. Shawe-Taylor and N. Cristianini: Margin Distribution Bounds on Generalization. Lecture

Notes in Computer Science, Springer Berlin/Heidelberg, vol.1572/1999, pp. 638-639,

1999.

[SWB00] B. Schölkopf, A. Smola, R.C. Williamson, and P.L. Bartlett: New support vector

algorithms, Neural Computation, vol. 12, pp.1207–1245, 2000.

[SZM07] R. Srinivasan, X. Zhou, E. Miller, J. Lu, J. Litchman, S. T. C. Wong: Automated Axon

Tracking of 3D Confocal Laser Scanning Microscopy Images Using Guided Probabilistic

Region Merging. Neuroinformatics, Vol. 5, No 3, pp. 189-203, 2007.

[TPo89] A. Touzani and J.G. Postaire: Clustering by Mode Binary Detection, Pattern Recognition

Letters, vol. 9, pp. 1-12, 1989.

[VeJ98] T.L. Veldhuizen, M.E. Jernigan: Grid filters for local nonlinear image restoration, vol. 5, pp. 2885–2888, 1998.


[VpC74] V.N. Vapnik and A.Ya.Chervonenkis: Theory of Pattern Recognition (in Russian), Nauka,

Moscow, 1974.

[VpL63] V.N. Vapnik and A. Lerner: Pattern recognition using generalized portrait method. Automation and Remote Control, vol. 24, pp. 774–780, 1963.

[Vpn79] V.N. Vapnik: Estimation of Dependences Based on Empirical Data (in Russian). Nauka,

Moscow, 1979. (English translation: Springer Verlag, New York, 1982).

[Vpn95] V.N. Vapnik: The Nature of Statistical Learning Theory. Springer New York Inc., 1995.

[WHW04] C.M. Weaver, P.R. Hof, S.L. Wearne and W.B. Lindquist: Automated algorithms for multiscale morphometry of neuronal dendrites, Neural Computation, vol. 16, no. 7, pp. 1353–1383, 2004.

[WRE05] S.L. Wearne, A. Rodriguez, D.B. Ehlenrger, A.B. Rocher, S.C. Henderson and P.R. Hof:

New techniques for imaging, digitization and analysis of three-dimensional neural

morphology on multiple scales, Neuroscience, vol. 136, no. 3, 2005.

[WSp90] R. Wilson and M. Spann: A new approach to clustering, vol. 23, pp. 1413–1425, 1990.

[ZBN97] C. Zhu, R. H. Byrd and J. Nocedal: L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN

routines for large scale bound constrained optimization, ACM Transactions on

Mathematical Software , vol. 23, Issue 4, pp. 550 – 560, 1997.

[ZiF03] B. Zitová and J. Flusser: Image registration methods: a survey. Image and Vision

Computing, Vol. 21, Issue 11., pp. 977-1000, 2003.

[ZZL08] Y. Zhang, X. Zhou, J. Lu, J. Lichtman, D. Adjeroh and ST. Wong: 3D Axon structure

extraction and analysis in confocal fluorescence microscopy images. Neural computation,

Vol. 20, No. 8. (August 2008), pp. 1899-1927, 2008.


Eigenständigkeitserklärung (Declaration of Authorship)

I hereby certify that I wrote the above diploma thesis entitled "Algorithms of automatic reconstruction of neurons from confocal microscopy images" independently and did not use any aids other than those indicated. Passages taken from other works, in wording or in substance, have in each individual case been marked as borrowings by citing the source, including the secondary literature used.

Berlin, 14.10.2008

Amir Sadeghipour

Declaration of Authorship

I confirm that this diploma thesis with the title “Algorithms of automatic reconstruction of neurons from

confocal microscopy images” and the work carried out to complete it are all of my own achievement. Where I

have consulted and used work of others in the context of this thesis, this has been clearly accredited and I have

acknowledged all main sources of help.

October 14, 2008

Amir Sadeghipour

