Multiresolution image registration for two-dimensional gel electrophoresis
Stefan Veeser, Michael J. Dunn*, Guang-Zhong Yang
Royal Society/Wolfson Foundation Medical Image Computing Laboratory
*National Heart and Lung Institute
Imperial College of Science, Technology and Medicine
University of London, London, SW7 2BZ, UK
Running title: Multiresolution image registration of 2-D gels
Keywords: Image processing / Image registration / Multi-resolution analysis / Two-dimensional gel electrophoresis
Non-standard abbreviations: PBM, piecewise bilinear mapping; MIR, multi-resolution image registration
Address for Correspondence:
Dr. G.Z. Yang
Royal Society/Wolfson Medical Image Computing Laboratory
Department of Computing
180 Queens Gate
Imperial College of Science, Technology and Medicine
London SW7 2BZ
United Kingdom
Tel: (+44) 20 7594 8441
Fax: (+44) 20 7581 8024
Email: [email protected]
Summary
In proteomic research, 2-DE is an important tool for investigating differential patterns of
qualitative and quantitative protein expression. The strength of the technique lies in its
unrivalled ability to separate thousands of proteins simultaneously. The key
to the comparison of 2-D protein profiles, however, lies in the use of a fast and robust image
matching process, which is essential to the subsequent quantification procedure. To satisfy
the growing demand for a robust and fully automatic method of matching 2-D gel protein
separation profiles, we describe in this paper a novel registration technique based on
image intensity distribution rather than selected features. The method uses a multi-
resolution representation of the gel profiles and exploits the fact that coarse
approximations to the optimal matching can be extracted efficiently from low-resolution
images. This permits the removal of misalignments at different scales in a systematic
manner. The strength of the new method has been confirmed by a double-blind trial of
111 2-D gel pairs. The proposed method requires neither landmarks nor an a priori image
alignment, and takes about 5 seconds for processing a typical gel pair on a standard
personal computer.
1. Introduction
The first complete genome, that of the bacterium Haemophilus influenzae, was published
in 1995 [1]. We now have the complete genomic sequences for more than 40 prokaryotic
and eukaryotic organisms, and a major milestone has been reached recently with the
completion of the human genome [2,3]. A major challenge in the post-genome era will
be to elucidate the biological function of the large number of novel gene products that
have been revealed by the genome sequencing initiatives, to understand their role in
health and disease, and to exploit this information to develop new therapeutic agents.
Proteomics will play a major role in the assignment of protein function through direct
analysis of the patterns of expression, interaction, localisation, and structure of the
proteins encoded by genomes [4].
Recently, progress has been made in the development of alternative methods of protein
separation for proteomics, such as the use of chip-based technologies [5, 6], the direct
analysis of protein complexes using mass spectrometry [7], the use of affinity tags [8, 9],
and large-scale yeast two-hybrid screening [10]. However, two-dimensional
polyacrylamide gel electrophoresis (2-DE) remains the core technology of choice for
separating complex protein mixtures in the majority of proteome projects [11, 12]. This
is due to the unrivalled power of 2-DE to separate simultaneously thousands of proteins,
the subsequent high-sensitivity visualisation of the resulting 2-D separations [13] and the
relative ease with which proteins from 2-D gels can be identified and characterised using
highly sensitive microchemical methods [14], particularly those based on mass
spectrometry [15].
In many proteomic projects, 2-DE is used as a tool to investigate differential patterns of
qualitative and quantitative protein expression. This often requires the comparative
analysis of large sets of 2-D protein profiles, a task that is impossible without the use of
computer systems. Advances in 2-D gel technology, particularly with the use of IPG for
the first-dimension IEF separation [11], have significantly improved the reproducibility
of 2-D separations [16, 17]. Nevertheless, 2-DE remains an imperfect technique such
that no pair of 2-D gel patterns can be directly superimposed. This has necessitated the
development of specialised software packages, many of these having their origins in the
late 1970’s and early 1980’s, for example TYCHO [18], GELLAB [19], QUEST [20],
and ELSIE [21]. Most of these systems have matured into commercial packages,
including PDQuest, Kepler, Melanie, Phoretix and BioImage, and these have found some
acceptance in the 2-DE community [22].
It has been acknowledged that the key to the comparison of 2-D protein profiles lies in
the use of a fast and robust image matching process, which is essential to the subsequent
quantification procedure. Most of the matching procedures employed in
the current generation of software packages require extensive user interaction. They
normally require users to specify manually from 20 to 50 corresponding spots that are
present in each 2-D gel making up the data set. A paradigm common to these methods is
to divide the matching process into the following three main steps:
• Pre-processing
• Spot detection
• Pattern matching
The pre-processing step is concerned with noise removal, background intensity bias
correction, and elimination of streak artefacts. Spot detection, on the other hand, is
mostly performed with Gaussian fitting or Laplacian filtering, which results in a list of
spot co-ordinates and values describing the intensity and the area covered by each spot.
This is usually followed by a pattern matching procedure for establishing corresponding
spots in the reference and target images based on user-defined landmarks.
There is evidence that in practice it may not be appropriate to follow the paradigm of
dividing the image registration task into two separate tasks: spot detection and pattern
matching. Firstly, precise extraction of spot location is computationally intensive and
error prone. In the case of complex patterns where spots are located very close to each
other or occur as overlapping regions, this can impose major challenges [23]. Secondly,
by reducing the gel images into a list or a graph of spot locations, a considerable amount
of visual information, such as shape and intensity spread, is lost for the subsequent
matching procedure. Thirdly, algorithms following this paradigm of spot detection and
pattern matching must cope with the combinatorial explosion of possible matches
between numerous small local patterns in each gel.
In an attempt to alleviate the above problems, a recent system called CAROL [24] uses
the smallest local pattern possible, i.e. pairs of locally intensive spots, called intensive
edges.
With the maturity of intensity based image registration in medical imaging [25], there has
been a growing amount of interest in applying this method for gel image processing.
With this approach, the process of spot feature detection is avoided, and the complete raw
gel images are matched according to their intensity distributions. The images can then be
displayed in an overlaid fashion using a subtractive colour scheme to highlight the
intrinsic structural differences between the gel pairs. The concept of intensity based
image registration is an actively researched area in the computer vision and image
processing community and has been used for both uni-modal and multi-modal
registration tasks in medical imaging, camera pose estimation for navigating robots,
video compression, and human gesture recognition. One of the first systems that adopted
this approach for electrophoresis is a software environment called Z3 developed by
Compugen [26]. The software has clearly demonstrated the advantages of using
intensity-based registration for 2-D gel image analysis, despite the drawbacks in handling
images with large distortion and sparsely distributed protein spots.
The purpose of this paper is to present an improved image registration technique based on
multi-resolution analysis [27]. The method adopts a coarse-to-fine feature-matching
paradigm such that the combinatorial complexity of the possible matches between a large
number of small local patterns in each gel is avoided. It also permits high-order non-linear
distortion to be recovered in an accurate way. The method only makes general
assumptions on the underlying structure of the spot distribution, and therefore is
applicable to all gel patterns. The method has been applied to a set of 111 gel pairs and
the assessment results have shown the strength of the proposed technique.
2. Methods
The scales of deformation associated with 2-D gel images can range from the whole
image down to areas of several square millimetres. Our goal is to devise a procedure for
the registration that is accurate yet efficient. A multi-resolution approach has been
adopted so as to handle both global and local deformations. The basic idea is that the
distortions, which produce the misalignment in the two gels, can be decomposed into
their respective components in each resolution level. By using low-resolution images of
the gels, only the coarse components of the distortions will be reflected and thus a rough
approximation to the optimal geometric transformation can be calculated. Once the
coarse misalignments are eliminated, the finer distortions on the next resolution level can
be dealt with. This process can be conducted in a recursive manner to derive the final
optimal transformation. The efficiency of the algorithm rests on the following
considerations. First, the processing of coarse transformation from low-resolution images
is computationally efficient. Secondly, misalignments on finer scales can be decoupled
from each other, as they are limited to small, non-overlapping areas. This makes the method
particularly suitable for use with a parallel architecture. The software implementation of the
described algorithm for the Windows 98/2000/NT environment, as well as some sample
images, is available to download at www.doc.ic.ac.uk/~gzy.
2.1 Multi-resolution representation of image deformation
The registration of two images $I_1, I_2$ consists of finding a transformation $t$ such that the
difference between $I_1$ and $t(I_2)$ is minimal according to a predefined similarity measure
sim. In image registration, this is commonly formulated as an optimisation problem such
that the parameter vector $c$, which represents the ideal transformation $t_c$, will maximise an
objective function $f(c) = \mathrm{corr}(I_1, t_c(I_2))$.
The choice of the transformation to be used depends on the knowledge of the deformation
to be recovered. For 2-D gel images, the distortion introduced can be of higher orders and
its exact modelling can be difficult. For this reason, we have chosen to use
transformations defined by Piecewise Bilinear Maps (PBM) to represent the associated
deformation. Multiresolution decomposition can be applied to PBMs resulting in a
hierarchy of transformation spaces, where transformations from spaces with higher
dimension can model more localised and finer detail of the distortion. Hierarchical
transformation spaces have been used successfully in image registration [27-33].
Note that the transformation t of a source image I into the target image t(I) is defined by a
map m in the opposite direction, namely mapping points in the target image onto points
of the source image. This way m can be used to determine the intensities at different
points p in the target image by sampling them from points m(p) in the source image.
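The backward-mapping convention described above can be illustrated with a short sketch. It assumes images stored as lists of rows, nearest-neighbour sampling and a fill value of zero outside the source image; the function name `warp` and these choices are illustrative assumptions rather than details of the described implementation.

```python
def warp(source, mapping, fill=0):
    """Build the target image t(I) by sampling the source image I at m(p)."""
    height, width = len(source), len(source[0])
    target = []
    for y in range(height):
        row = []
        for x in range(width):
            sx, sy = mapping((x, y))  # m(p): target point -> source point
            ix, iy = int(round(sx)), int(round(sy))
            inside = 0 <= ix < width and 0 <= iy < height
            row.append(source[iy][ix] if inside else fill)
        target.append(row)
    return target


source = [[1, 2, 3],
          [4, 5, 6]]
# A map that looks one pixel to the right moves the content one pixel to the left.
shifted = warp(source, lambda p: (p[0] + 1, p[1]))
```

Because every target pixel receives exactly one sample, this direction of mapping leaves no holes in the transformed image.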
A PBM consists of a lattice of maps, as illustrated in Fig. 1. In order to build this lattice the
target image is first partitioned into $2^l \times 2^l$ regular squares, where $l$ is called the
level-of-detail for the corresponding transformation. For a given level and a given index $(i,j)$ the
vertices $a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ of the corresponding square in the target image and the
control points $c_{i,j}, c_{i+1,j}, c_{i,j+1}, c_{i+1,j+1}$ of the corresponding quadrilateral in the source image
define the mapping function $m$. A point $q$ that lies in a square $a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ in
the target image is mapped according to the following weighted sum of the corresponding
control points:
$$
m(q) = \omega_{i,j}(u,v) \cdot c_{i,j} + \omega_{i+1,j}(u,v) \cdot c_{i+1,j} + \omega_{i,j+1}(u,v) \cdot c_{i,j+1} + \omega_{i+1,j+1}(u,v) \cdot c_{i+1,j+1}
$$
where
$$
\begin{aligned}
\omega_{i,j}(u,v) &= (1-u) \cdot (1-v) \\
\omega_{i+1,j}(u,v) &= u \cdot (1-v) \\
\omega_{i,j+1}(u,v) &= (1-u) \cdot v \\
\omega_{i+1,j+1}(u,v) &= u \cdot v
\end{aligned}
$$
The values of $u$ and $v$ are the horizontal and vertical ratios of point $q$, as indicated in Fig. 1.
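The weighted sum above can be sketched in a few lines. This is a minimal illustration that assumes the target image is the unit square, that `grid[i][j]` holds the control point $c_{i,j}$ with $i$ as the horizontal index, and that the names `bilinear_weights`, `pbm_map` and `identity_grid` are hypothetical helpers introduced here.

```python
def bilinear_weights(u, v):
    """Weights of the corners (i,j), (i+1,j), (i,j+1), (i+1,j+1)."""
    return ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)


def pbm_map(q, grid, level):
    """Map a target point q in the unit square to the source image."""
    n = 2 ** level                      # squares per side at this level
    x, y = q
    i = min(int(x * n), n - 1)          # index of the square containing q
    j = min(int(y * n), n - 1)
    u, v = x * n - i, y * n - j         # horizontal and vertical ratios
    weights = bilinear_weights(u, v)
    corners = (grid[i][j], grid[i + 1][j], grid[i][j + 1], grid[i + 1][j + 1])
    mx = sum(w * c[0] for w, c in zip(weights, corners))
    my = sum(w * c[1] for w, c in zip(weights, corners))
    return (mx, my)


def identity_grid(level):
    """Control points placed on the vertices a_{i,j}: the identity map."""
    n = 2 ** level
    return [[(i / n, j / n) for j in range(n + 1)] for i in range(n + 1)]


grid = identity_grid(1)
grid[1][1] = (0.6, 0.4)                 # displace the central control point
moved = pbm_map((0.5, 0.5), grid, 1)    # the central vertex follows it
```

With all control points left on the vertices the map is the identity; displacing a single control point only affects the squares adjacent to it.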
While the vertices $a_{i,j}$ in the target image are fixed at a given detail level, the control points
$c_{i,j}$ can vary to represent different maps. For a given detail level a PBM is defined by a
parameter vector $c = (c_{i,j})$ containing all the co-ordinates of the control points. All
possible parameter vectors of a certain detail level $l$ define the linear space $T_l$. By
subdividing each square in the target image into 4 smaller squares the linear space $T_{l+1}$
for the next higher detail level is defined. Each parameter vector $(c^l_{i,j})$ in $T_l$ can be
subdivided such that it becomes a parameter vector $(c^{l+1}_{i,j})$ in $T_{l+1}$, representing the same
mapping. The subdivision scheme is defined by:
$$
c^{l+1}_{i,j} =
\begin{cases}
c^l_{a,b} & \text{if } i \text{ even}, j \text{ even} \\
\frac{1}{2}\,(c^l_{a,b} + c^l_{a+1,b}) & \text{if } i \text{ odd}, j \text{ even} \\
\frac{1}{2}\,(c^l_{a,b} + c^l_{a,b+1}) & \text{if } i \text{ even}, j \text{ odd} \\
\frac{1}{4}\,(c^l_{a,b} + c^l_{a+1,b} + c^l_{a,b+1} + c^l_{a+1,b+1}) & \text{if } i \text{ odd}, j \text{ odd}
\end{cases}
$$
where $a = \lfloor i/2 \rfloor$, $b = \lfloor j/2 \rfloor$, and $\lfloor x \rfloor$ means the greatest integer not exceeding $x$.
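The subdivision scheme can be sketched as follows. The sketch assumes a grid stored as `grid[i][j]` with $(2^l + 1) \times (2^l + 1)$ control points given as `(x, y)` tuples; the function name `subdivide` is a hypothetical choice made for illustration.

```python
def subdivide(grid):
    """Refine a control grid from level l to level l + 1 without changing the map."""
    n = len(grid) - 1                   # squares per side at the coarse level

    def average(points):
        count = len(points)
        return (sum(p[0] for p in points) / count,
                sum(p[1] for p in points) / count)

    fine = []
    for i in range(2 * n + 1):
        a = i // 2
        row = []
        for j in range(2 * n + 1):
            b = j // 2
            if i % 2 == 0 and j % 2 == 0:      # existing control point
                row.append(grid[a][b])
            elif i % 2 == 1 and j % 2 == 0:    # midpoint of an edge along i
                row.append(average([grid[a][b], grid[a + 1][b]]))
            elif i % 2 == 0:                   # midpoint of an edge along j
                row.append(average([grid[a][b], grid[a][b + 1]]))
            else:                              # centre of a quadrilateral
                row.append(average([grid[a][b], grid[a + 1][b],
                                    grid[a][b + 1], grid[a + 1][b + 1]]))
        fine.append(row)
    return fine


coarse = [[(0.0, 0.0), (0.0, 1.0)],
          [(1.0, 0.0), (2.0, 2.0)]]
fine = subdivide(coarse)                # 3 x 3 control points
```

Each subdivision roughly quadruples the number of control points, which is the source of the additional degrees of freedom at the next level.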
By subdividing the map between the target image and the source image, the number of
parameters increases approximately by a factor of 4. By varying these new parameters it is
possible for the corresponding transformation to control more local distortions in the
given image.
2.2 Similarity Measure
For intensity based image registration, there have been a number of similarity measures
used in practice [34]. 2-D gels prepared from similar samples in two different research
laboratories might vary in contrast as well as brightness, which is equivalent to changes
in mean intensity and variance. A sensible similarity measure should be invariant to these
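variations. We have used the following cross-correlation between two intensity
distributions for this purpose:
$$
\mathrm{corr}(I_1, I_2) = \frac{\mathrm{cov}(I_1, I_2)}{\sigma(I_1)\,\sigma(I_2)}
$$
where
$$
\mathrm{cov}(I_1, I_2) = \frac{1}{|D|} \int_D (I_1(\mathbf{x}) - \bar{I}_1)\,(I_2(\mathbf{x}) - \bar{I}_2)\, d\mathbf{x},
$$
$$
\bar{I} = \frac{1}{|D|} \int_D I(\mathbf{x})\, d\mathbf{x}
$$
and
$$
\sigma^2(I) = \frac{1}{|D|} \int_D (I(\mathbf{x}) - \bar{I})^2\, d\mathbf{x}.
$$
In the above equations $D \subset \mathbb{R}^2$ is the domain of points to be considered for the
registration process, $|D|$ is its area, and the images $I_1, I_2$ are represented as functions
$I: D \to \{0, \dots, N\}$, which give an intensity value $I(p)$ for each point $p$ in $D$.
$\mathbb{R}^2$ denotes the two-dimensional Euclidean space.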
The covariance cov stated above is a measure of the association of two random variables
that is invariant to changes in the mean values. By normalising this with the standard
deviations, the similarity measure derived from cross-correlation is made to be immune
to changes in variance as well. The value of corr lies in the interval [-1, 1], where the value 1
reflects a perfect alignment of the two images.
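The invariance to brightness and contrast can be checked with a small discrete sketch. It assumes the images have been flattened into lists of pixel intensities, so that the integrals over $D$ become sums; the helper names are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Discrete covariance of two equally sized intensity lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def corr(a, b):
    """Normalised cross-correlation, a value in [-1, 1]."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


image = [10, 20, 30, 80, 15, 5]
# Same pattern with doubled contrast and a brightness offset of 7.
rescaled = [2 * value + 7 for value in image]
inverted = [-value for value in image]
same_pattern = corr(image, rescaled)
opposite_pattern = corr(image, inverted)
```

A positive linear change of intensity leaves the measure at 1, while an inverted pattern yields -1.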
One advantage of this measure is that the gradient of the resulting objective function
$f(c) = \mathrm{corr}(I_1, t_c(I_2))$ can be calculated quickly. This allows the use of faster optimisation
algorithms for the search of the optimum transformation at each level. With “$\circ$”
representing the composition of functions we have
$$
f(c) = \mathrm{corr}(I_1, t_c(I_2)) = \mathrm{corr}(I_1, I_2 \circ m_c).
$$
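By using the chain rule we can therefore calculate the gradient
$$
\nabla f(c) = \left( \frac{\partial f}{\partial c^x_{i,j}}, \frac{\partial f}{\partial c^y_{i,j}} \right)_{i,j \in \{0,\dots,2^l\}}
= \frac{\nabla \mathrm{cov}(I_1, I_2 \circ m_c) \cdot \sigma(I_2 \circ m_c) - \mathrm{cov}(I_1, I_2 \circ m_c) \cdot \nabla \sigma(I_2 \circ m_c)}{\sigma(I_1)\, \sigma^2(I_2 \circ m_c)}
$$
where
$$
\nabla \mathrm{cov}(I_1, I_2 \circ m_c) = \mathrm{cov}(I_1, \nabla (I_2 \circ m_c)),
$$
$$
\nabla \sigma(I_2 \circ m_c) = \frac{\mathrm{cov}(I_2 \circ m_c, \nabla (I_2 \circ m_c))}{\sigma(I_2 \circ m_c)},
$$
$$
\nabla (I_2 \circ m_c) = \left( \nabla_c I_2(m_c(p)) \right)_{p \in D} = \left( \nabla_{x,y} I_2(m_c(p)) \cdot \nabla_c m_c(p) \right)_{p \in D}.
$$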
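In the above equations, $\nabla_{x,y} I_2 = \left( \frac{\partial I_2}{\partial x}, \frac{\partial I_2}{\partial y} \right)$ represents the gradient of the intensity in
the image and
$$
\nabla_c m_c(p) = \begin{pmatrix}
\dfrac{\partial m^x_c(p)}{\partial c^x_{i,j}} & \dfrac{\partial m^x_c(p)}{\partial c^y_{i,j}} \\
\dfrac{\partial m^y_c(p)}{\partial c^x_{i,j}} & \dfrac{\partial m^y_c(p)}{\partial c^y_{i,j}}
\end{pmatrix}_{i,j \in \{0,\dots,2^l\}}
$$
is a matrix with two rows and one $2 \times 2$ block of columns per control point, containing the partial derivatives of the piecewise bilinear
map with respect to the co-ordinates of each control point. For a given point $p$ in the
target image it represents the change in the co-ordinates of the mapped point $m(p)$ in the
source image with respect to the change of control point co-ordinates. A $2 \times 2$ submatrix
of $\nabla_c m_c(p)$ for a given control point $c_{i,j}$ equals
$$
\begin{pmatrix} \omega_{i,j}(p) & 0 \\ 0 & \omega_{i,j}(p) \end{pmatrix}
$$
if $p$ lies in an adjacent square; otherwise all entries of the submatrix are zero.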
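The quotient-rule form of the gradient can be verified numerically on a toy problem. The sketch below assumes a single scalar parameter $c$ and a warped image whose pixel intensities change linearly with $c$, so that the per-pixel derivative is known exactly; all names and sample values are illustrative assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def sigma(a):
    return math.sqrt(cov(a, a))


def corr(a, b):
    return cov(a, b) / (sigma(a) * sigma(b))


def corr_gradient(i1, warped, d_warped):
    """Derivative of corr(i1, warped) given the per-pixel derivative of warped."""
    grad_cov = cov(i1, d_warped)                      # grad cov(I1, J) = cov(I1, grad J)
    grad_sigma = cov(warped, d_warped) / sigma(warped)
    numerator = grad_cov * sigma(warped) - cov(i1, warped) * grad_sigma
    return numerator / (sigma(i1) * sigma(warped) ** 2)


i1 = [3.0, 7.0, 1.0, 9.0, 4.0]
base = [2.0, 6.0, 2.0, 7.0, 5.0]
slope = [1.0, -2.0, 0.5, 3.0, 0.0]      # d(intensity)/dc for each pixel


def warped_at(c):
    return [b + c * s for b, s in zip(base, slope)]


analytic = corr_gradient(i1, warped_at(0.0), slope)
step = 1e-6
numeric = (corr(i1, warped_at(step)) - corr(i1, warped_at(-step))) / (2 * step)
```

In the full method the per-pixel derivative is supplied by the product of the image gradient and the bilinear weights rather than by a fixed slope.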
2.3 Optimisation
With the transformation function and similarity measure defined, the next step towards a
registration algorithm is to find a suitable optimisation procedure to generate the optimal
transformation parameters in the allowed degrees of freedom. The choice of the
technique depends on the smoothness of the objective function and the presence of
misleading local maxima in the vicinity of the global maximum. In the case when the
objective function has many local maxima, algorithms like statistical gradient descent,
simulated annealing or exhaustive search have been used [35]. Such algorithms can be
computationally expensive. In the case where no local maxima are present, techniques
such as Powell’s method or the downhill simplex method are commonly adopted [36].
One unique feature of our proposed method is to exploit the derivative information of the
two images to accelerate the optimisation process based on a commonly used variable
metric method called the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [36].
Variable metric methods in general are extensions of Newton’s method, which identifies
the maximum of a function by searching for a zero of the gradient. In each step of
Newton’s method, the objective function is approximated by a second order Taylor
polynomial and the displacement leading to the unique stationary value (minimum or
maximum) of this quadratic is calculated directly. Variable metric methods use the same
strategy but do not require the second order partial derivatives needed for the Taylor
polynomial. Instead, they approximate these values using information from previous
iterations. Close to the maximum they converge superlinearly, which is slower than the
quadratic convergence of Newton’s method but avoids the cost of computing second derivatives.
2.4 Algorithm design
The flow chart in Fig. 2 schematically illustrates the proposed algorithm, which is to be
referred to as MIR (Multi-resolution Image Registration) in the subsequent sections.
Starting with an initial scaled rigid transformation, the main loop recursively generates
transformations with gradually higher level of detail, which approximate the optimal
transformation with improving quality. In other words, the optimisation problem
presented above is divided into a sequence of optimisation problems, with an initial stage
for the parameters of a scalable rigid transformation and additional ones for PBMs at
each detail level. The starting point of the optimisation in the initial stage is the identity
transformation. For each subsequent stage, the solution of the previous iteration is used. It
is worth noting that the first subdivision is different from the subsequent ones, as it is
generating the initial grid rather than refining an existing one. The positions of the 4
control points of this initial grid are determined such that they represent the scalable rigid
transformation found in the first optimisation stage. This is achieved by identifying the 4
control points with the corner points of the image after the scalable rigid transformation.
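The construction of the initial grid can be sketched as follows. The sketch assumes that the scalable rigid transformation is parametrised by a scale factor, a rotation angle about the origin and a translation, and that the image corners lie at (0,0), (width,0), (0,height) and (width,height); this parametrisation and the function names are assumptions made for illustration.

```python
import math


def scaled_rigid(point, scale, angle, shift):
    """Apply rotation by `angle`, uniform scaling and translation to a point."""
    x, y = point
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (scale * (cos_a * x - sin_a * y) + shift[0],
            scale * (sin_a * x + cos_a * y) + shift[1])


def initial_grid(scale, angle, shift, width=1.0, height=1.0):
    """Level-0 control grid: the four image corners after the transformation."""
    return [[scaled_rigid((i * width, j * height), scale, angle, shift)
             for j in (0, 1)]
            for i in (0, 1)]


# Rotation by 90 degrees, scaling by 2 and a shift of one unit in x.
grid0 = initial_grid(2.0, math.pi / 2, (1.0, 0.0))
```

Because a bilinear map over a single square reproduces any affine map of its corners exactly, this level-0 grid represents the same transformation as the one found in the first stage.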
Fig. 3 shows the result of the optimisation process at different detail levels for a typical
gel pair. The gel images are overlaid using a special subtractive colour scheme. Instead of
simultaneously reducing the amount of all three colour components, (red, green and blue)
to produce the usual shades of grey for structures in the images, only green is reduced in
the first image, rendering its structures in magenta. In the second image, red and blue are
reduced, rendering the structures in green. In the overlay image, reductions for green are
controlled by the first image and reductions for red and blue are controlled by the second
image. This means that the structures will appear as grey or black if they coincide in both
images; otherwise, they will appear as green or magenta. The PBM mapping used for
registering the images is indicated by the corresponding control grid in red.
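The colour scheme can be expressed per pixel. The sketch assumes that each image is given as a "darkness" value between 0 (background) and 1 (darkest spot) and that colours are RGB triples in the range 0 to 1; the function names are illustrative.

```python
def overlay_pixel(darkness1, darkness2):
    """Subtractive overlay: image 1 reduces green, image 2 reduces red and blue."""
    red = 1.0 - darkness2
    green = 1.0 - darkness1
    blue = 1.0 - darkness2
    return (red, green, blue)


def overlay(image1, image2):
    """Overlay two equally sized images given as lists of rows."""
    return [[overlay_pixel(d1, d2) for d1, d2 in zip(row1, row2)]
            for row1, row2 in zip(image1, image2)]


coinciding = overlay_pixel(1.0, 1.0)    # spot present in both gels: black
only_first = overlay_pixel(1.0, 0.0)    # spot only in the first gel: magenta
only_second = overlay_pixel(0.0, 1.0)   # spot only in the second gel: green
```

Equal darkness in both images gives equal red, green and blue values, which is why well registered structures appear in shades of grey.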
Fig. 4 shows the same grids in green; however, this time they are subdivided such that
they contain the same number of quadrilaterals as the grid of the (final) optimal
transformation, which is overlaid in red.
It can be seen from both figures that the corrections made to the image become smaller
as the level of detail increases. This is equivalent to saying that after the best arrangement for a
quadrilateral is found, the maximal residual misalignment in this area will only be a
constant fraction of the side-length of the quadrilateral. In our experiments it turned out
that this fraction is around 1/16 in the average case and not more than 1/4 in the worst
case. This suggests that the size of the misalignments present in the images decreases
in proportion to the side-length of the quadrilaterals, and hence exponentially with
increasing level of detail, that is with $O(1/2^l)$.
2.4.1 Multiresolution representation of the gel images
The strategy of multiresolution representation is achieved by applying different degrees
of blurring to the gel images. At a given level, misalignments that are localised to areas
smaller than details conveyable by the corresponding image resolution will become
invisible. This permits the retrieval of mis-registration following a coarse-to-fine
paradigm. The blurring was done by reducing the resolution $R$ of the images. An image at
resolution $R$ consists of $2^R \times 2^R$ pixels. Initially the images were sampled at a resolution
of $R = 10$. The algorithm works with resolutions down to $R = 5$. For each lower
resolution the intensities of four corresponding pixels in the image at the previous
resolution are averaged. It has been shown that the ideal smoothing function is Gaussian
convolution [37]. The averaging method adopted here can be seen as the result of
convolving the image with a discrete Gaussian filter [38].
Before running BFGS in $T_l$, the resolution is reduced from 10 to $R = 5 + l$, which means
that the pixel size is about 1/32 of the average size of the quadrilaterals in the control
point grid at detail level $l$. This is half of the size of the misalignments found at this level
in the average case and therefore enough to blur them. Fig. 5 shows again the result of the
optimisation at different detail levels for the same gel pair. This time the gel images are
blurred according to the scheme explained above.
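The resolution reduction and its coupling to the detail level can be sketched as follows. The sketch assumes square images stored as lists of rows with a side length that is a power of two; the function names are illustrative.

```python
def halve(image):
    """Reduce the resolution by one step by averaging each 2 x 2 block of pixels."""
    height, width = len(image), len(image[0])
    return [[(image[y][x] + image[y][x + 1] +
              image[y + 1][x] + image[y + 1][x + 1]) / 4.0
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


def reduce_resolution(image, current_r, wanted_r):
    """Go from 2^current_r pixels per side down to 2^wanted_r pixels per side."""
    for _ in range(current_r - wanted_r):
        image = halve(image)
    return image


def resolution_for_level(level, base=5):
    """Resolution R = 5 + l used before optimising at detail level l."""
    return base + level


# An 8 x 8 ramp image (R = 3) reduced to 2 x 2 pixels (R = 1).
ramp = [[x + y for x in range(8)] for y in range(8)]
small = reduce_resolution(ramp, 3, 1)

# Pixels along the side of one grid square at each detail level.
pixels_per_square = [2 ** resolution_for_level(l) // 2 ** l for l in range(6)]
```

At every level each grid square is covered by 32 pixels per side, so an average misalignment of 1/16 of the side length corresponds to about two pixels.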
One important advantage of the blurring process is that it helps to meet the basic
requirement for the BFGS algorithm, that there should be no misleading local maxima
near the global maximum. Fig. 6 shows how blurring helps to remove such local maxima
in a simplified example where two similar one-dimensional images have to be registered
by finding the optimal shift. At high resolution, the objective function has maxima for
two different shifts. One of them is only suboptimal, as it mis-registers the peaks within
the two images. At lower resolution, this suboptimal maximum vanishes. Although the
abscissa of the remaining single maximum does not exactly coincide with the optimal
shift, it can be used as a good starting point for the optimisation at higher resolution. This
starting point is close enough to the optimum to prevent BFGS from becoming trapped in
the local maximum.
2.4.2 Decoupling of local registration for efficiency
For a given detail level, the inner loop of the algorithm, as indicated by Fig. 2, optimises
the quadrilateral $c_{i,j}, c_{i+1,j}, c_{i,j+1}, c_{i+1,j+1}$ for each square $a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ in the image
domain of gel image $I_1$ separately. We found empirically that for the optimisation on
finer detail levels, only a small number of these quadrilaterals need major rearrangement,
as indicated in Fig. 4. By locally optimising the control grids, it is possible for the
optimisation process to rapidly converge or even skip areas that do not require further
adjustments, which leads to considerable savings in execution time.
The strategy of local registration through decoupling can be justified by the following
facts. If the control points are already close to their optimal positions, the separate
optimisation of each control point leads to the same solution as the optimisation of all
control points in parallel. The optimal value of the similarity function is achieved in both
cases by maximising the local contributions of each quadrilateral to the global similarity
function. These contributions are independent of each other because they are achieved by
small rearrangements of the control points, which adjust the correlation of both images in
locally bounded, separate areas. A slight dependence is left in the case of neighbouring
squares and the optimisation was therefore done in order of decreasing intensity
variance within the squares.
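The processing order can be sketched as follows. The sketch assumes a square image stored as a list of rows whose side length is divisible by $2^l$, and it uses the population variance of the pixel intensities inside each square; the function names are illustrative.

```python
def variance(values):
    average = sum(values) / len(values)
    return sum((value - average) ** 2 for value in values) / len(values)


def squares_by_variance(image, level):
    """Indices (i, j) of the grid squares, ordered by decreasing intensity variance."""
    n = 2 ** level
    size = len(image) // n              # pixels along the side of one square
    scored = []
    for i in range(n):                  # horizontal index
        for j in range(n):              # vertical index
            pixels = [image[y][x]
                      for y in range(j * size, (j + 1) * size)
                      for x in range(i * size, (i + 1) * size)]
            scored.append((variance(pixels), (i, j)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [index for _, index in scored]


image = [[0, 0, 5, 5],
         [0, 9, 5, 5],
         [1, 2, 0, 0],
         [1, 2, 0, 8]]
order = squares_by_variance(image, 1)
```

Squares with little intensity variation, such as empty background, carry little information for the correlation and are therefore visited last.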
2.5 Performance Evaluation
2.5.1 Gels used for Evaluation
For the assessment of the proposed method, gels from three different cell biological
experiments were considered. Each of these experiments aimed to detect changes in
protein expression by comparing 2-D gel profiles from a control group with those from a
candidate group(s), see Fig. 7. In the first experiment, the effect of a plant extract on the
protein expression of IBR3 human dermal fibroblasts was investigated. Both control and
candidate samples were taken from homogeneous cell cultures grown in the laboratory;
the candidate samples were treated with the plant extract. Four gels were generated for
the control group and 9 for the candidate group.
The 2-D gel profiles for the second experiment were generated using samples of human
serum. Here the candidate samples came from several individuals with Paget’s disease.
The control samples were taken from normal, healthy individuals. The control group
consisted of 10 gels and the candidate group contained 4 gels.
The third experiment was a time course experiment to investigate the potential effects of
proteolysis on a total protein extract of human heart tissue prepared for 2-DE. One
sample of human heart was solubilised using standard sample lysis buffer (9.5 M urea,
2% w/v CHAPS, 0.1% w/v DTT, 0.8% w/v 2-D Pharmalyte pH 3-10). The other sample
of human heart was solubilised using the same lysis buffer to which a cocktail of protease
inhibitors had been added (Complete Mini, Roche, Mannheim, Germany). The samples
were then incubated at room temperature. Aliquots of each sample were removed and
frozen at -70°C after 0, 20, 40, 70 and 120 min. From each sample (100 µg loading) a 2-D
gel was generated and the 2-D protein separation patterns were visualised by silver
staining using our standard protocols [39, 40]. The gels were scanned with a laser
densitometer at 100 µm resolution. The candidate and control gels were given by the sets
of 2-D gel profiles generated using the samples prepared in the absence and presence of
protease inhibitors respectively. Both candidate and control groups contained 5 gels, one
for each time point.
The 2-D gels shown in Fig. 8 are typical for each of the 3 experiments. It can be seen that
they vary significantly in spot size, distribution and dominance of streaks. All the 2-D
gels generated in the plant extract experiment are very similar, because they originate
from samples taken from homogeneous cell cultures. However, the human serum 2-D gel
patterns are very different because they were taken from different patients and control
individuals. Based on this we can say that the test included examples from both ends of
the range of 2-D gels commonly dealt with in laboratories.
2.5.2 Intra and Inter image group assessment
The performance of the proposed MIR algorithm was tested in two ways, see Fig. 9. The
first test, called Intra Assessment, consisted in comparing the relative performance of our
method and the Z3 software from Compugen. This is the only fully automated stand-
alone 2-D gel matching software known to the authors at this time. For this experiment,
only 2-D gels taken from the same group were matched for assessing the intra-group
variations. Both algorithms were applied to 15 pairs taken from the control group of the
human serum experiment and 15 pairs taken from the control group of the plant extract
experiment. An experienced observer scored the results depending on how well the
corresponding spots were matched. The scoring was done in double blind fashion.
A second test, called Inter Assessment, (see Fig. 9) was to assess the performance of MIR
on 2-D gel pairs consisting of one control gel and one candidate gel. The matching of
these gels was much more difficult as some image features were present in only one of
the gels. For the first part of the Inter Assessment MIR was run on pairs consisting each
of a candidate gel and a control gel from the human serum experiment and from the plant
extract experiment. The scoring was done along the same lines as in the Intra
Assessment. In the second part of the Inter Assessment, it was determined how well the
spots by which the two 2-D gels differed could be identified. For this the expert was
asked to analyse 2-D gel images generated in the heart time course and plant extract
experiment only on the basis of the overlaid images generated by MIR. The results of this
analysis were compared with the results of an analysis on the same 2-D gels using other
techniques.
2.5.3 Scoring Scheme
For the purpose of quantitative evaluation we define P as the number of misregistered
spots found in the image of overlaid 2-D gels that has to be scored. It is important to
count spots as misregistered only if they do not overlap sufficiently but (a) truly represent the
same protein and (b) are defined enough to be meaningful. Each gel used for evaluation
contained around 600 meaningful spots. We defined two spots as being registered if and
only if more than 75% of their area overlaps. Fig. 10 illustrates several examples of
both cases, in which two candidate spots are considered to be registered and
unregistered respectively. Using these criteria, the scores shown in Table 1 were
devised.
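The overlap criterion and the count P can be sketched as follows. The sketch assumes that each spot is given as a set of pixel co-ordinates and that the 75% threshold must be exceeded relative to the larger of the two spots, which is one possible reading of the criterion; the helper names and the square test spots are illustrative.

```python
def is_registered(spot1, spot2, threshold=0.75):
    """True if the common area exceeds `threshold` of the larger spot's area."""
    common = len(spot1 & spot2)
    return common > threshold * max(len(spot1), len(spot2))


def count_misregistered(pairs, threshold=0.75):
    """P: number of corresponding spot pairs that fail the overlap criterion."""
    return sum(1 for spot1, spot2 in pairs
               if not is_registered(spot1, spot2, threshold))


def square_spot(x0, y0, size):
    """Hypothetical test spot: a filled square of pixels."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


pairs = [
    (square_spot(0, 0, 5), square_spot(1, 0, 5)),   # 20 of 25 pixels shared
    (square_spot(0, 0, 4), square_spot(1, 0, 4)),   # 12 of 16 pixels shared
    (square_spot(0, 0, 4), square_spot(6, 6, 4)),   # no pixels shared
]
p = count_misregistered(pairs)
```

An overlap of exactly 75% does not count as registered here, because the criterion requires more than 75%.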
3. Results
3.1 Intra-group variation
For the Intra Assessment, 15 2-D gel pairs from the control group of the experiment with
the plant extract and 15 2-D gel pairs from the control group of the experiment with
human serum were used. The images presented to the expert were randomised in their
names and order, such that neither the authors nor the expert were able to tell whether a
given image was the result of a run of Z3 or MIR.
Figs. 11 and 12 show the frequencies of scores for Z3 and MIR. It can be seen clearly
from these figures that MIR scored significantly better than Z3 and achieved a smaller
deviation from the mean score. Z3 achieved mean scores of 2.19 and 3.53 for the Human
Serum gels and Plant Extract gels, whereas the mean scores for MIR were 3.53 and
4.87, respectively. In 29 out of a total of 30 cases MIR scored better than or equal to Z3.
On the 2-D gels of human serum, both algorithms decrease in their performance. This is
due to the increased difficulty of matching, because of more dominant streaks, more spots
which are washed together and a less dense distribution of spots. Similar to the Z3
program, the proposed algorithm requires around 5 seconds for a single run on a Pentium
II P300 machine.
3.2 Inter-group variation
For the first part of the Inter Assessment, MIR was run on a total of 76 pairs, each
consisting of a candidate 2-D gel and a control 2-D gel. Forty gel pairs were taken from the
human serum experiment and 36 pairs were taken from the plant extract experiment.
Figs. 13 and 14 show the frequencies for the scores for these experiments. The mean
scores were 2.83 for the Human Serum Gels and 3.48 for the Plant Extract Gels, which
represents a decline of approximately one unit, compared with the scores in the Intra
Assessment. The decline is to be expected, as non-matching features complicate the image
registration. However, the scores achieved in the Inter Assessment are still reasonably
good, as a score of 3 means that not more than 10 corresponding spots were mis-registered.
With around 600 meaningful spots per gel, this means that fewer than 2 % of the spots could not be
matched. Table 2 summarises information on all the runs of MIR in the Inter Assessment as
well as the Intra Assessment.
As stated above, the aim of the second part of the Inter Assessment was a more
qualitative assessment of the usefulness of the matching procedure. For this, the expert
was asked to analyse 2-D gel images generated in the plant extract experiment and in the
time course experiment using only the images produced by MIR.
In the case of the plant extract experiment, there was, in addition to the control group and
the candidate group (plant extract A), a third group of 2-D gels. These were generated
from the same type of cells but treated with a different kind of plant extract (plant extract
B). The aim of the analysis was to find out whether 2-D gels generated using plant extract
A could be identified if presented along with 2-D gels generated using plant extract B.
The identification could only be done by using the spots by which each type of 2-D gel
differed from the control. The expert was therefore confronted with two groups of
images, one group containing matches between control 2-D gels and extract A 2-D gels,
the other group containing matches between control 2-D gels and extract B 2-D gels.
As in the original analysis, where other tools had been used, the expert was able to
identify the group that contained the 2-D gels generated from plant extract A.
In the case of the time course experiment, the purpose was to investigate protein
proteolysis in a sample of human heart proteins prepared for 2-DE in the absence and
presence of protease inhibitors. The candidate 2-D gels were run using aliquots of the
two samples taken at 5 successive time points. In order to identify the affected proteins,
the candidate 2-D gels (no protease inhibitors in the sample) were compared with control
2-D gels (protease inhibitor cocktail present in the sample). The expert was confronted
with 5 images, each of which was a match between a candidate 2-D gel and a control
2-D gel sampled at a certain time point. As in the original analysis, the proteins that were
subject to proteolysis under these conditions could be identified. Fig. 15 illustrates this
result by showing the gels obtained after incubation of the samples at room temperature
for 0 (A-C) and 2 (D-F) hours. At time 0, the 2-D protein profiles are essentially identical
in the presence (A) and absence (B) of protease inhibitors. After registration these two
images matched perfectly so that almost all of the spots appeared grey (C). However,
after 120 minutes in the absence of protease inhibitors (E), considerable proteolysis was
evident with the appearance of many additional protein spots (arrows in E) compared
with the sample incubated in the presence of protease inhibitors (D). These 2-D gel
images could still be registered very well by MIR, but the additional protein spots due to
proteolysis were clearly visible in the registered image as green features (F).
4. Discussion
It is worth noting that although MIR shows a significant improvement in performance,
there are still a number of cases, 16 out of 111, where the results are not satisfactory (more
than 10 spots mis-registered). Fifteen of these 16 cases come from the experiment on human
serum gels. The problems of the algorithm with these 2-D gel pairs can be ascribed to the
fact that the gels contained only two very different kinds of geometric features. First,
there are very broad, widespread homogeneous shapes, originating from linear series of
large saturated spots that have coalesced. Secondly, there are a few sparsely
distributed, well localised small spots. The broad features differ strongly in their
geometrical outline between gels, and the small spots do not always have
corresponding partners.
Although the algorithm matches the corresponding broad features in almost all cases,
the difference in their geometric shape makes the coarse registration so imprecise that
corresponding small spots do not come close enough to be registered at the high detail
levels. The scores turned out low in this case because it was these small spots that
were of interest in the experiment. These examples show a potential drawback of the
method. As it is based on the idea of using the extra geometric information present in the
intensity distribution for registration, it must fail if this extra information is strongly
misleading.
There was one gel pair from the plant extract experiment that did not achieve a
satisfactory score, due to severe local distortion. The local misalignments present in this
gel pair were of the size of a whole side length of a quadrilateral, instead of the expected
worst case of ¼ of a side length. Fig. 16 illustrates one such situation for two
corresponding sections. The two images on the top represent the sections with the critical
area encircled. The image pair at the bottom shows a manually found transformation,
which registers section 1 on section 2. Note the extreme distortion of the grid in the
critical area. This demonstrates the limit of the proposed method. If the distortions on a
certain detail level exceed ¼ of the side length, it can no longer be assumed that
corresponding structures overlap enough to guide the registration process, i.e. BFGS is
likely to terminate in a local maximum.
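The capture-range limit described above can be illustrated with a deliberately simplified 1-D sketch. It is not the method of this paper: an exhaustive search over integer shifts that minimises a sum of squared differences stands in for the BFGS optimisation of the piecewise bilinear mapping, and a synthetic intensity profile stands in for a gel image. All function names and parameters (`coarse_to_fine_shift`, `levels`, `coarse_radius`, `make_profile`) are assumptions made for illustration only.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs of samples."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def shift(signal, s):
    """Shift a signal by s samples to the right, filling with zeros."""
    n = len(signal)
    return [signal[i - s] if 0 <= i - s < n else 0.0 for i in range(n)]


def ssd(a, b):
    """Sum of squared differences between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_shift(source, target, candidates):
    """Exhaustive local search: candidate shift with the smallest SSD."""
    return min(candidates, key=lambda s: ssd(shift(source, s), target))


def coarse_to_fine_shift(source, target, levels=3, coarse_radius=2):
    """Estimate an integer shift, refining by at most one sample per level."""
    pyramid = [(source, target)]
    for _ in range(levels):
        src, tgt = pyramid[-1]
        pyramid.append((downsample(src), downsample(tgt)))
    # Coarsest level: search a small window around zero shift.
    src, tgt = pyramid[-1]
    estimate = best_shift(src, tgt, range(-coarse_radius, coarse_radius + 1))
    # Finer levels: double the estimate and correct it by -1, 0 or +1.
    for src, tgt in reversed(pyramid[:-1]):
        estimate *= 2
        estimate = best_shift(src, tgt, range(estimate - 1, estimate + 2))
    return estimate


def make_profile(length=128, spots=((20, 1.0), (37, 0.6), (50, 0.8))):
    """Synthetic 1-D profile with a few triangular 'spots' (illustrative data)."""
    profile = [0.0] * length
    for centre, height in spots:
        for offset in range(-3, 4):
            profile[centre + offset] += height * (1 - abs(offset) / 4.0)
    return profile


source = make_profile()
print(coarse_to_fine_shift(source, shift(source, 16)))  # within the capture range
print(coarse_to_fine_shift(source, shift(source, 40)))  # outside the capture range
```

With three halvings and a coarsest search radius of 2 samples, the largest shift this sketch can reach is 2 x 8 + 7 = 23 samples. The first call recovers the shift of 16 exactly. The second cannot reach 40, because the misalignment already exceeds the search window at the coarsest level and the small corrections allowed at the finer levels cannot make up for it.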
5. Concluding remarks
In this paper we present a new algorithm for fully automatic matching of 2-D gel protein
separation profiles, based on multiresolution image registration and the BFGS
optimisation method. The method exploits the fact that coarse approximations to the
optimal transformation can be extracted efficiently from low-resolution images. In this
way it is possible to remove misalignments at different scales in a systematic manner. In
a double blind trial the method performed significantly better on a set of 30 2-D gel pairs
than a commercial program (Z3), which also follows the intensity-based approach. It showed excellent
results in 90 % of the cases. In another double blind trial the system was tested on 76
more demanding matching problems and gave satisfactory results in 81 % of the cases.
Acknowledgements
The work in MJD’s laboratory is supported by the British Heart Foundation and
Proteome Sciences PLC. SV is supported by an undergraduate exchange scheme between
the Royal Society/Wolfson Medical Image Computing Laboratory at Imperial College
and the Universitaet Karlsruhe in Germany.
References
[1] Fleischmann, R..D., Adams, M.D., White, O., Clayton, R.A.. et al., Science 1995,
269, 496-512
[2] Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W. et al., Science 2001, 291,
1304-1351.
[3] International Human Genome Sequencing, Nature 2001, 409, 860-922.
[4] Banks, R., Dunn, M.J., Hochstrasser, D.F., Sanchez JC, et al., The Lancet 2001,
356, 1749-1756.
[5] Merchant, M., Weinberger, S.R., Electrophoresis 2000, 21, 1165-1177.
[6] Nelson, R.W., Nedelkov, D., Tubbs, K.A., Electrophoresis 2000, 21, 1155-1163.
[7] Link, A.J., Eng, J., Schieltz, D.M., Carmack, E., et al., Nature Biotechnol. 1999,
17, 676-82.
28
[8] Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., et al., Nature Biotechnol. 1999, 17,
994-999.
[9] Rigaut, G., Shevchenko, A., Rutz, B., Wilm, M., et al., Nature Biotechnol. 1999,
17, 1030-1032.
[10] Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., et al., Nature 2000, 403, 623-627.
[11] Görg, A., Obermaier, C., Boguth, G., Harder, A., et al., Electrophoresis 2000, 21,
1037-1053.
[12] Dunn, M.J., Görg, A., in: Pennington, S.R. and Dunn, M.J. (Eds.), Proteomics,
From Protein Sequence to Function, BIOS Scientific Publishers, Oxford 2001,
pp.43-63.
[13] Patton, W.F., in: Pennington, S.R. and Dunn, M.J. (Eds.), Proteomics, From
Protein Sequence to Function, BIOS Scientific Publishers, Oxford 2001, pp. 65-
86.
[14] Wilkins, M.R., Gooley, A., in: Wilkins, M.R., Williams, K.L., Appel, R.D.,
Hochstrasser, D.F. (Eds.), Proteome Research: New Frontiers in Functional
Genomics (Wilkins, M.R., Williams, K.L., Appel, R.D. and Hochstrasser, D.F.,
eds.), Springer-Verlag, Berlin 1997, pp. 35-64.
[15] Patterson, S.D., Aebersold, R., Goodlett, D.R., in: Pennington, S.R. and Dunn,
M.J. (Eds.), Proteomics, From Protein Sequence to Function, BIOS Scientific
Publishers, Oxford 2001, pp. 87-130.
[16] Corbett, J.M., Dunn, M.J., Posch, A., Görg. A., Electrophoresis 1994, 15, 1205-
1211.
29
[17] Blomberg, A., Blomberg, L., Norbeck, J., Fey, S.J., et al., Electrophoresis 1995,
16, 1935-1945.
[18] Anderson, N.L., Taylor, J., Scandora, A.E., Coulter, B.P., Anderson, N.G., Clin.
Chem. 1981, 27, 1807-1820.
[19] Lemkin, P.F., Lipkin, L.E., Lester, E.P., Clin. Chem. 1982, 28, 840-849.
[20] Garrells, J.I., J. Biol. Chem. 1989, 264,5259-5282.
[21] Olson, A.D., Miller, M.J., Anal. Biochem. 1988, 169, 49-70.
[22] Pleissner, K.P., Oswald, H., Wegner, S., in: Pennington, S.R. and Dunn, M.J.
(Eds.), Proteomics, From Protein Sequence to Function, BIOS Scientific
Publishers, Oxford 2001, pp. 131-149.
[23] Kriegel, K., Seefeldt, I., Hoffmann, F., Schultz, C., et al., Electrophoresis 2000,
21, 2637-2640.
[24] Pleissner, K.P., Hoffmann, F., Kriegel, K, Wenk, C., et al., Electrophoresis 1999,
20, 755-765.
[25] Maintz, J.B.A., Viergever, M.A., Medical Image Analysis 1998, 2, 1-36.
[26] http://www.2dgels.com/
[27] Lester, H., Arridge, S.R., Pattern Recognition1999, 32, 129-149.
[28] Hajnal, J.V., Hill, D.J., Hawkes, D.J., Medical Image Registration, CRC Press,
London, 2001.
[29] Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L.G., Leach, M.O., Hawkes, D.J.,
IEEE Transactions on Medical Imaging 1999, 18(8), 712-721.
[30] Starck, J.L., Murtagh, F., Bijaoui, A., Geometric Registration, Cambridge
University Press, Cambridge, 1998.
30
[31] Szeliski, R., Shum, H.Y., IEEE Transactions on Pattern Analysis and Machine
Intelligence 1996, (18)12, 1199-1209.
[32] Downie, T.R., Shepstone, L., Silverman, B.W., in: K.V. Madia et al. (Eds)
Proceedings in Image Fusion and Shape Variability Techniques 1996, 161-169,
Leeds University Press.
[33] Amit, Y., SIAM Journal of Scientific Computing 1994, 15(1), 207-224.
[34] Holden, M., Hill, D.L.G., Denton, E.R.E., Jarosz, J.M., Cox, T.C.S., Rohlfing T.,
Goodey, J., Hawkes, D.J., IEEE Transactions on Medical Imaging 2000, 19(2),
94-102.
[35] Ritter, N., Owens, R., Cooper, J., Eikelboom, R.H., van Saarloos, P.P., IEEE
Transactions on Medical Imaging 1999, 18, 404-418.
[36] Press, W.H., Teukolsky, S.A.., Vetterling, W.T., Flannery, B.P., in: C, the Art of
Scientific Computing, 2nd Edition, Cambridge University Press, Cambridge 1997,
p. 425-430.
[37] Lindeberg T., Journal of Applied Statistics 1993, 2(11), 22–270.
[38] Burt P.J., Computer Vision, Graphics and Image Processing 1981, 16, 20-51,
[39] Weekes, J., Wheeler, C.H., Yan ,J.X., Weil, J., et al., Electrophoresis 1998, 20,
898-906.
[40] Heinke, M.Y., Wheeler, C.H., Yan ,J.X., Amin, V., et al, Chang D, Einstein R,
Dunn M.J., dos Remedios C.G., Electrophoresis 1999, 20, 2086-2093.
31
Table 1: Scores for the assessment of registration performance.
Score   Description of match
5       P = 0, i.e. gels are perfectly registered
4       0 < P < 6
3       5 < P < 11
2       10 < P < 20
1       20 < P, but the area in which these spots lie is less than a quarter of the total area of the gel
0       All spots in more than a quarter of the total area of the gel are mis-registered
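The scoring scheme of Table 1 can be written down as a small lookup, shown here as a sketch only. It takes P to be the number of mis-registered spots, as the discussion of a score of 3 suggests. Two points are assumptions, because the table does not cover them: P = 20 is grouped with score 2, and an affected area of exactly one quarter is grouped with score 0. The function name `registration_score` and the parameter `affected_area_fraction` are hypothetical.

```python
def registration_score(p, affected_area_fraction=0.0):
    """Map the number of mis-registered spots P to a score from 5 to 0.

    affected_area_fraction is a hypothetical input: the fraction (0 to 1) of
    the gel area in which the mis-registered spots lie. It only matters when
    more than 20 spots are mis-registered.
    """
    if p < 0:
        raise ValueError("the number of mis-registered spots cannot be negative")
    if p == 0:
        return 5  # gels are perfectly registered
    if p <= 5:
        return 4  # 0 < P < 6
    if p <= 10:
        return 3  # 5 < P < 11
    if p <= 20:
        return 2  # 10 < P < 20; assumption: P = 20 is also placed here
    # 20 < P: the score depends on how much of the gel is affected.
    # Assumption: exactly one quarter of the area is treated as score 0.
    return 1 if affected_area_fraction < 0.25 else 0


for p, area in [(0, 0.0), (4, 0.0), (10, 0.0), (15, 0.0), (30, 0.1), (30, 0.5)]:
    print(p, area, registration_score(p, area))
```

Under this reading, a score of 3 or more corresponds to the results described as satisfactory in the text, that is, to not more than 10 mis-registered spots.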
Table 2: Mean, standard deviation, maximum and minimum over all scores and execution times for all 106
test runs using MIR on Human Serum and Plant Extract.
         Score   Time cost
Mean     3.47    5.46 sec
StDev    1.12    1.05 sec
Max      5       9 sec
Min      0       3 sec
Figure legends
Figure 1: Definition of the bilinear mapping of point p, using the square
$a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ in the target image and the control point quadrilateral
$c_{i,j}, c_{i+1,j}, c_{i,j+1}, c_{i+1,j+1}$ in the source image.
Figure 2: Schematic illustration of all the processing steps involved in the proposed
algorithm.
Figure 3: Optimisation results at different detail levels for a typical gel pair. The gel
images are overlaid using a special subtractive colour scheme. The PBM
mapping used for image registration is indicated by the red control grid. In
order to make the control grid visible along with the overlaid images, it was
necessary to transform the first (magenta) image instead of the second, using
the inverse mapping of the PBM.
Figure 4: Illustration of the best approximations to the (final) optimal transformation
found in different detail levels. The grids are similar to Fig. 3, however this
time they are subdivided such that they contain the same number of
quadrilaterals as the grid of the optimal transformation, which is overlaid in
red.
Figure 5: Optimisation results at different detail levels similar to Fig. 3. The images are
shown at the resolution from which the transformations are determined.
Figure 6: The effect of misleading local maxima and how it can be removed near the
global maximum by blurring.
Figure 7: Clarification of the grouping for the candidate and control 2-D gels in the
three different biological experiments used to assess the proposed method.
Figure 8: Three typical 2-D gels from each of the three biological experiments.
Figure 9: Illustration of the 2-D gel pair selection for the Intra and Inter Assessment.
Figure 10: Illustration of several registered and un-registered candidate spots as
examples for the scoring scheme.
Figure 11: Frequencies of scores for MIR and Z3 on 2-D gels of human serum.
Figure 12: Frequencies of scores for MIR and Z3 on 2-D gels from the plant extract
experiment.
Figure 13: Frequencies of scores for MIR in Inter Assessment on the human serum
experiment.
Figure 14: Frequencies of scores for MIR in Inter Assessment on plant extract
experiment.
Figure 15: Illustration of a successful analysis of the 2-D gels in the heart time course
experiment using overlaid images generated by MIR.
Figure 16: Example where MIR was not able to register corresponding
sections in two 2-D gels. The manually found transformation shows the
extreme local distortions present in this unusual case.