Multiresolution image registration for two-dimensional gel electrophoresis

Stefan Veeser, Michael J. Dunn*, Guang-Zhong Yang

Royal Society/Wolfson Foundation Medical Image Computing Laboratory

*National Heart and Lung Institute

Imperial College of Science, Technology and Medicine

University of London, London, SW7 2BZ, UK

Running title: Multiresolution image registration of 2-D gels

Keywords: Image processing / Image registration / Multi-resolution analysis / Two-dimensional gel electrophoresis

Non-standard abbreviations: PBM, piecewise bilinear mapping; MIR, multi-resolution image registration

Address for Correspondence
Dr. G.Z. Yang
Royal Society/Wolfson Medical Image Computing Laboratory
Department of Computing
180 Queens Gate
Imperial College of Science, Technology and Medicine
London SW7 2BZ
United Kingdom
Tel: (+44) 20 7594 8441
Fax: (+44) 20 7581 8024
Email: [email protected]

Summary

In proteomic research, 2-DE is an important tool for investigating differential patterns of

qualitative and quantitative protein expression. The strength of the technique is due to its

unrivalled power of being able to separate simultaneously thousands of proteins. The key

to the comparison of 2-D protein profiles, however, lies in the use of a fast and robust image

matching process which is essential to the subsequent quantification procedure. To satisfy

the growing demand for a robust and fully automatic method of matching 2-D gel protein

separation profiles, we describe in this paper a novel registration technique based on

image intensity distribution rather than selected features. The method uses a multi-

resolution representation of the gel profiles and exploits the fact that coarse

approximations to the optimal matching can be extracted efficiently from low-resolution

images. This permits the removal of misalignments at different scales in a systematic

manner. The strength of the new method has been confirmed by a double blind trial of

111 2-D gel pairs. The proposed method requires neither landmarks nor an a priori image

alignment, and takes about 5 seconds for processing a typical gel pair on a standard

personal computer.

1. Introduction

The first complete genome, that of the bacterium Haemophilus influenzae, was published

in 1995 [1]. We now have the complete genomic sequences for more than 40 prokaryotic

and eukaryotic organisms, and a major milestone has been reached recently with the

completion of the human genome [2,3]. A major challenge in the post-genome era will

be to elucidate the biological function of the large number of novel gene products that

have been revealed by the genome sequencing initiatives, to understand their role in

health and disease, and to exploit this information to develop new therapeutic agents.

Proteomics will play a major role in the assignment of protein function through direct

analysis of the patterns of expression, interaction, localisation, and structure of the

proteins encoded by genomes [4].

Recently, progress has been made in the development of alternative methods of protein

separation for proteomics, such as the use of chip-based technologies [5, 6], the direct

analysis of protein complexes using mass spectrometry [7], the use of affinity tags [8, 9],

and large-scale yeast two-hybrid screening [10]. However, two-dimensional

polyacrylamide gel electrophoresis (2-DE) remains the core technology of choice for

separating complex protein mixtures in the majority of proteome projects [11, 12]. This

is due to the unrivalled power of 2-DE to separate simultaneously thousands of proteins,

the subsequent high-sensitivity visualisation of the resulting 2-D separations [13] and the

relative ease with which proteins from 2-D gels can be identified and characterised using

highly sensitive microchemical methods [14], particularly those based on mass

spectrometry [15].

In many proteomic projects, 2-DE is used as a tool to investigate differential patterns of

qualitative and quantitative protein expression. This requires the comparative analysis

often of large sets of 2-D protein profiles, a task that is impossible without the use of

computer systems. Advances in 2-D gel technology, particularly with the use of IPG for

the first-dimension IEF separation [11], have significantly improved the reproducibility

of 2-D separations [16, 17]. Nevertheless, 2-DE remains an imperfect technique such

that no pair of 2-D gel patterns can be directly superimposed. This has necessitated the

development of specialised software packages, many of these having their origins in the

late 1970’s and early 1980’s, for example TYCHO [18], GELLAB [19], QUEST [20],

and ELSIE [21]. Most of these systems have matured into commercial packages,

including PDQuest, Kepler, Melanie, Phoretix and BioImage, and these have found some

acceptance in the 2-DE community [22].

It has been acknowledged that the key to the comparison of 2-D protein profiles lies in

the use of a fast and robust image matching process, which is essential to the subsequent

quantification procedure. Currently, most of the matching procedures employed in

the current generation of software packages require extensive user interaction. They

normally require users to specify manually from 20 to 50 corresponding spots that are

present in each 2-D gel making up the data set. A paradigm common to these methods is

to divide the matching process into the following three main steps:

• Pre-processing

• Spot detection

• Pattern matching

The pre-processing step is concerned with noise removal, background intensity bias

correction, and elimination of streak artefacts. Spot detection, on the other hand, is

mostly performed with Gaussian fitting or Laplacian filtering, which results in a list of

spot co-ordinates and values describing the intensity and the area covered by each spot.

This is usually followed by a pattern matching procedure for establishing corresponding

spots in the reference and target images based on user-defined landmarks.

There is evidence that in practice it may not be appropriate to follow the paradigm of

dividing the image registration task into two separate tasks: spot detection and pattern

matching. Firstly, precise extraction of spot location is computationally intensive and

error prone. In the case of complex patterns where spots are located very close to each

other or occur as overlapping regions, this can pose major challenges [23]. Secondly,

by reducing the gel images into a list or a graph of spot locations, a considerable amount

of visual information, such as shape and intensity spread, is lost for the subsequent

matching procedure. Thirdly, algorithms following this paradigm of spot detection and

pattern matching must cope with the combinatorial explosion of possible matches

between numerous small local patterns in each gel.

In an attempt to alleviate the above problems, a recent system called CAROL [24] uses

the smallest local pattern possible, i.e. pairs of locally intensive spots, called intensive

edges.

With the maturity of intensity based image registration in medical imaging [25], there has

been a growing amount of interest in applying this method for gel image processing.

With this approach, the process of spot feature detection is avoided, and the complete raw

gel images are matched according to their intensity distributions. The images can then be

displayed in an overlaid fashion using a subtractive colour scheme to highlight the

intrinsic structural differences between the gel pairs. The concept of intensity based

image registration is an actively researched area in the computer vision and image

processing community and has been used for both uni-modal and multi-modal

registration tasks in medical imaging, camera pose estimation for navigating robots,

video compression, and human gesture recognition. One of the first systems that adopted

this approach for electrophoresis is a software environment called Z3 developed by

Compugen [26]. The software has clearly demonstrated the advantages of using

intensity-based registration for 2-D gel image analysis, despite the drawbacks in handling

images with large distortion and sparsely distributed protein spots.

The purpose of this paper is to present an improved image registration technique based on

multi-resolution analysis [27]. The method adopts a coarse-to-fine feature-matching

paradigm such that the combinatorial complexity of the possible matches between a large

number of small local patterns in each gel is avoided. It also permits high-order

non-linear distortion to be recovered in an accurate way. The method only makes general

assumptions on the underlying structure of the spot distribution, and therefore is

applicable to all gel patterns. The method has been applied to a set of 111 gel pairs and

the assessment results have shown the strength of the proposed technique.

2. Methods

The scales of deformation associated with 2-D gel images can range from the whole

image down to areas of several square millimetres. Our goal is to devise a procedure for

the registration that is accurate yet efficient. A multi-resolution approach has been

adopted so as to handle both global and local deformations. The basic idea is that the

distortions, which produce the misalignment in the two gels, can be decomposed into

their respective components in each resolution level. By using low-resolution images of

the gels, only the coarse components of the distortions will be reflected and thus a rough

approximation to the optimal geometric transformation can be calculated. Once the

coarse misalignments are eliminated, the finer distortions on the next resolution level can

be dealt with. This process can be conducted in a recursive manner to derive the final

optimal transformation. The efficiency of the algorithm rests on the following

considerations. First, the processing of coarse transformation from low-resolution images

is computationally efficient. Secondly, misalignments on finer scales can be decoupled

from each other, as they are limited to small, non-overlapping areas. This is particularly

suitable for use with a parallel architecture. The software implementation of the

described algorithm for the Windows 98/2000/NT environment, as well as some sample

images, is available to download at www.doc.ic.ac.uk/~gzy.

2.1 Multi-resolution representation of image deformation

The registration of two images $I_1, I_2$ consists of finding a transformation t, such that the

difference between $I_1$ and $t(I_2)$ is minimal according to a predefined similarity measure

sim. In image registration, it is commonly formulated as an optimisation problem such

that the parameter vector c, which represents the ideal transformation $t_c$, will maximise an

objective function $f(c) = \mathrm{corr}(I_1, t_c(I_2))$.

The choice of the transformation to be used depends on the knowledge of the deformation

to be recovered. For 2D gel images, the distortion introduced can be of higher orders and

its exact modelling can be difficult. For this reason, we have chosen to use

transformations defined by Piecewise Bilinear Maps (PBM) to represent the associated

deformation. Multiresolution decomposition can be applied to PBMs resulting in a

hierarchy of transformation spaces, where transformations from spaces with higher

dimension can model more localised and finer detail of the distortion. Hierarchical

transformation spaces have been used successfully in image registration [27-33].

Note that the transformation t of a source image I into the target image t(I) is defined by a

map m in the opposite direction, namely mapping points in the target image onto points

of the source image. This way m can be used to determine the intensities at different

points p in the target image by sampling them from points m(p) in the source image.
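The backward mapping convention described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name `warp_backward`, the nearest-neighbour sampling and the `fill` value for points that fall outside the source are assumptions made here for brevity.

```python
def warp_backward(source, mapping, fill=0):
    """Build the target image t(I) by sampling the source image I at m(p)."""
    height, width = len(source), len(source[0])
    target = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = mapping(x, y)          # m(p): target point -> source point
            ix, iy = round(sx), round(sy)   # nearest-neighbour sampling (assumption)
            if 0 <= ix < width and 0 <= iy < height:
                target[y][x] = source[iy][ix]
    return target


source = [[0, 1, 2],
          [3, 4, 5],
          [6, 7, 8]]
# Hypothetical map: each target pixel reads the source pixel one column to its right.
shifted = warp_backward(source, lambda x, y: (x + 1, y))
```

Because every target pixel is visited exactly once, this direction of mapping leaves no holes in the transformed image, which is the practical reason for defining m from target to source.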

A PBM consists of a lattice of maps illustrated in Fig. 1. In order to build this lattice the target image is first partitioned into $2^l \times 2^l$ regular squares, where l is called the level-of-detail for the corresponding transformation. For a given level and a given index (i,j) the vertices $a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ of the corresponding square in the target image and the control points $c_{i,j}, c_{i+1,j}, c_{i,j+1}, c_{i+1,j+1}$ of the corresponding quadrilateral in the source will define the mapping function m. A point q, which lies in a square $a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ in the target image, is mapped according to the following weighted sum of the corresponding control points:

$$m(q) = \omega_{i,j}(u,v) \cdot c_{i,j} + \omega_{i+1,j}(u,v) \cdot c_{i+1,j} + \omega_{i,j+1}(u,v) \cdot c_{i,j+1} + \omega_{i+1,j+1}(u,v) \cdot c_{i+1,j+1}$$

where

$$\begin{aligned}
\omega_{i,j}(u,v) &= (1-u) \cdot (1-v) \\
\omega_{i+1,j}(u,v) &= u \cdot (1-v) \\
\omega_{i,j+1}(u,v) &= (1-u) \cdot v \\
\omega_{i+1,j+1}(u,v) &= u \cdot v
\end{aligned}$$

The values of u and v are the horizontal and vertical ratios of point q, as indicated in Fig. 1.
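As an illustration of the weighted sum above, the following sketch evaluates a piecewise bilinear map at a single point. It assumes, purely for convenience, that the target image is normalised to the unit square and that `control[i][j]` holds the control point $c_{i,j}$ with i counted horizontally; the function names are hypothetical.

```python
def bilinear_weights(u, v):
    """Weights of the four corners, keyed by their offset (di, dj) from (i, j)."""
    return {(0, 0): (1 - u) * (1 - v),
            (1, 0): u * (1 - v),
            (0, 1): (1 - u) * v,
            (1, 1): u * v}


def pbm_map(q, control, level):
    """Map a target point q in [0, 1]^2 to the source image at the given detail level."""
    n = 2 ** level                      # squares per side
    x, y = q
    i = min(int(x * n), n - 1)          # index of the square containing q
    j = min(int(y * n), n - 1)
    u, v = x * n - i, y * n - j         # horizontal and vertical ratios inside the square
    mx = my = 0.0
    for (di, dj), weight in bilinear_weights(u, v).items():
        cx, cy = control[i + di][j + dj]
        mx += weight * cx
        my += weight * cy
    return mx, my


# A regular grid at level 1 represents the identity map.
identity_grid = [[(i / 2, j / 2) for j in range(3)] for i in range(3)]
mapped = pbm_map((0.3, 0.7), identity_grid, 1)
```

Displacing a single entry of `identity_grid` changes the map only inside the squares that share that control point, which is what makes the representation local.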

While the vertices $a_{i,j}$ in the target image are fixed at a given detail level, the control points $c_{i,j}$ can vary to represent different maps. For a given detail level a PBM is defined by a parameter vector $c = (c_{i,j})$ containing all the co-ordinates of the control points. All possible parameter vectors of a certain detail level l define the linear space $T_l$. By subdividing each square in the target image into 4 smaller squares the linear space $T_{l+1}$ for the next higher detail level is defined. Each parameter vector $(c^{l}_{i,j})$ in $T_l$ can be subdivided such that it becomes a parameter vector $(c^{l+1}_{i,j})$ in $T_{l+1}$, representing the same mapping. The subdivision scheme is defined by:

$$c^{l+1}_{i,j} = \begin{cases}
c^{l}_{a,b} & \text{if } i \text{ even}, j \text{ even} \\
\tfrac{1}{2}\,(c^{l}_{a,b} + c^{l}_{a+1,b}) & \text{if } i \text{ odd}, j \text{ even} \\
\tfrac{1}{2}\,(c^{l}_{a,b} + c^{l}_{a,b+1}) & \text{if } i \text{ even}, j \text{ odd} \\
\tfrac{1}{4}\,(c^{l}_{a,b} + c^{l}_{a+1,b} + c^{l}_{a,b+1} + c^{l}_{a+1,b+1}) & \text{if } i \text{ odd}, j \text{ odd}
\end{cases}$$

where $a = \lfloor i/2 \rfloor$, $b = \lfloor j/2 \rfloor$, and $\lfloor x \rfloor$ means the greatest integer not exceeding x.
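The subdivision rule can be sketched as follows. This is an illustrative reading of the scheme, assuming a square grid stored as `control[i][j]` with (x, y) tuples; the function name `subdivide` is hypothetical.

```python
def subdivide(control):
    """Refine an (n+1) x (n+1) control grid to (2n+1) x (2n+1) without changing the map."""
    n = len(control) - 1

    def average(points):
        return (sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points))

    fine = [[None] * (2 * n + 1) for _ in range(2 * n + 1)]
    for i in range(2 * n + 1):
        for j in range(2 * n + 1):
            a, b = i // 2, j // 2
            if i % 2 == 0 and j % 2 == 0:      # existing control point
                parents = [control[a][b]]
            elif i % 2 == 1 and j % 2 == 0:    # midpoint of a horizontal edge
                parents = [control[a][b], control[a + 1][b]]
            elif i % 2 == 0 and j % 2 == 1:    # midpoint of a vertical edge
                parents = [control[a][b], control[a][b + 1]]
            else:                              # centre of a quadrilateral
                parents = [control[a][b], control[a + 1][b],
                           control[a][b + 1], control[a + 1][b + 1]]
            fine[i][j] = average(parents)
    return fine


coarse = [[(0.0, 0.0), (0.0, 1.0)],
          [(1.0, 0.0), (1.2, 1.4)]]
fine = subdivide(coarse)
```

The new points are exactly the values the coarse bilinear map takes at edge midpoints and square centres, which is why the refined grid reproduces the same mapping before any further optimisation.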

By subdividing the map between the target image and the source image, the number of

parameters increases by a factor of approximately 4. By varying these new parameters it is

possible for the corresponding transformation to control more local distortions in the

given image.

2.2 Similarity Measure

For intensity based image registration, there have been a number of similarity measures

used in practice [34]. 2-D gels prepared from similar samples in two different research

laboratories might vary in contrast as well as brightness, which is equivalent to changes

in mean intensity and variance. A sensible similarity measure should be invariant to these

variations. We have used the following cross-correlation between two intensity distributions for this purpose:

$$\mathrm{corr}(I_1, I_2) = \frac{\mathrm{cov}(I_1, I_2)}{\sigma(I_1)\,\sigma(I_2)}$$

where

$$\mathrm{cov}(I_1, I_2) = \frac{1}{|D|} \int_D (I_1(x) - \bar{I}_1)\,(I_2(x) - \bar{I}_2)\, dx, \qquad \bar{I} = \frac{1}{|D|} \int_D I(x)\, dx$$

and

$$\sigma^2(I) = \frac{1}{|D|} \int_D (I(x) - \bar{I})^2\, dx$$

In the above equations $D \subset \mathbb{R}^2$ is the domain of points to be considered for the registration process, $|D|$ is its area, and the images $I_1, I_2$ are represented as functions $I : D \to \{0, \dots, N\}$, which give an intensity value $I(p)$ for each point p in D. $\mathbb{R}^2$ denotes the two-dimensional Euclidean space.

The covariance cov stated above is a measure of the association of two random variables

that is invariant to changes in the mean values. By normalising this with the standard

deviations, the similarity measure derived from cross-correlation is made to be immune

to changes in variance as well. The value of corr lies in the interval [-1,1], where the value 1

reflects a perfect alignment of the two images.
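The invariance claimed for this measure is easy to check numerically. The sketch below uses the discrete analogue of the integrals, with images flattened to plain lists of intensities; the helper names and the sample values are illustrative assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Discrete covariance of two equally long intensity lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def corr(a, b):
    """Normalised cross-correlation, in [-1, 1]."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


gel = [10, 20, 30, 80, 15, 5]
brighter_higher_contrast = [3 * value + 7 for value in gel]   # contrast x3, brightness +7
inverted = [-value for value in gel]

same = corr(gel, brighter_higher_contrast)   # close to 1
opposite = corr(gel, inverted)               # close to -1
```

A change of brightness shifts the mean and a change of contrast scales the standard deviation, and both cancel in the quotient.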

One advantage of this measure is that the gradient of the resulting objective function $f(c) = \mathrm{corr}(I_1, t_c(I_2))$ can be calculated quickly. This allows the use of faster optimisation algorithms for the search of the optimum transformation at each level. With “$\circ$” representing the composition of functions we have

$$f(c) = \mathrm{corr}(I_1, t_c(I_2)) = \mathrm{corr}(I_1, I_2 \circ m_c).$$

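By using the chain rule we can therefore calculate the gradient

$$\nabla f(c) = \left( \frac{\partial f}{\partial c^{x}_{i,j}}, \frac{\partial f}{\partial c^{y}_{i,j}} \right)_{i,j \in \{0,\dots,2^l\}} = \frac{\nabla \mathrm{cov}(I_1, I_2 \circ m_c) \cdot \sigma(I_2 \circ m_c) - \mathrm{cov}(I_1, I_2 \circ m_c) \cdot \nabla \sigma(I_2 \circ m_c)}{\sigma(I_1)\, \sigma^2(I_2 \circ m_c)}$$

where

$$\nabla \mathrm{cov}(I_1, I_2 \circ m_c) = \mathrm{cov}(I_1, \nabla (I_2 \circ m_c))$$

$$\nabla \sigma(I_2 \circ m_c) = \frac{\mathrm{cov}(I_2 \circ m_c, \nabla (I_2 \circ m_c))}{\sigma(I_2 \circ m_c)}$$

$$\nabla (I_2 \circ m_c) = \big( \nabla I_2(m_c(p)) \big)_{p \in D} = \big( \nabla_{x,y} I_2(m_c(p)) \cdot \nabla_c m_c(p) \big)_{p \in D}$$

In the above equations, $\nabla_{x,y} I_2 = \left( \frac{\partial I_2}{\partial x}, \frac{\partial I_2}{\partial y} \right)$ represents the gradient of the intensity in the image and

$$\nabla_c m_c(p) = \begin{pmatrix} \dfrac{\partial m^{x}_c(p)}{\partial c^{x}_{i,j}} & \dfrac{\partial m^{x}_c(p)}{\partial c^{y}_{i,j}} \\[2ex] \dfrac{\partial m^{y}_c(p)}{\partial c^{x}_{i,j}} & \dfrac{\partial m^{y}_c(p)}{\partial c^{y}_{i,j}} \end{pmatrix}_{i,j \in \{0,\dots,2^l\}}$$

is a $2 \times 2 \cdot 2^{l+1} \times 2^{l+1}$ matrix containing the partial derivatives of the piecewise bilinear map with respect to the co-ordinates of each control point. For a given point p in the target image it represents the change in the co-ordinates of the mapped point m(p) in the source image with respect to the change of control point co-ordinates. A $2 \times 2$ submatrix of $\nabla_c m_c(p)$ for a given control point $c_{i,j}$ equals

$$\begin{pmatrix} \omega_{i,j}(p) & 0 \\ 0 & \omega_{i,j}(p) \end{pmatrix}$$

if p lies in an adjacent square, otherwise all entries of the matrix are zero.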
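The quotient-rule step of this derivation can be checked in isolation. The sketch below differentiates the discrete correlation with respect to each intensity of the warped image and compares the result with finite differences; the remaining chain-rule factors (image gradient and control-point weights) are deliberately left out. All names and sample values are illustrative assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def sigma(a):
    return math.sqrt(cov(a, a))


def corr(a, b):
    return cov(a, b) / (sigma(a) * sigma(b))


def corr_gradient(i1, warped):
    """d corr(i1, warped) / d warped[k] for every k, using the quotient rule."""
    n = len(warped)
    mean_1, mean_2 = mean(i1), mean(warped)
    sigma_1, sigma_2 = sigma(i1), sigma(warped)
    c = cov(i1, warped)
    gradient = []
    for k in range(n):
        d_cov = (i1[k] - mean_1) / n                    # derivative of the covariance
        d_sigma = (warped[k] - mean_2) / (n * sigma_2)  # derivative of the std. deviation
        gradient.append((d_cov * sigma_2 - c * d_sigma) / (sigma_1 * sigma_2 ** 2))
    return gradient


def numeric_gradient(i1, warped, eps=1e-6):
    """Central finite differences, used only to validate corr_gradient."""
    result = []
    for k in range(len(warped)):
        up = warped[:k] + [warped[k] + eps] + warped[k + 1:]
        down = warped[:k] + [warped[k] - eps] + warped[k + 1:]
        result.append((corr(i1, up) - corr(i1, down)) / (2 * eps))
    return result


i1 = [1.0, 4.0, 2.0, 8.0, 5.0]
warped = [2.0, 3.0, 1.0, 9.0, 4.0]
analytic = corr_gradient(i1, warped)
numeric = numeric_gradient(i1, warped)
```

The analytic form costs one pass over the pixels, whereas finite differences need two evaluations of the objective per parameter, which is the saving the text refers to.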

2.3 Optimisation

With the transformation function and similarity measure defined, the next step towards a

registration algorithm is to find a suitable optimisation procedure to generate the optimal

transformation parameters in the allowed degrees of freedom. The choice of the

technique depends on the smoothness of the objective function and the presence of

misleading local maxima in the vicinity of the global maximum. In the case when the

objective function has many local maxima, algorithms like statistical gradient descent,

simulated annealing or exhaustive search have been used [35]. Such algorithms can be

computationally expensive. In the case where no misleading local maxima are present, techniques

such as Powell’s method or the downhill simplex method are commonly adopted [36].

One unique feature of our proposed method is to exploit the derivative information of the

two images to accelerate the optimisation process based on a commonly used variable

metric method called the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [36].

Variable metric methods in general are extensions of Newton’s method, which identifies

the maximum of a function by searching for a zero of the gradient. In each step of

Newton’s method, the objective function is approximated by a second order Taylor

polynomial and the displacement leading to the unique stationary value (minimum or

maximum) of this quadratic is calculated directly. Variable metric methods use the same

strategy but do not require the second order partial derivatives needed for the Taylor

polynomial. Instead, they approximate these values using information from previous

iterations. Like Newton’s method, they converge rapidly close to the maximum; for BFGS the

local convergence rate is superlinear rather than quadratic.
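To make the variable metric idea concrete, here is a compact textbook BFGS written for maximisation. It is a sketch under stated assumptions and not the implementation used in this work: the backtracking line search, the stopping tolerance and the test function are all choices made for illustration.

```python
def bfgs_maximise(f, grad, x0, iterations=50, tol=1e-10):
    """Maximise f by running BFGS on -f; H approximates the inverse Hessian of -f."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    g = [-value for value in grad(x)]
    for _ in range(iterations):
        if sum(value * value for value in g) < tol:
            break
        d = [-sum(H[r][c] * g[c] for c in range(n)) for r in range(n)]
        # Backtracking (Armijo) line search on -f along direction d.
        step, fx = 1.0, -f(x)
        slope = sum(gi * di for gi, di in zip(g, d))
        while step > 1e-12 and \
                -f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = [-value for value in grad(x_new)]
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # curvature condition; otherwise keep the old H
            Hy = [sum(H[r][c] * y[c] for c in range(n)) for r in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            for r in range(n):
                for c in range(n):
                    H[r][c] += ((1 + yHy / sy) * s[r] * s[c]
                                - Hy[r] * s[c] - s[r] * Hy[c]) / sy
        x, g = x_new, g_new
    return x


# Hypothetical smooth objective with a single maximum at (1, -0.5).
def objective(p):
    return -(p[0] - 1) ** 2 - 2 * (p[1] + 0.5) ** 2


def objective_gradient(p):
    return [-2 * (p[0] - 1), -4 * (p[1] + 0.5)]


best = bfgs_maximise(objective, objective_gradient, [0.0, 0.0])
```

Only gradients are evaluated; the matrix `H` is built up from successive differences of positions and gradients, which is the "information from previous iterations" mentioned above.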

2.4 Algorithm design

The flow chart in Fig. 2 schematically illustrates the proposed algorithm, which is to be

referred to as MIR (Multi-resolution Image Registration) in the subsequent sections.

Starting with an initial scaled rigid transformation, the main loop recursively generates

transformations with gradually higher level of detail, which approximate the optimal

transformation with improving quality. In other words, the optimisation problem

presented above is divided into a sequence of optimisation problems, with an initial stage

for the parameters of a scalable rigid transformation and additional ones for PBMs at

each detail level. The starting point of the optimisation in the initial stage is the identity

transformation. For each subsequent stage, the solution of the previous iteration is used. It

is worth noting that the first subdivision is different from the subsequent ones, as it is

generating the initial grid rather than refining an existing one. The positions of the 4

control points of this initial grid are determined such that they represent the scalable rigid

transformation found in the first optimisation stage. This is achieved by identifying the 4

control points with the corner points of the image after the scalable rigid transformation.
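The construction of the initial grid can be sketched as follows. The parametrisation of the scalable rigid transformation by a scale, a rotation angle and a translation is an assumption made for this example, as are the function and argument names.

```python
import math


def initial_control_points(width, height, scale, angle, tx, ty):
    """Level-0 control grid c[i][j]: the image corners after a scaled rigid transform."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def transform(x, y):
        return (scale * (cos_a * x - sin_a * y) + tx,
                scale * (sin_a * x + cos_a * y) + ty)

    # corners[i][j]: i indexes the horizontal position, j the vertical position
    corners = [[(0.0, 0.0), (0.0, height)],
               [(width, 0.0), (width, height)]]
    return [[transform(x, y) for (x, y) in column] for column in corners]


identity_grid = initial_control_points(100, 50, 1.0, 0.0, 0.0, 0.0)
scaled_grid = initial_control_points(100, 50, 2.0, 0.0, 10.0, -5.0)
```

A bilinear map over a single square reproduces any affine map exactly when its four control points are the transformed corners, so no accuracy is lost in handing the first-stage result over to the PBM.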

Fig. 3 shows the result of the optimisation process at different detail levels for a typical

gel pair. The gel images are overlaid using a special subtractive colour scheme. Instead of

simultaneously reducing the amount of all three colour components, (red, green and blue)

to produce the usual shades of grey for structures in the images, only green is reduced in

the first image, rendering its structures in magenta. In the second image, red and blue are

reduced, rendering the structures in green. In the overlay image, reductions for green are

controlled by the first image and reductions for red and blue are controlled by the second

image. This means that the structures will appear as grey or black if they coincide in both

images; otherwise, they will appear as green or magenta. The PBM mapping used for

registering the images is indicated by the corresponding control grid in red.
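The subtractive colour scheme amounts to routing each image to different colour channels. The sketch below assumes 8-bit grey values in which 255 is the empty background and 0 is the darkest structure; the function names are illustrative.

```python
def overlay_pixel(first, second):
    """RGB value of the overlay for one pixel of each registered image."""
    # Green is controlled by the first image; red and blue by the second.
    return (second, first, second)


def overlay(first_image, second_image):
    return [[overlay_pixel(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first_image, second_image)]


coinciding = overlay_pixel(0, 0)        # spot present in both images
only_first = overlay_pixel(0, 255)      # spot only in the first image
only_second = overlay_pixel(255, 0)     # spot only in the second image
```

With this convention a coinciding spot comes out black, a spot present only in the first image comes out magenta and a spot present only in the second comes out green, matching the description above.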

Fig. 4 shows the same grids in green; however, this time they are subdivided such that

they contain the same number of quadrilaterals as the grid of the (final) optimal

transformation, which is overlaid in red.

It can be seen from both figures that the corrections made to the image become smaller as details increase. This is equivalent to saying that after the best arrangement for a quadrilateral is found, the maximal residual misalignment in this area will only be a constant fraction of the side-length of the quadrilateral. In our experiments it turned out that this fraction is around 1/16 in the average case and not more than 1/4 in the worst case. This suggests that the size of the misalignments present in the images decreases exponentially with increasing level of detail, in proportion to the side-length of the quadrilaterals, that is with $O(1/2^l)$.

2.4.1 Multiresolution representation of the gel images

The strategy of multiresolution representation is achieved by applying different degrees

of blurring to the gel images. At a given level, misalignments that are localised to areas

smaller than details conveyable by the corresponding image resolution will become

invisible. This permits the retrieval of mis-registration following a coarse-to-fine

paradigm. The blurring was done by reducing the resolution R of the images. An image at

resolution R consists of $2^R \times 2^R$ pixels. Initially the images were sampled at a resolution

of $R = 10$. The algorithm works with resolutions down to $R = 5$. For each lower

resolution the intensities of four corresponding pixels in the image at the previous

resolution are averaged. It has been shown that the ideal smoothing function is Gaussian

convolution [37]. The averaging method adopted here can be seen as the result of

convolving the image with a discrete Gaussian filter [38].
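The resolution reduction by averaging four pixels can be sketched directly. The function names are illustrative, and the example uses a tiny 4 × 4 image rather than the $2^{10} \times 2^{10}$ images described above.

```python
def halve_resolution(image):
    """Average each 2 x 2 block of pixels into one pixel of the next lower resolution."""
    size = len(image) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
             for x in range(size)]
            for y in range(size)]


def build_pyramid(image, levels):
    """Return [image, half resolution, quarter resolution, ...] with `levels` reductions."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(halve_resolution(pyramid[-1]))
    return pyramid


image = [[0, 4, 8, 8],
         [4, 8, 8, 8],
         [1, 1, 2, 2],
         [1, 1, 2, 6]]
pyramid = build_pyramid(image, 2)
```

Each reduction halves the side length, so five reductions take an image from R = 10 to R = 5.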

Before running BFGS in $T_l$, the resolution is reduced from 10 to $R = 5 + l$, which means

that the pixel size is about 1/32 of the average size of the quadrilaterals in the control

point grid at detail level l. This is half of the size of the misalignments found at this level

in the average case and therefore enough to blur them. Fig. 5 shows again the result of the

optimisation at different detail levels for the same gel pair. This time the gel images are

blurred according to the scheme explained above.
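The arithmetic behind this choice of resolution can be written out explicitly. The worked equation below assumes, for illustration only, that the image side is normalised to length 1, so that a pixel at resolution R has side $2^{-R}$ and a quadrilateral at level l has average side $2^{-l}$; the factor 1/16 is the average residual misalignment reported in Section 2.4.

```latex
\[
\frac{\text{pixel size}}{\text{quadrilateral side}}
  = \frac{2^{-(5+l)}}{2^{-l}} = \frac{1}{32},
\qquad
\frac{\text{average misalignment}}{\text{pixel size}}
  \approx \frac{\tfrac{1}{16}\, 2^{-l}}{2^{-(5+l)}} = 2 .
\]
```

An average misalignment therefore spans about two pixels at the working resolution, independently of the level l.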

One important advantage of the blurring process is that it helps to meet the basic

requirement for the BFGS algorithm, that there should be no misleading local maxima

near the global maximum. Fig. 6 shows how blurring helps to remove such local maxima

in a simplified example where two similar one-dimensional images have to be registered

by finding the optimal shift. At high resolution, the objective function has maxima for

two different shifts. One of them is only suboptimal, as it mis-registers the peaks within

the two images. At lower resolution, this suboptimal maximum vanishes. Although the

abscissa of the remaining single maximum does not exactly coincide with the optimal

shift, it can be used as a good starting point for the optimisation at higher resolution. This

starting point is close enough to the optimum to prevent BFGS from becoming trapped in

the local maximum.
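The coarse-to-fine use of a low-resolution estimate as a starting point can be illustrated in one dimension. This sketch makes several simplifying assumptions that are not part of the method described here: the shift is circular, the coarsest level is searched exhaustively, and each finer level only tests the three shifts around the doubled coarse estimate instead of running BFGS.

```python
import math


def corr(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    c = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return c / math.sqrt(sum((x - mean_a) ** 2 for x in a)
                         * sum((y - mean_b) ** 2 for y in b))


def shift(signal, k):
    """Circular shift to the right by k samples."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def halve(signal):
    """Average neighbouring pairs of samples (1-D analogue of 2 x 2 averaging)."""
    return [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]


def coarse_to_fine_shift(reference, moving, levels):
    """Estimate k such that moving is approximately shift(reference, k)."""
    pyramid = [(reference, moving)]
    for _ in range(levels):
        r, m = pyramid[-1]
        pyramid.append((halve(r), halve(m)))
    r, m = pyramid[-1]
    best = max(range(len(r)), key=lambda k: corr(shift(r, k), m))  # coarsest level
    for r, m in reversed(pyramid[:-1]):
        candidates = [2 * best + d for d in (-1, 0, 1)]            # refine locally
        best = max(candidates, key=lambda k: corr(shift(r, k), m))
    return best % len(reference)


reference = [math.exp(-((i - 10) / 3.0) ** 2) for i in range(32)]
moving = shift(reference, 8)
estimated = coarse_to_fine_shift(reference, moving, 2)
```

Only the coarsest level is searched globally; every finer level inspects a neighbourhood of fixed size, so the cost of the fine levels does not grow with the size of the misalignment.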

2.4.2 Decoupling of local registration for efficiency

For a given detail level, the inner loop of the algorithm, as indicated by Fig. 2, optimises

the quadrilateral $c_{i,j}, c_{i+1,j}, c_{i,j+1}, c_{i+1,j+1}$ for each square $a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ in the image

domain of gel image $I_1$ separately. We found empirically that for the optimisation on

finer detail levels, only a small number of these quadrilaterals need major rearrangement,

as indicated in Fig. 4. By locally optimising the control grids, it is possible for the

optimisation process to rapidly converge or even skip areas that do not require further

adjustments, which leads to considerable savings in execution time.

The strategy of local registration through decoupling can be justified by the following

facts. If the control points are already close to their optimal positions, the separate

optimisation of each control point leads to the same solution as the optimisation of all

control points in parallel. The optimal value of the similarity function is achieved in both

cases by maximising the local contributions of each quadrilateral to the global similarity

function. These contributions are independent of each other because they are achieved by

small rearrangements of the control points, which adjust the correlation of both images in

locally bounded, separate areas. A slight dependence is left in the case of neighbouring

squares and the optimisation was therefore done in order of decreasing intensity

variance within the squares.
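The processing order for the decoupled local optimisations can be sketched as follows, assuming a square image whose side is divisible by $2^l$ and squares indexed as (i, j) with i horizontal. The names are hypothetical and the per-square optimisation itself is omitted.

```python
def variance(values):
    m = sum(values) / len(values)
    return sum((value - m) ** 2 for value in values) / len(values)


def squares_by_variance(image, level):
    """Indices (i, j) of the 2^level x 2^level squares, highest intensity variance first."""
    n = 2 ** level
    side = len(image) // n
    scored = []
    for j in range(n):
        for i in range(n):
            pixels = [image[y][x]
                      for y in range(j * side, (j + 1) * side)
                      for x in range(i * side, (i + 1) * side)]
            scored.append((variance(pixels), (i, j)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [index for _, index in scored]


image = [[0, 0, 5, 5],
         [0, 0, 5, 5],
         [0, 9, 1, 2],
         [9, 0, 2, 1]]
order = squares_by_variance(image, 1)
```

Squares with high intensity variance contain the most structure and therefore constrain the correlation most strongly, so treating them first reduces the influence of the residual coupling between neighbours.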

2.5 Performance Evaluation

2.5.1 Gels used for Evaluation

For the assessment of the proposed method, gels from three different cell biological

experiments were considered. Each of these experiments aimed to detect changes in

protein expression by comparing 2-D gel profiles from a control group with those from a

candidate group(s), see Fig. 7. In the first experiment, the effect of a plant extract on the

protein expression of IBR3 human dermal fibroblasts was investigated. Both control and

candidate samples were taken from homogeneous cell cultures grown in the laboratory;

the candidate samples were treated with the plant extract. Four gels were generated for

the control group and 9 for the candidate group.

The 2-D gel profiles for the second experiment were generated using samples of human

serum. Here the candidate samples came from several individuals with Paget’s disease.

The control samples were taken from normal, healthy individuals. The control group

consisted of 10 gels and the candidate group contained 4 gels.

The third experiment was a time course experiment to investigate the potential effects of

proteolysis on a total protein extract of human heart tissue prepared for 2-DE. One

sample of human heart was solubilised using standard sample lysis buffer (9.5 M urea,

2% w/v CHAPS, 0.1% w/v DTT, 0.8% w/v 2-D Pharmalyte pH 3-10). The other sample

of human heart was solubilised using the same lysis buffer to which a cocktail of protease

inhibitors had been added (Complete Mini, Roche, Mannheim, Germany). The samples

were then incubated at room temperature. Aliquots of each sample were removed and

frozen at -70°C after 0, 20, 40, 70 and 120 min. From each sample (100 µg loading) a 2D

gel was generated and the 2D protein separation patterns were visualised by silver

staining using our standard protocols [39, 40]. The gels were scanned with a laser

densitometer at 100 µm resolution. The candidate and control gels were given by the sets

of 2D gel profiles generated using the samples prepared in the absence and presence of

protease inhibitors respectively. Both candidate and control groups contained 5 gels, one

for each time point.

The 2-D gels shown in Fig. 8 are typical for each of the 3 experiments. It can be seen that

they vary significantly in spot size, distribution and dominance of streaks. All the 2-D

gels generated in the plant extract experiment are very similar, because they originate

from samples taken from homogeneous cell cultures. However, the human serum 2-D gel

patterns are very different because they were taken from different patients and control

individuals. Based on this we can say that the test included examples from both ends of

the range of 2-D gels commonly dealt with in laboratories.

2.5.2 Intra and Inter image group assessment

The performance of the proposed MIR algorithm was tested in two ways, see Fig. 9. The

first test, called Intra Assessment, consisted in comparing the relative performance of our

method and the Z3 software from Compugen. This is the only fully automated stand-

alone 2-D gel matching software known to the authors at this time. For this experiment,

only 2-D gels taken from the same group were matched for assessing the intra-group

variations. Both algorithms were applied to 15 pairs taken from the control group of the

human serum experiment and 15 pairs taken from the control group of the plant extract

experiment. An experienced observer scored the results depending on how well the

corresponding spots were matched. The scoring was done in double blind fashion.

A second test, called Inter Assessment, (see Fig. 9) was to assess the performance of MIR

on 2-D gel pairs consisting of one control gel and one candidate gel. The matching of

these gels was much more difficult as some image features were present in only one of

the gels. For the first part of the Inter Assessment MIR was run on pairs consisting each

of a candidate gel and a control gel from the human serum experiment and from the plant

extract experiment. The scoring was done along the same lines as in the Intra

Assessment. In the second part of the Inter Assessment, it was determined how well the

spots by which the two 2-D gels differed could be identified. For this the expert was

asked to analyse 2D gel images generated in the heart time course and plant extract

experiment only on the basis of the overlaid images generated by MIR. The results of this

analysis were compared with the results of an analysis on the same 2-D gels using other

techniques.

2.5.3 Scoring Scheme

For the purpose of quantitative evaluation we define P as the number of misregistered

spots found in the image of overlaid 2-D gels that is to be scored. It is important to

count spots as misregistered only if they do not overlap sufficiently but (a) truly represent the

same protein and (b) are defined enough to be meaningful. Each gel used for evaluation

contained around 600 meaningful spots. We define two spots as registered if and

only if more than 75% of their area overlaps. Fig. 10 illustrates several examples of

both cases, in which two candidate spots are considered to be registered and

unregistered respectively. Using these criteria, the scores shown in Table 1 were

devised.
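The 75% criterion can be expressed on pixel sets. The text does not state which area the overlap is measured against, so this sketch assumes the larger of the two spots; that choice, the function name and the square test spots are all assumptions.

```python
def registered(spot_a, spot_b, threshold=0.75):
    """spot_a and spot_b are sets of (x, y) pixels covered by the two candidate spots."""
    overlap = len(spot_a & spot_b)
    # Assumption: the overlap is compared with the area of the larger spot.
    return overlap > threshold * max(len(spot_a), len(spot_b))


def square_spot(x0, side=10):
    """Hypothetical square spot of side x side pixels starting at column x0."""
    return {(x, y) for x in range(x0, x0 + side) for y in range(side)}


close_pair = registered(square_spot(0), square_spot(2))   # 80 of 100 pixels overlap
distant_pair = registered(square_spot(0), square_spot(3)) # 70 of 100 pixels overlap
```

Under this assumption a 10-pixel-wide spot tolerates a displacement of two pixels but not three.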

3. Results

3.1 Intra-group variation

For the Intra Assessment, 15 2-D gel pairs from the control group of the experiment with

the plant extract and 15 2-D gel pairs from the control group of the experiment with

human serum were used. The images presented to the expert were randomised in their

names and order, such that neither the authors nor the expert were able to tell whether a

given image was the result of a run of Z3 or MIR.

Figs. 11 and 12 show the frequencies of scores for Z3 and MIR. It can be seen clearly

from these figures that MIR scored significantly better than Z3 and achieved a smaller

deviation from the mean score. Z3 achieved a mean score of 2.19 and 3.53 for the Plant

Extract gels and Human Serum gels, whereas the mean scores for MIR were 3.53 and

4.87, respectively. In 29 out of a total of 30 cases MIR scores better than or equal to Z3.

On the 2-D gels of human serum, both algorithms decrease in their performance. This is

due to the increased difficulty of matching, because of more dominant streaks, more spots

which are washed together and a less dense distribution of spots. Similar to the Z3

program, the proposed algorithm requires around 5 seconds for a single run on a Pentium

II P300 machine.


3.2 Inter-group variation

For the first part of the Inter Assessment, MIR was run on a total of 76 pairs, each consisting of a candidate 2-D gel and a control 2-D gel. Forty gel pairs were taken from the human serum experiment and 36 pairs from the plant extract experiment. Figs. 13 and 14 show the frequencies of the scores for these experiments. The mean scores were 2.83 for the human serum gels and 3.48 for the plant extract gels, which represents a decline of approximately one unit compared with the scores in the Intra Assessment. The decline is to be expected, as non-matching features complicate the image registration. However, the scores achieved in the Inter Assessment are still reasonably good, as a score of 3 means that not more than 10 corresponding spots were misregistered. With around 600 meaningful spots per gel, this means that fewer than 2 % of the spots could not be matched. Table 2 summarises information on all the runs of MIR in the Inter Assessment as well as the Intra Assessment.

As stated above, the aim of the second part of the Inter Assessment was a more

qualitative assessment of the usefulness of the matching procedure. For this the expert

was asked to analyse 2-D gel images generated in the plant extract experiment and in the

time course experiment using only the images produced by MIR.

In the case of the plant extract experiment, there was, in addition to the control group and the candidate group (plant extract A), a third group of 2-D gels. These were generated from the same type of cells but treated with a different kind of plant extract (plant extract B). The aim of the analysis was to find out whether 2-D gels generated using plant extract A could be identified if presented along with 2-D gels generated using plant extract B. The identification could only be done by using the spots by which each type of 2-D gel differed from the control. The expert was therefore confronted with two groups of images, one group containing matches between control 2-D gels and extract A 2-D gels, the other group containing matches between control 2-D gels and extract B 2-D gels. As in the original analysis, where other tools had been used, the expert was able to identify the group that contained the 2-D gels generated from plant extract A.

In the case of the time course experiment, the purpose was to investigate proteolysis in a sample of human heart proteins prepared for 2-DE in the absence and presence of protease inhibitors. The candidate 2-D gels were run using aliquots of the two samples taken at 5 successive time points. In order to identify the affected proteins, the candidate 2-D gels (no protease inhibitors in the sample) were compared with control 2-D gels (protease inhibitor cocktail present in the sample). The expert was confronted with 5 images, each of which was a match between a candidate 2-D gel and a control 2-D gel sampled at a certain time point. As in the original analysis, the proteins that were subject to proteolysis under these conditions could be identified. Fig. 15 illustrates this result by showing the gels obtained after incubation of the samples at room temperature for 0 (A-C) and 2 (D-F) hours. At time 0, the 2-D protein profiles are essentially identical in the presence (A) and absence (B) of protease inhibitors. After registration these two images matched perfectly, so that almost all of the spots appeared grey (C). However, after 120 minutes in the absence of protease inhibitors (E), considerable proteolysis was evident with the appearance of many additional protein spots (arrows in E) compared with the sample incubated in the presence of protease inhibitors (D). These 2-D gel images could still be registered very well by MIR, but the additional protein spots due to proteolysis were clearly visible in the registered image as green features (F).
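The way matched spots turn grey while unmatched spots keep a colour can be illustrated with a simple subtractive overlay. The sketch below is an assumption about how such a scheme might work, not the colour scheme actually used to produce Fig. 15: the function name is hypothetical, both images are assumed to be already registered and scaled to [0, 1] with 1 as white background, and the first gel is assumed to absorb green (so it appears magenta) while the second absorbs red and blue (so it appears green).

```python
import numpy as np


def subtractive_overlay(gel_a, gel_b):
    """Overlay two registered gel images as an RGB image.

    Inputs are 2-D float arrays in [0, 1]; 1 is white background and
    0 is the darkest possible spot. Assumed channel assignment:
    gel A removes green light, gel B removes red and blue light.
    """
    a = np.clip(np.asarray(gel_a, dtype=float), 0.0, 1.0)
    b = np.clip(np.asarray(gel_b, dtype=float), 0.0, 1.0)
    if a.shape != b.shape:
        raise ValueError("both gel images must have the same shape")
    # Red and blue follow gel B, green follows gel A.
    return np.stack([b, a, b], axis=-1)
```

With this assignment a spot of equal darkness in both gels has equal red, green and blue values and appears grey, a spot present only in gel A appears magenta, and a spot present only in gel B appears green.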

4. Discussion

It is worth noting that although MIR shows a significant improvement in performance, there remain a number of cases, 16 out of 111, where the results are not satisfactory (more than 10 spots misregistered). Fifteen of these 16 cases come from the experiment on human serum gels. The problems of the algorithm with these 2-D gel pairs can be ascribed to the fact that the gels contained only two very different kinds of geometric features. First, there are very broad, widespread homogeneous shapes, originating from linear series of large saturated spots that have coalesced. Secondly, there are a few sparsely distributed, well localised small spots. The broad features differ strongly in their geometrical outline between gels, and the small spots do not always have corresponding partners.

Although the algorithm matches the corresponding broad features in almost all cases, the difference in their geometric shape makes the coarse registration so imprecise that corresponding small spots do not come close enough to be registered at the high detail levels. The scores turned out low in this case because it was these small spots that were of interest in the experiment. These examples show a potential drawback of the method. As it is based on the idea of using the extra geometric information present in the intensity distribution for registration, it must fail if this extra information is strongly misleading.

There was one gel pair from the plant extract experiment that did not achieve a

satisfactory score, due to severe local distortion. The local misalignments present in this

gel pair were of the size of a whole side length of a quadrilateral, instead of the expected

worst case of ¼ of a side length. Fig. 16 illustrates one such situation for two

corresponding sections. The two images on the top represent the sections with the critical

area encircled. The image pair at the bottom shows a manually found transformation,

which registers section 1 on section 2. Note the extreme distortion of the grid in the

critical area. This demonstrates the limit of the proposed method. If the distortions on a

certain detail level exceed ¼ of the side length, it can no longer be assumed that

corresponding structures overlap enough to guide the registration process, i.e. BFGS is

likely to terminate in a local maximum.
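The quarter-side-length limit can be made concrete with a small check. The following sketch rests on assumptions that are not stated in this section: the function names are hypothetical, the side length of the control grid quadrilaterals is assumed to halve from one detail level to the next, and a single fixed misalignment is tested against every level, whereas in practice the residual misalignment shrinks as the coarser levels are registered.

```python
def quadrilateral_sides(coarsest_side, levels):
    """Side length of a grid quadrilateral at each detail level.

    Assumption: the side length halves from one level to the next.
    """
    return [coarsest_side / 2 ** k for k in range(levels)]


def levels_beyond_limit(misalignment, coarsest_side, levels, max_fraction=0.25):
    """Return the detail levels at which a local misalignment (in pixels)
    exceeds `max_fraction` of the quadrilateral side length."""
    sides = quadrilateral_sides(coarsest_side, levels)
    return [k for k, side in enumerate(sides)
            if abs(misalignment) > max_fraction * side]
```

For a grid with a coarsest side length of 256 pixels and four detail levels, a misalignment of 5 pixels stays within the limit at every level, whereas a misalignment of 20 pixels exceeds it at the two finest levels (side lengths of 64 and 32 pixels).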

5. Concluding remarks

In this paper we have presented a new algorithm for fully automatic matching of 2-D gel protein separation profiles, based on multiresolution image registration and the BFGS optimisation method. The method exploits the fact that coarse approximations to the optimal transformation can be extracted efficiently from low-resolution images. In this way it is possible to remove misalignments at different scales in a systematic manner. In a double blind trial on a set of 30 2-D gel pairs, the method performed significantly better than a commercial program that also follows the intensity based approach, and it showed excellent results in 90 % of the cases. In another double blind trial the system was tested on 76 more demanding matching problems and gave satisfactory results in 81 % of the cases.
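Why low-resolution images yield coarse approximations cheaply can be seen from a minimal image pyramid. The sketch below is an illustration only: the function names are hypothetical, and plain 2 x 2 block averaging is used as the reduction step, which is an assumption and not necessarily the filter used by MIR. Each reduction halves the image size, so a misalignment of 8 pixels at full resolution shrinks to 1 pixel after three reductions.

```python
import numpy as np


def downsample_by_two(image):
    """Halve the resolution by averaging non-overlapping 2 x 2 blocks."""
    img = np.asarray(image, dtype=float)
    rows = img.shape[0] - img.shape[0] % 2  # drop an odd last row
    cols = img.shape[1] - img.shape[1] % 2  # drop an odd last column
    img = img[:rows, :cols]
    return img.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))


def build_pyramid(image, levels):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(downsample_by_two(pyramid[-1]))
    return pyramid
```

A coarse-to-fine scheme would estimate the transformation on the last (smallest) entry of the pyramid first and use it as the starting point for the next finer level.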

Acknowledgements

The work in MJD’s laboratory is supported by the British Heart Foundation and

Proteome Sciences PLC. SV is supported by an undergraduate exchange scheme between

the Royal Society/Wolfson Medical Image Computing Laboratory at Imperial College

and the Universitaet Karlsruhe in Germany.



Table 1: Scores for the assessment of registration performance.

Score   Description of match
5       P = 0, i.e. gels are perfectly registered
4       0 < P < 6
3       5 < P < 11
2       10 < P < 20
1       20 < P, but the area in which these spots lie is less than a quarter of the total area of the gel
0       All spots in more than a quarter of the total area of the gel are misregistered


Table 2: Mean, standard deviation, maximum and minimum of the scores and execution times over all 106 test runs using MIR on human serum and plant extract gels.

         Score   Time cost
Mean     3.47    5.46 sec
StDev    1.12    1.05 sec
Max      5       9 sec
Min      0       3 sec


Figure legends

Figure 1: Definition of the bilinear mapping of point p, using the square $a_{i,j}, a_{i+1,j}, a_{i,j+1}, a_{i+1,j+1}$ in the target image and the control point quadrilateral $c_{i,j}, c_{i+1,j}, c_{i,j+1}, c_{i+1,j+1}$ in the source image.

Figure 2: Schematic illustration of all the processing steps involved in the proposed

algorithm.

Figure 3: Optimisation results at different detail levels for a typical gel pair. The gel

images are overlaid using a special subtractive colour scheme. The PBM

mapping used for image registration is indicated by the red control grid. In

order to make the control grid visible along with the overlaid images, it was

necessary to transform the first (magenta) image instead of the second, using

the inverse mapping of the PBM.

Figure 4: Illustration of the best approximations to the (final) optimal transformation found at different detail levels. The grids are similar to those in Fig. 3; however, this time they are subdivided such that they contain the same number of quadrilaterals as the grid of the optimal transformation, which is overlaid in red.


Figure 5: Optimisation results at different detail levels similar to Fig. 3. The images are

shown at the resolution from which the transformations are determined.

Figure 6: The effect of misleading local maxima and how it can be removed near the

global maximum by blurring.

Figure 7: Clarification of the grouping for the candidate and control 2-D gels in the

three different biological experiments used to assess the proposed method.

Figure 8: Three typical 2-D gels from each of the three biological experiments.

Figure 9: Illustration of the 2-D gel pair selection for the Intra and Inter Assessment.

Figure 10: Illustration of several registered and un-registered candidate spots as

examples for the scoring scheme.

Figure 11: Frequencies of scores for MIR and Z3 on 2-D gels of human serum

Figure 12: Frequencies of scores for MIR and Z3 on 2-D gels from the plant extract

experiment.

Figure 13: Frequencies of scores for MIR in Inter Assessment on the human serum

experiment.


Figure 14: Frequencies of scores for MIR in Inter Assessment on plant extract

experiment.

Figure 15: Illustration of a successful analysis of the 2-D gels in the heart time course

experiment using overlaid images generated by MIR.

Figure 16: An example where MIR was not able to register corresponding sections in two 2-D gels. The manually found transformation shows the extreme local distortions present in this unusual case.
