
1948 J. Opt. Soc. Am. A/Vol. 3, No. 11/November 1986

Cross-correlation model for pattern acuity

Terry Caelli

Department of Psychology, University of Alberta, Edmonton, Alberta T6G 2E9, Canada

Ingo Rentschler

Institute for Medical Psychology, University of Munich, D-8000 Munich, Federal Republic of Germany

Received March 3, 1985; accepted June 19, 1986

We have investigated whether the perceptual alignment of two images can be predicted from their cross-correlation characteristics. Results from two experiments demonstrate this to be the case with a variety of images varying from Gaussian-modulated sinusoid gratings to two-dimensional textures and face images.

1. INTRODUCTION

Techniques for image matching (alignment) in computer vision vary from attempts to minimize image differences (for example, in subtraction radiography1) to hierarchical correlation techniques involving the correlations among various resolution levels of the images directly or after employing Laplacians.2-4 Until now, however, no direct comparison of cross-correlation measures with human alignment performance has been made with such images.

In the recent literature it has been proposed to determine the degree to which observers can align isolated spots, bars, or grating patterns by the underlying localized detector profiles, their interactions, and intrinsic noise-response characteristics.5,6 Such proposed profiles include ON-center OFF-surround detectors fitting a ∇²G structure (Laplacian operator following a Gaussian low-pass filter) or profiles corresponding, in general, to bandpass filters. The underlying assumption of such formulations and experiments is that vernier acuity requires the observer to align local signs or image features in specific positional registration. Consequently, the detectors involved in such acuity tasks are precisely aimed at extracting the critical features. The questions remain, however, (a) whether such processes occur in the alignment of arbitrary images such as textures, faces, and scenes and (b) whether a single mechanism can sufficiently represent behavior in such complex patterns. The latter alignment we term pattern acuity, in contrast to vernier acuity, since many components have to be considered before adequate alignment criteria can be found.

Since such alignment tasks usually involve the correlation of spatially overlapping information (at least in the radiographic area1), we have experimentally considered images viewed through an aperture and digitally added with respect to a vernier (horizontal) offset, as illustrated in Fig. 1. In Fig. 1(a) the two image pairs are added with zero offset (aligned state), while Fig. 1(b) shows the identical process with an offset of 20 pixels, each image being a 256 × 256, 7-bit pixel digital image, the aperture being 200 × 200 in dimensions. It is important to note that the spatial profiles need not necessarily be identical, as is shown in the bottom two photographs of Fig. 1, where two different frequency bands of an image are added.

In the following experiments we have observed how well such images can be aligned as a function of their complexity, correlations, bandwidth differences, etc. In a control experiment we have also compared our results with the more usual vernier-acuity paradigm employing spatially nonoverlapping signals. Of central interest to all these investigations was whether a simple cross-correlation mechanism would be adequate to predict the alignment performance or whether some form of derivative matched filtering, involving specific detector profiles as convolution kernels (as has been suggested above5,6), is necessary.

In our experiments observers are first shown the "correctly" aligned (zero-offset) image pair [as in Fig. 1(a)] and are then shown either the aligned or a misaligned version with randomly chosen horizontal pixel offset values. The observer is told simply to respond whether the two composite images are identical, a negative response indicating the detection of a misalignment with respect to the fixed (and so aligned) common image. Our cross-correlation model for this task is as follows.

Let I_i(x, y) represent the luminance function of an image i with respect to the locally Euclidean retinotopic coordinate system about fixation. With Δx denoting the variable horizontal offset between the image components of the target stimulus, I_i(x, y) + I_j(x, y) and I_i(x, y) + I_j(x + Δx, y) correspond to the reference and the target images, respectively, used in our experiments. The (total) cross-correlation function between the two composite images is

C_ij^T(Δx, α, β) = ∬ [I_i(x, y) + I_j(x, y)] × [I_i(x + α, y + β) + I_j(x + Δx + α, y + β)] dx dy.

This function seems inadequate to capture the (psychophysical) performance of our observers, since the successively and briefly exposed reference and target stimuli occurred at the same retinal location rather than being shifted relative to each other. Hence we consider the value of

0740-3232/86/111948-09$02.00 © 1986 Optical Society of America

T. Caelli and I. Rentschler

Fig. 1. Image misalignment. (a) Shows image pairs in perfect alignment, while (b) shows the same images horizontally misaligned by 20 pixels with respect to a 200 × 200 pixel aperture. The upper image pairs were two identical low-pass versions of a cafe scene. The lower pair consisted of low-plus-high-pass versions. Here the filters were isotropic Gaussians centered at 0 (low) and 32 (high) picture cycles, having bandwidths of 3 octaves (low) and 1 octave (high) to 1/e decay.

the cross-correlation function at zero displacement between the reference and the target:

C_ij^T(Δx) ≡ C_ij^T(Δx, 0, 0) = ∬ [I_i(x, y) + I_j(x, y)] × [I_i(x, y) + I_j(x + Δx, y)] dx dy,  (1)

or, in other words, simply the dot product between the two composite images. That is,

C_ij^T(Δx) = ∬ I_i²(x, y) dx dy + ∬ [I_i(x, y) × I_j(x, y)] dx dy + ∬ [I_i(x, y) × I_j(x + Δx, y)] dx dy + ∬ [I_j(x, y) × I_j(x + Δx, y)] dx dy.

For

A_i(Δx) = ∬ [I_i(x, y) × I_i(x + Δx, y)] dx dy,

the horizontal offset autocorrelation function for image I_i;

A_i(0) = E_i = ∬ I_i²(x, y) dx dy,


the energy of image I_i; and

C_ij(Δx) = ∬ [I_i(x, y) × I_j(x + Δx, y)] dx dy,

the horizontal offset cross-correlation function between images I_i and I_j, Eq. (1) becomes

C_ij^T(Δx) = A_i(0) + C_ij(0) + A_j(Δx) + C_ij(Δx),  (2)

that is, the sum of the horizontal offset autocorrelation and cross-correlation functions for the component images I_i and I_j at zero displacement and at an offset of Δx, respectively. Here it should be kept in mind that the latter offset is the independent variable in our pattern-alignment task. Consequently, the aim of the following experiments was to investigate to what extent Eq. (2) can predict the pattern acuity of observers with a variety of images.
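The decomposition in Eq. (2) is easy to check numerically. The sketch below is not the authors' code: the names are illustrative and a circular shift stands in for the infinite integration domain. It verifies that the raw dot product between the two composite images equals the sum of the four autocorrelation and cross-correlation terms.

```python
# Sketch (illustrative, stdlib only): checking the Eq. (2) decomposition
# C_T(dx) = A_i(0) + C_ij(0) + A_j(dx) + C_ij(dx) on small random "images".
import random

def hshift(img, dx):
    """Shift an image horizontally by dx pixels (circular boundary)."""
    return [row[dx % len(row):] + row[:dx % len(row)] for row in img]

def dot(a, b):
    """Discrete analogue of the double integral of a*b."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

random.seed(0)
n = 8
Ii = [[random.random() for _ in range(n)] for _ in range(n)]
Ij = [[random.random() for _ in range(n)] for _ in range(n)]
dx = 3

A = lambda I, d: dot(I, hshift(I, d))        # autocorrelation A(dx)
C = lambda d: dot(Ii, hshift(Ij, d))         # cross-correlation C_ij(dx)

ref = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Ii, Ij)]
tgt = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Ii, hshift(Ij, dx))]

total = dot(ref, tgt)                        # C_T(dx), the raw dot product
decomp = A(Ii, 0) + C(0) + A(Ij, dx) + C(dx) # right-hand side of Eq. (2)
assert abs(total - decomp) < 1e-9
```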

2. EXPERIMENT 1: ALIGNMENT OF TWO-DIMENSIONAL GABOR SIGNALS

Stimulus and Apparatus

In the first experiment we observed the ability of observers to align two-dimensional Gabor signals, that is, cosine (even-case) modulated bivariate Gaussian windows of the form

I_0(x, y) = exp{−[(x − x_0)²/σ_x² + (y − y_0)²/σ_y²]} × cos[2π(u_0x + v_0y)].  (3)

For σ_x = σ_y = σ and v_0 = 0, Eq. (3) results in vertical gratings of spatial frequency u_0 modulated by an isotropic Gaussian aperture decaying to 1/e in σ. Here (x, y) represents the Euclidean retinotopic coordinate system within a 1.0° aperture about fixation. Since these signals are localized in the image [being centered at (x_0, y_0)] and spatial-frequency [centered at (u_0, v_0)] domains,7,8 having bandwidths determined by their space constant [σ in Eq. (3)] and a spectral bandwidth of 1/σ, it is relatively simple to observe pattern acuity as a function of such signal frequency and bandwidth characteristics. We have studied the signals shown in Fig. 2.

Fig. 2. Basic images employed in Experiment 1. Two-dimensional Gabor signals defined over 128 × 128 pixels have Gaussian space constants (1/e decay values: σ) of 4, 8, and 32 pixels and horizontal frequency modulations of 0, 4, and 8 cycles per 128 pixels (f). One pixel corresponded to 36″ of visual angle.

Fig. 3. Normalized autocorrelation functions [Eq. (6)] for the seven images used in Experiment 1 (Fig. 2) with respect to the vernier offset (Δx, in pixels) range employed (see Fig. 4). Here σ corresponds to the space constant in pixels, while f corresponds to the frequency in cycles per 128 pixels.

Each signal was defined within a 128 × 128 pixel format, having three different space constants of 4, 8, and 32 pixels (isotropic Gaussian decay to 1/e in the respective pixels) and three different modulation frequencies of 0 (no modulation), 4, and 8 cycles per picture (128 pixels). These frequencies corresponded to 0, 3.3, and 6.6 cycles/degree, respectively. The amplitude modulations of each image were restricted to a linear (with respect to luminance) 7-bit luminance range (0-127) such that their addition within the display aperture (Fig. 1) did not exceed the 8-bit linear range of the image-processing system. The peak contrast was set at 80% with a fixed space-average luminance of 30 cd/m².
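As a concrete illustration, the following sketch renders one of these Gabor signals on a 128 × 128 grid and quantizes it to the 7-bit range described above. The discretization (phase referenced to the signal center, the particular σ and f chosen) is an assumption for illustration, not the authors' implementation.

```python
# Sketch (assumed discretization): one even-symmetric Gabor signal of
# Eq. (3), space constant 8 pixels, 4 cycles per 128-pixel picture,
# mapped onto the linear 7-bit (0-127) luminance range from the text.
import math

N = 128
sigma = 8.0          # 1/e decay in pixels
cycles = 4.0         # horizontal modulation, cycles per picture
u0 = cycles / N      # cycles per pixel
x0 = y0 = N / 2      # centered signal

def gabor(x, y):
    envelope = math.exp(-(((x - x0) ** 2 + (y - y0) ** 2) / sigma ** 2))
    return envelope * math.cos(2 * math.pi * u0 * (x - x0))

# Map the signal range [-1, 1] onto luminance levels 0..127.
image = [[round(63.5 * (1 + gabor(x, y))) for x in range(N)] for y in range(N)]

assert all(0 <= v <= 127 for row in image for v in row)
assert image[N // 2][N // 2] == 127   # peak luminance at the center
```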

In the main experiment the combined images were shown within a square 100 × 100 pixel aperture, which subtended a 1.0° visual angle to the observer seated 160 cm from the screen, such that 1 pixel corresponded to 1/100° (36″). Pixel offsets of Δx ∈ {−20, −10, −5, −2, 0, 2, 10} were used in order to ascertain Ps(Δx) curves (probability of "same" responses as a function of pixel offset).

In an additional control experiment we used a vernier version of the task in which the three component images used did not spatially overlap one another. In this case the observer was to respond whether the center signal was collinear ("same") with two identical versions of the same image above and below it. That is, we employed a modified version of Ludvigh's9 three-dot-alignment task with the dots replaced by Gabor signals. In this case we used only the signals in row 1 of Fig. 2, all vertically separated by 128 pixels from center to center.

All images were generated on an ITI (Imaging Technology) image-processing system attached to a PDP 11/23 computer and displayed on an Electrohome monitor with P32 phosphor in a semi-illuminated room.

Subjects and Procedure

Two observers with normal and corrected-to-normal acuity participated in Experiments 1 and 2. On any given trial the aligned pair was exposed for 200 msec, followed by a 1-sec interstimulus interval, after which the misaligned (or aligned, Δx = 0) pair was shown for 200 msec. Observers were required to indicate whether the two composite images were the same ("identical"). An experimental session consisted of 350 trials, determined from 50 presentations of each of the 7 offset values (see above) for a given signal shown in Fig. 2. This resulted in 7 × 350 = 2450 trials per observer.

In the control experiment, rather than employing the two frames as in the main experiment, observers were simply shown one frame for 200 msec and were required to respond whether the three component images were collinear for the 7 different offset values, with 50 trials per condition.

Fig. 4. Probability of "same" (aligned) responses Ps(Δx) as a function of image offset (Δx, in pixels) with respect to a 100 × 100 pixel aperture, for observers MM and GB (filled and open circles, respectively). One pixel = 36″ of visual angle. σ and f refer to the space constants and modulation frequencies shown in Fig. 2. Dashed lines join the best-fitting predictions (crosses) from Eq. (8). Correlations (r's), intercepts (a's), and slopes (b's) are shown for all curves.

Fig. 5. Vernier acuity [Ps(Δx)] determined from the probability of observers' responding that a center signal was collinear ("same") with two (vertical) flanking signals (open circles). Filled circles correspond to the equivalent results from Experiment 1, the spatially overlapping case. Results are averaged over observers. SC stands for the Gaussian space constant of each signal, while Δx corresponds to pixel offset values.

Results

Since in this experiment only identical images were to be aligned, Eq. (2) reduces, for the seven signals shown in Fig. 2, to

C_ii^T(Δx) = 2[A_i(0) + A_i(Δx)].  (4)

Now any correlation function C_ij(Δx) can be broken down into

C_ij(Δx) = n Ī_i Ī_j + σ_i σ_j r_ij(Δx),  (5)

where r_ij corresponds to the Pearson (normalized) correlation between the two images I_i and I_j:

r_ij(Δx) = cov[I_i, I_j(Δx)] / (σ_i σ_j),  (6)

σ_i and σ_j being the standard deviations of luminances for I_i and I_j, respectively, and cov being their covariance for the shift value (Δx). n corresponds to the total number of image points and Ī_i to the space-average luminance of image I_i. Figure 3 shows r_ii(Δx) for each of the seven signals used.
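For the discrete images actually displayed, the normalized correlation of Eq. (6) reduces to the sample Pearson coefficient under a horizontal shift. A minimal sketch follows; the names are illustrative, and a circular shift is assumed in place of the infinite integration domain.

```python
# Sketch (illustrative, stdlib only): the normalized correlation of Eq. (6)
# for a horizontal shift dx. At dx = 0 the self-correlation equals 1.
import math, random

def r_ij(Ii, Ij, dx):
    """Pearson correlation between Ii(x, y) and Ij(x + dx, y), Eq. (6)."""
    flat_i = [v for row in Ii for v in row]
    shifted = [row[dx % len(row):] + row[:dx % len(row)] for row in Ij]
    flat_j = [v for row in shifted for v in row]
    n = len(flat_i)
    mi, mj = sum(flat_i) / n, sum(flat_j) / n
    cov = sum((a - mi) * (b - mj) for a, b in zip(flat_i, flat_j)) / n
    si = math.sqrt(sum((a - mi) ** 2 for a in flat_i) / n)
    sj = math.sqrt(sum((b - mj) ** 2 for b in flat_j) / n)
    return cov / (si * sj)

random.seed(1)
img = [[random.random() for _ in range(16)] for _ in range(16)]
assert abs(r_ij(img, img, 0) - 1.0) < 1e-9   # perfect self-correlation
assert -1.0 <= r_ij(img, img, 5) <= 1.0      # bounded, per Cauchy-Schwarz
```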

Noting that

A_i(Δx) = n Ī_i² + σ_i² r_ii(Δx)

and then substituting Eq. (5) into Eq. (4) results in the direct proportionality between C_ii^T(Δx) and r_ii(Δx) as

C_ii^T(Δx) = a + b r_ii(Δx)  (7)

for a = 2(E_i + n Ī_i²) and b = 2σ_i². That is, the shape of C_ii^T is determined by r_ii up to a linear scaling that is due to the (fixed) energy, mean, and variance of the given image.

The proposed model assumes that there is a direct relationship between alignment (acuity) and the alignment-correlation function. It does not assume, however, that the coefficients (a, b) in Eq. (7) are only physically defined, since the psychometric function underlying the matching process must map such physical-alignment values into probabilities of alignment, or "same"-judgment Ps(Δx) values. For these reasons our theory holds only up to a linear transformation of the correlation function, i.e.,

Ps(Δx) = α + β r_ii(Δx).  (8)

Clearly (α, β) in Eq. (8) are related to the physical determinants (a, b) in Eq. (7), since image contrast (variance, etc.) would affect performance. However, the main point about Eq. (8) is that the shape of the Ps(Δx) curve is determined by the perceptual equivalent of the physical cross correlation of the images involved.

Figure 4 shows the probabilities of "same" (aligned-images) responses [Ps(Δx), circles] for each observer over the seven experimental conditions. Superimposed upon these experimental results are the related least-squares best-fitting solutions for Eq. (8), compared with the mean performance at each shift (Δx) value. The regression coefficients were estimated with respect to the additional constraint that all negative correlations were assumed to signal uncorrelated (or misaligned) information to the observer and were set to zero for purposes of regression. Under these conditions an average correlation of r = 0.98 was observed over the seven fits of Eq. (8) to the data, so explaining, on average, 96% of the performance variance.
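The constrained fit described above amounts to ordinary least squares after clipping negative correlations to zero. A sketch under that reading, with data values invented purely for illustration (they are not the paper's measurements):

```python
# Sketch (stdlib only): least-squares fit of Eq. (8), Ps = alpha + beta*r,
# with negative correlations clipped to zero as described in the text.
def fit_eq8(r_values, p_values):
    """Return (alpha, beta) minimizing sum (p - alpha - beta*r)^2."""
    r = [max(v, 0.0) for v in r_values]   # clip negative correlations
    n = len(r)
    mr, mp = sum(r) / n, sum(p_values) / n
    beta = sum((a - mr) * (b - mp) for a, b in zip(r, p_values)) / \
           sum((a - mr) ** 2 for a in r)
    alpha = mp - beta * mr
    return alpha, beta

# Hypothetical Ps(dx) readings at autocorrelation values r_ii(dx):
r_ii = [-0.2, 0.0, 0.3, 0.7, 1.0]
ps = [0.05, 0.04, 0.35, 0.70, 1.00]
alpha, beta = fit_eq8(r_ii, ps)
assert 0.9 < beta < 1.1 and abs(alpha) < 0.1
```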

A comparison of the alignment performance with spatially overlapping and nonoverlapping Gabor signals is provided by Fig. 5. As can be seen, there is virtually no difference between the conditions of the main experiment and those of the control experiment with a vernier version of the task.

The question remains whether these results apply to arbitrary images. As with spot, bar, or line signals, it is difficult to generalize our results from these images, since the modulations of the images were exclusively in the same direction as the offset parameter and, more importantly, they represent a class of profiles recently proposed to capture salient properties of simple-cell receptive fields.8,10 That is, such signals are particularly sensitive to the ∇²G operators proposed by Watt and Morgan6 to predict aspects of edge-specific vernier alignment. Consequently, we have conducted a second experiment with two different classes of images to investigate whether the cross-correlation predictions still hold with arbitrary images, not necessarily restricted to the autocorrelation form of Eq. (2).

3. EXPERIMENT 2

Figure 6 shows three bandpass versions of face and texture images. The filters were Gaussian (annuli) centered at 0, 8, and 32 picture cycles, 2 octaves in width (to 1/e decay in radial frequency). The images were initially constructed as 256 × 256 pixel images having a maximum frequency of 128 picture cycles.
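The radial Gaussian-annulus gain implied by this description can be sketched as follows. The exact parameterization (how the 1/e octave width maps to a frequency-domain spread, and the assumed cutoff for the 0-cycle band) is an illustrative assumption, not taken from the paper.

```python
# Sketch (assumed form): a Gaussian annulus gain G(f) centered at f0
# picture cycles, decaying to 1/e one octave away from f0.
import math

def annulus_gain(f, f0, octaves=1.0):
    """Gain at radial frequency f for a Gaussian annulus centered at f0."""
    if f0 == 0:                      # low-pass case: plain Gaussian at DC
        return math.exp(-(f / 8.0) ** 2)   # 8 c/pic 1/e cutoff, assumed
    sigma = f0 * (2 ** octaves - 1)  # 1/e point one octave from f0
    return math.exp(-((f - f0) / sigma) ** 2)

assert annulus_gain(8.0, 8.0) == 1.0                        # unit gain at center
assert abs(annulus_gain(16.0, 8.0) - math.exp(-1)) < 1e-12  # 1/e one octave up
```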

This experiment employed both identical and nonidentical images for the alignment task. That is, the fixed (reference) and variable (target) components of the image pairs consisted of either the same bandpass (diagonals in Fig. 7) or different bandpass components (off-diagonals in Fig. 7). Otherwise the procedure (exposure times, apertures, space-average luminance, contrasts, number of alignment trials, and observers) was identical with the main component of Experiment 1.

The cross-correlation formulation for this experiment differed from the previous one insofar as Eq. (4) substituted into Eq. (2) resulted in

C_ij^T(Δx) = a + b r_ii(Δx) + c r_ij(Δx)  (9)

for

a = E + n(Ī_i² + 2 Ī_i Ī_j) + σ_i σ_j r_ij(0),
b = σ_i²,
c = σ_i σ_j  (i ≠ j).

Fig. 6. Six images used in Experiment 2. The three bandpasses corresponded to isotropic Gaussian filters centered at 0, 8, and 32 picture cycles (CF's) and decaying to 1/e in ±1 octave with respect to a 256 × 256 8-bit pixel format.

The normalized autocorrelation (r_ii) and cross-correlation (r_ij) functions for the 3 × 3 = 9 bandwidth mixing conditions are shown in Fig. 7 for both the face and the texture images.

Again, under the assumption that the psychometric function Ps(Δx) is linearly related to C_ij^T(Δx), the equivalent to Eq. (8) in this experiment is

Ps(Δx) = α + β r_ii(Δx) + γ r_ij(Δx).  (10)

However, as can be seen from Fig. 7, the cross-correlation components were either identically zero or near zero. Consequently we have fitted alignment [Ps(Δx)] responses to

Ps(Δx) = α + β r_ii(Δx),  (11)

where r_ii corresponded to the autocorrelations between images within the same bandwidths.

Figures 8 and 9 show the observed Ps(Δx) results over the two observers compared with the best-fitting predictions of Eq. (11) to their means. Again, relatively high (average) correlations were observed (face, r = 0.91; texture, r = 0.89) without the use of any nonlinear power functions (etc.), which are commonly used to improve the fit of such data to theory.

4. DISCUSSION

The distinction between cross-correlation [C_ij(0), C_ij(Δx)] and autocorrelation [A_j(Δx)] components in the full cross-correlation function C_ij^T(Δx) in Eq. (2) has important perceptual implications. In both experiments observers were simply required to decide whether two composite images were identical. Since all images, except those used in the control experiment, were spatially overlapping and observers were to detect strict identity, the crucial function uniquely depicting the misalignment of the variable (offset) image was its autocorrelation function A_j(Δx). The remaining cross-correlation functions, between the fixed and variable images, measured the degree to which the alignment-specific (variable) image (as a foreground) correlated with the background fixed image in each composite frame. Rather than arguing for independence of processing the foreground from the background owing to the involvement of different spatial-frequency channels per se, we believe that a simpler explanation is that these two last-named components had near-zero correlation when they were not generated in the same frequency band (see Fig. 7). That is, observers are known (not only from this task; see Caelli and Yuzyk11 and Hübner et al.12) to segregate image luminance profiles in terms of the degree to which they are spatially uncorrelated.

Fig. 7. Normalized cross-correlation functions [C_ij(Δx); see Section 1] between the three signals shown in Fig. 6 with respect to the vernier offset (Δx, in pixels) range used in Experiment 2. Diagonals correspond to the autocorrelation functions.

Fig. 8. Face image: probability of "same" (aligned) responses Ps(Δx) as a function of image offset (Δx, in pixels) with respect to a 200 × 200 pixel aperture. One pixel = 36″ of visual angle. Fixed and variable image refer to the nonmovable and vernier-offset images, respectively. cf corresponds to the center frequency of each filter in picture cycles (see Fig. 6). Dashed lines join the best-fitting predictions (crosses) from Eq. (11). Correlations (r's), intercepts (a's), and slopes (b's) are shown for all curves.

With these results in mind, we conclude that the cross-correlation model for image registration proves to be sufficient for predicting alignment behavior, at least with this kind of task. However, we have not exhausted the types of decomposition of the correlation functions that could be involved in other tasks. Also, the ability of observers to align such images is not restricted to the case when they spatially overlap. We have shown that with the vernier-acuity version of the task (where the observer is to align a moving center image such that it is collinear with two other versions above and below it) identical tuning [Ps(Δx)] curves occurred, as is shown in Fig. 5.

Such results demonstrate an apparent ability to compute something perceptually akin to cross correlation in the presence of an adequately small orthogonal offset and also experimentally support the conclusion that position uncertainty (lack of pattern acuity) increases with decreases in signal bandwidth. As the Gabor signals' space constants increased from 4 to 32 pixels, their bandwidths decreased proportionately, and pattern acuity also decreased in both tasks (Figs. 4 and 5). However, there are many cases of image alignment for which C_ij^T(Δx) would not be an adequate determinant without some form of differentiation. For example, although observers can align an edge-only version of an image with the original, the C^T(Δx) function is relatively flat unless the underlying image is specifically differentiated by bandpass filtering.13


There are a number of reasons why such cross-correlation formulations for pattern acuity parallel behavior. From a signal-processing perspective the cross correlation peaks precisely when two images are identical and aligned, owing to the Cauchy-Schwarz inequality.14 That is,

C_ij(α, β) = ∬ f_i(x, y) f_j(x + α, y + β) dx dy ≤ [∬ f_i²(x, y) dx dy]^(1/2) × [∬ f_j²(x + α, y + β) dx dy]^(1/2),  (12)

with equality if and only if

f_i(x, y) = λ f_j(x + α, y + β),  λ = const.  (13)

Further, the slope of the cross-correlation function about this peak value determines the degree to which two such images can be regarded as identical. The benefit of using the complete luminance profiles f_i(x, y) in the calculation is that it includes all the important (or more perceptually salient) information in the image without requiring multiparameter estimation procedures to extract the pattern features that best predict behavior. In fact, basing alignment decisions on the output of a cross-correlation process is an example of an ideal detector strategy, since this detector decides on the position of the signal in terms of the cross correlation between signal and image.14 Evidence that this process is not just in the stimulus comes from the identical behavior of observers when the signals were either superimposed or vertically separated (Fig. 5).

Fig. 9. Texture image: probability of "same" (aligned) responses Ps(Δx) as a function of image offset (Δx, in pixels) with respect to a 200 × 200 pixel aperture. One pixel = 36″ of visual angle. Fixed and variable image refer to the nonmovable and vernier-offset images, respectively. cf corresponds to the center frequency of each filter in picture cycles (see Fig. 6). Dashed lines join the best-fitting predictions (crosses) from Eq. (9). Correlations (r's), intercepts (a's), and slopes (b's) are shown for each curve.
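The Cauchy-Schwarz argument of Eqs. (12) and (13) is easy to demonstrate numerically. The sketch below is illustrative only, using discrete profiles in place of the continuous integrals; it checks both the bound and the equality condition for proportional profiles.

```python
# Sketch (illustrative, stdlib only): the Cauchy-Schwarz bound of Eq. (12)
# on discrete luminance profiles. The bound is attained exactly when the
# two profiles are proportional, as in Eq. (13).
import math, random

def cross_corr(fi, fj):
    return sum(a * b for a, b in zip(fi, fj))

def energy(f):
    return math.sqrt(sum(a * a for a in f))

random.seed(2)
fi = [random.uniform(-1, 1) for _ in range(64)]
fj = [random.uniform(-1, 1) for _ in range(64)]

# Inequality (12): |C_ij| <= sqrt(E_i) * sqrt(E_j) for any pair of profiles.
assert abs(cross_corr(fi, fj)) <= energy(fi) * energy(fj) + 1e-12

# Equality (13): fj proportional to fi (lambda = 2.5) attains the bound.
fj_prop = [2.5 * a for a in fi]
assert abs(cross_corr(fi, fj_prop) - energy(fi) * energy(fj_prop)) < 1e-9
```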

Such cross-correlation processes are believed to occur in the type of local image processing computed by retinal and cortical cells, since the actual shape of the receptive-field profile is determined by Eqs. (12) and (13): the "principle of maximum response." Also, since cross correlation is linear with respect to component filtering processes, the net effect of an ensemble of cross correlators, whose profiles are independent yet synthesize the input signals, would be the full cross-correlation function.

In conclusion, then, we have investigated the idea that pattern-acuity performance can be modeled in terms of a linear transformation of the constituent image cross-correlation properties. Our results support the representation with a variety of patterns and arrangements. The results do not contradict extant ideas in the literature that such acuities may be modeled in terms of the cross-correlation outputs of specific filter profiles and the signals. Rather, they simply show that the full cross-correlation information apparently is a sufficient statistic to represent behavior, without the need for parameter estimation and optimization procedures of the latter kind.

ACKNOWLEDGMENTS

This project was funded by grants A2568 from the Natural Sciences and Engineering Research Council of Canada and Re 337/4-1 from the Deutsche Forschungsgemeinschaft.

REFERENCES

1. W. R. Brody, Digital Radiography (Raven, New York, 1984).
2. W. Pratt, Digital Image Processing (Wiley, New York, 1978).
3. F. Glazer, G. Reynolds, and P. Anandan, "Scene matching by hierarchical correlation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Institute of Electrical and Electronics Engineers, New York, 1983), pp. 432-441.
4. P. Burt, "Fast filter transforms for image processing," Comput. Graphics Image Process. 16, 20-51 (1981).
5. G. Westheimer and S. P. McKee, "Spatial configurations for visual hyperacuity," Vision Res. 17, 941-947 (1977).
6. R. J. Watt and M. J. Morgan, "The recognition and representation of edge blur: evidence for spatial primitives in human vision," Vision Res. 23, 1465-1477 (1983).
7. A. Papoulis, Systems and Transforms with Applications in Optics (McGraw-Hill, New York, 1968).
8. J. Daugman, "Six formal properties of two-dimensional anisotropic visual filters: structural principles and frequency/orientation selectivity," IEEE Trans. Syst. Man Cybern. SMC-13, 882-888 (1983).
9. E. Ludvigh, "Direction sense of the eye," Am. J. Ophthalmol. 36, 139-142 (1953).
10. J. J. Kulikowski, S. Marcelja, and P. Bishop, "Theory of spatial position and spatial frequency relations in the receptive fields of simple cells in the visual cortex," Biol. Cybern. 43, 187-198 (1982).
11. T. M. Caelli and J. Yuzyk, "What is perceived when two images are combined?" Perception 14, 41-48 (1985).
12. M. Hübner, I. Rentschler, and W. Encke, "Hidden-face recognition: comparing foveal and extrafoveal performance," Hum. Neurobiol. 4, 1-7 (1985).
13. T. M. Caelli and J. Yuzyk, "On the extraction and alignment of image edges," Spatial Vis. 1, 205-217 (1986).
14. A. Rosenfeld and A. C. Kak, Digital Picture Processing (Academic, New York, 1976).
