
Computational Modeling of Foveal Target Detection

Gary Witus, Turing Associates, Ann Arbor, Michigan, and R. Darin Ellis, Wayne State University, Detroit, Michigan

This paper presents the VDM2000, a computational model of target detection designed for use in military developmental test and evaluation settings. The model integrates research results from the fields of early vision, object recognition, and psychophysics. The VDM2000 is image based and provides a criterion-independent measure of target conspicuity, referred to as the vehicle metric (VM). A large data set of human responses to photographs of military vehicles in a field setting was used to validate the model. The VM adjusted by a single calibration parameter accounts for approximately 80% of the variance in the validation data. The primary application of this model is to predict detection of military targets in daylight with the unaided eye. The model also has application to target detection prediction using infrared night vision systems. The model has potential as a tool to evaluate the visual properties of more general task settings.

INTRODUCTION

Need and Scope

Visual detection by human observers is

considered to be the major operational threat to individual vehicle and unit operation security. The U.S. Army, as well as the NATO countries, has recognized the need for a greater understanding of both the visual signature of their vehicles and the process of detection by human observers using unaided eyes and direct view optics. To this end, the military community implemented initiatives such as the U.S. Army Target Acquisition Model Improvement Program in the early 1990s and the NATO Research and Technology Organization working group (SCI-12) on camouflage evaluation methods and models. This project was motivated by the need for tools for developmental test and evaluation of military vehicles employing designs and technologies for detection avoidance (i.e., "signature management"). This project was specifically motivated by the need for models of visual detection.

The performance specification for developmental test and evaluation is expressed in terms of detection, given that the observer is looking at, or in the direction of, the vehicle (i.e., foveal target detection). It is not expressed in terms of search time or probability of detection during wide-field-of-view search. Search performance is of interest in operational test and evaluation of large-scale combat, but it is not used in performance requirements for materiel development because search outcome and performance is influenced by many factors over which the materiel designer has no control (e.g., the tactics and operational employment of friendly and threat forces; the large-scale terrain properties). The materiel designer has control over only the vehicle signature and, to some degree, how it interacts with its local surroundings. The performance specifications for signature management are expressed in terms of detection range and probability of detection. In the context of military target acquisition, detection means determining that an object is a potential target, specifically a military vehicle. Detection avoidance

Address correspondence to R. Darin Ellis, Wayne State University, 87 E. Ferry St., Detroit, MI 48187. HUMAN FACTORS, Vol. 45, No. 1, Spring 2003, pp. 47-60. Copyright © 2003, Human Factors and Ergonomics Society. All rights reserved.


requirements for ground vehicles invariably address detection avoidance for a stationary vehicle. Moving target detection is considered in combat models but is not currently a major component in developmental test and evaluation.

Objectives

The goal of the project was to develop a robust and accurate analytic model to predict human observer performance in visual vehicle discrimination at the "detection" level for stationary targets, given that the observer was looking at, or in the direction of, the target. The result of this work was a model of target detection called VDM2000.

VDM2000 THEORY

Comparison with Recent Approaches

The visual search paradigm. An extensive body of research in experimental psychology has been developed based on the visual search paradigm. This paradigm, although useful in studies of basic visual attention and perception, does not apply to search for vehicles in natural scenes. The visual search paradigm enables experimental psychologists to control the visual content to which the participants respond and thereby to isolate specific aspects of vision. The visual search paradigm, as described by Wolfe (1998), is characterized by (a) discrete target and distractor figures, (b) a well-defined specific target description, (c) well-defined distractors, (d) well-defined visual attributes with distinctive and discrete values, (e) randomized placement, and (f) a noninterfering and noninformative background.

The standard experimental psychology visual search paradigm eliminates uncertainty on several dimensions important to understanding search in natural scenes. First, there is no uncertainty as to the appearance of the target or the question of what the objects are. Second, this paradigm eliminates the contribution of local and global context in search. These properties make the standard visual search paradigm useful for basic vision and attention research, but they also make it inapplicable to search for vehicles in natural terrain.

Salience theory of search and computer vision models. The salience theory of search holds that the strength with which locations draw visual attention is proportional to the magnitude of resemblance between the target and scene locations (Itti & Koch, 2000; Wolfe, 1994). Computer-vision-based models of search and performance based on salience theory have the challenge of developing a computational metric of the extent to which locations resemble a target. The computational salience models attempt to find targets in an image using cues and criteria that correspond, in theory, to those used by people. Computer vision systems have been effective in structured environments and tasks, but they have not proven effective in unconstrained environments or for ill-structured tasks.

Even when computer vision systems are effective, they do not use the same methods that human vision uses. Itti and Koch (2000) concluded that the performance of their algorithm is uncorrelated with human performance. Computational salience modeling is a special case of the more general problem of automatic target recognition (ATR) algorithms. ATR algorithms focus on target detection as the end goal, rather than on increasing understanding of human vision. Over the past 20 years, the U.S. Department of Defense has provided significant funding for ATR development. The U.S. Army, Navy, Air Force, and Defense Advanced Research Projects Agency all have active ATR programs. To date, no robust and effective computational method has been demonstrated to find locations in images that resemble ground vehicles. No ATR algorithms have been successfully placed in the field, even with the more limited goal of cuing human observers.

Operational effectiveness models. The U.S. Army Night Vision Laboratory's target acquisition models (O'Kane, 1995; Wilson, 2001) are the preeminent examples of the "operational effectiveness" approach. These models have shown some limited success in explaining performance for search and detection with low-resolution night vision devices. These models were intended to predict average performance over a set of similar images, not to predict detection performance for specific images (as the current modeling effort does). The Night Vision Lab models have been less accurate when applied to high-resolution visual search or to evaluate specific images. In 1992 the U.S. Army initiated


a 3-year Target Acquisition Model Improvement Program, which ultimately failed to produce an improved model (Mazz, Kistner, Bushra, & Pibil, 1997).

Development and Historical Antecedents

The VDM2000 is a cascading sequence of equations representing front-end vision, perceptual organization of the vehicle, local contrast and clutter, evidence accumulation, and psychophysical response. The richness of the model comes primarily from the number of different factors and stages of processing that are represented. It is a low-threshold model, consistent with basic vision research results for search and cued detection.

The VDM2000 makes numerous contributions to the science and practice of modeling human observers in areas such as contrast mechanisms and measurement, the ability to account for masking attributable to local clutter, and representation of internal target structure. Table 1 presents a list of VDM2000 contributions vis-à-vis classical models (Matchko & Gerhart, 2001; O'Kane, 1995; Wilson, 2001). A process flow of the VDM2000 is shown in Figure 1.

Low-Level Vision Module: Achromatic and Color Vision

The model's front end (the side of the model that interacts with the inputs) represents bottom-up visual processing, including pupil reflex; cone saturation and spectral response; spatial filtering and sampling resulting in the retinal output response to the image formed on the retina; and finally achromatic and color-opponent response (Kaiser & Boynton, 1996). See the box labeled "low-level vision module" in Figure 1 for a depiction of this module relative to the overall VDM2000 architecture.

The red-green-blue (RGB) image is converted to the standard Commission Internationale de l'Eclairage (CIE) tri-stimulus XYZ coordinates (CIE, 1932). The XYZ image is converted to an image of the long-, medium-, and short- (LMS) wavelength cone responses. The perceived color

TABLE 1: VDM2000 Contributions with Respect to Classical Models

Basis for contrast measurement. Classical models(a): based on the input to the visual system. VDM2000: based on the output of the receptors (cones) after luminance adaptation and cone nonlinearities (Boynton & Whitten, 1970; Kaiser & Boynton, 1996).

Contrast channels accounted for. Classical models: evaluate only luminance contrast. VDM2000: includes both luminance and chromatic contrast (Brainard, 1996; Wandell, 1995).

Ability to account for effects of object's internal structure. Classical models: compute aggregate statistics treating the entire vehicle as homogeneous. VDM2000: uses a simple model of the structure and appearance of 3-D objects under natural illumination (Moore & Cavanagh, 1998; Witus, Gerhart, & Ellis, 2001).

Generality of contrast model. Classical models: use an equation to compute contrast that is well defined only for uniform targets against uniform backgrounds. VDM2000: uses a band-limited, adaptive function to compute local contrast in inhomogeneous surroundings (Ahumada & Beard, 1998; Peli, 1990, 1997).

Ability to account for local clutter. Classical models: do not include a measure of local clutter or its effect on detection. VDM2000: computes local clutter and represents its masking effect on the efficiency of contrast for detection (DeValois & DeValois, 1990).

Model output. Classical models: predict detection independent of the response biases and false alarm context. VDM2000: generates a receiver operating characteristic curve, expressing the probability of positive response as a function of the false alarm rate and the vehicle detection metric (Palmer et al., 2000).

(a) Classical models: Matchko & Gerhart, 2001; O'Kane, 1995; Wilson, 2001.


Figure 1. VDM2000 processing flow. The figure depicts the model's modules and their key equations:

Low-level vision module: RGB → human perceptual response (on each of the Acd channels).

Local contrast module: Contrast = SurfaceAverage - LocalSurroundAverage; AggContrast = sum of |Contrast| over the Acd channels.

Local clutter module: per-channel local-surround standard deviations (σ) combined into AggClutter.

2.5-D mask: front, side, and top surface masks, each with a region area.

Vehicle metric module: (Front/Side/Top)Metric = RegionArea × Contrast × [AggContrast / AggClutter]; Vehicle Metric, VM = Max{FrontMetric, SideMetric, TopMetric}.

Psychophysical response module (with Pr(FA) and the calibration constant a as inputs): Probability of detection, Pr(D) = VM/(VM + a/Pr(FA)); probability of hit, Pr(H) = 1 - (1 - Pr(D))(1 - Pr(FA)) = (VM + a)/(VM + a/Pr(FA)).

image (Acd; achromatic, red-green, and yellow-blue color-opponent channels) is modeled as a linear transformation of the LMS cone response.
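For readers who want to experiment with this front end, the following Python sketch illustrates the chain of linear color transforms described above. The specific matrices are assumptions for illustration (a standard sRGB-to-XYZ matrix, the Hunt-Pointer-Estevez XYZ-to-LMS matrix, and placeholder opponent-channel weights); the paper does not publish the coefficients VDM2000 actually uses.

```python
import numpy as np

# Standard sRGB-to-XYZ matrix (assumed; the paper's calibrated matrix differs).
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])

# Hunt-Pointer-Estevez XYZ-to-LMS matrix (assumed).
XYZ_TO_LMS = np.array([[ 0.38971, 0.68898, -0.07868],
                       [-0.22981, 1.18340,  0.04641],
                       [ 0.00000, 0.00000,  1.00000]])

# Placeholder opponent weights, NOT the model's actual coefficients:
# achromatic ~ L + M; red-green ~ L - M; yellow-blue ~ (L + M)/2 - S.
LMS_TO_ACD = np.array([[1.0,  1.0,  0.0],
                       [1.0, -1.0,  0.0],
                       [0.5,  0.5, -1.0]])

def rgb_to_acd(rgb_image):
    """Convert a linear RGB image (H x W x 3) to Acd perceptual channels."""
    xyz = rgb_image @ RGB_TO_XYZ.T   # per-pixel matrix multiply
    lms = xyz @ XYZ_TO_LMS.T
    return lms @ LMS_TO_ACD.T
```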

Local Contrast and Local Clutter

Local contrast and clutter are calculated based on a combination of the bottom-up information provided from the front-end vision module and some top-down assumptions regarding the cognitive processes involved in the perceptual organization of the target. These modules are implemented as independent modules in the model architecture and are depicted in Figure 1.

Target organization: Shape from shading. With simple two-dimensional targets there is typically no need to consider the organization of target perception. Ground vehicles are three-dimensional (3-D) structures that rarely present



a uniform appearance. The simple physics of lighting and geometry of configuration dictate that some surfaces of a 3-D object will receive more illumination than others because of shading from surface orientation and shadowing from occlusion. Some surfaces are bright, some are dark. These systematic variations provide the pop-out cues that the human visual system uses to segment objects from their surroundings (Sun & Perona, 1996). Tarr, Kersten, and Buelthoff (1998) found that the human visual system encodes the direction of illumination and its related effects (e.g., shading and shadows) and that these function to reveal a 3-D object's shape. Moore and Cavanagh (1998) found that two-tone images of 3-D objects were highly effective at inducing 3-D object perception despite their lack of shading, hue, or texture. They concluded that the mechanisms of 3-D object perception in natural scenes include a mechanism for processing scene illumination with respect to an internal memory representation of a 3-D object's shaded appearance. Witus and Gerhart (2000) found that aggregate differences among the front, side, and top regions accounted for more than 65% of the luminance variance over the entire vehicle for a sample of 44 images of military vehicles in natural settings.

VDM2000 does not attempt to emulate the process by which the visual system recognizes objects. Rather, the model uses as an input a representation of the results of the top-down processing to measure the amount of signal available for an observer to detect the object. In VDM2000, the generic representation of a vehicle is the projection onto the image plane of the orthogonal front (or rear), side, and top vehicle surfaces - a 2.5-D representation. These vehicle surfaces have different orientations with respect to the illumination source and observer. Consequently, there tends to be low variation within a region and high contrast between regions. These properties are characteristic of 3-D objects and are natural for a simplified representation of a military vehicle.

Clutter and contrast. Defining the local surround. For each of the three characteristic target regions, and across the achromatic and two color-opponent channels of the Acd images in perceptual coordinates (the outputs of the model's front end), the VDM2000 computes local contrast and local clutter. The VDM2000 local-contrast and local-clutter measures are single-channel band-pass metrics. The upper limit on the spatial band-pass is the Nyquist limit of the input image (45 cycles/°). The lower limit is a function of the size and shape of the target region and its local surround. The local surrounds are computed with the same algorithm; however, the local surround for calculating contrast is narrower than the local surround for calculating local clutter. Aggregate values for each region are then calculated for determination of the vehicle metric in the next step. Aggregate contrast is the sum of the contrast magnitude on the individual achromatic and color-opponent channels. Aggregate clutter is the root-sum-square of the clutter (σ) on the individual achromatic and color-opponent channels plus a component representing the internal noise of the visual system (σVISION). See Figure 1 for a depiction of the contrast and clutter modules as well as their relation to the overall model architecture.
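As an illustrative sketch of the aggregation just described, the following Python fragment computes aggregate contrast and clutter for one target region. The per-channel definitions used here (region mean minus surround mean for contrast; surround standard deviation for clutter) and the internal-noise value are simplifying assumptions; the actual VDM2000 uses band-limited, adaptive local-surround computations (Peli, 1990, 1997) not reproduced here.

```python
import numpy as np

def aggregate_contrast_and_clutter(acd, region_mask, contrast_surround,
                                   clutter_surround, sigma_vision=1.0):
    """Aggregate contrast and clutter for one target region.

    acd: H x W x 3 image in Acd perceptual coordinates.
    region_mask, contrast_surround, clutter_surround: boolean masks for the
    region, its narrower contrast surround, and its wider clutter surround.
    sigma_vision: assumed internal-noise constant (value hypothetical).
    """
    contrasts, clutters = [], []
    for ch in range(3):  # achromatic + two color-opponent channels
        plane = acd[..., ch]
        # Per-channel contrast: region mean minus local-surround mean.
        contrasts.append(plane[region_mask].mean()
                         - plane[contrast_surround].mean())
        # Per-channel clutter: standard deviation over the wider surround.
        clutters.append(plane[clutter_surround].std())
    # Aggregate contrast: sum of per-channel contrast magnitudes.
    agg_contrast = np.sum(np.abs(contrasts))
    # Aggregate clutter: root-sum-square of channel clutter plus internal noise.
    agg_clutter = np.sqrt(np.sum(np.square(clutters)) + sigma_vision**2)
    return agg_contrast, agg_clutter
```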

Evidence Accumulation: The Vehicle Metric Module

The visual evidence is based on the spatial signal from the bottom-up path (contrast and clutter metrics), organized by the region of interest from the top-down path (2.5-D mask). The contrast, clutter, and area are all combined into a detectability metric for each surface region (see Figure 1, vehicle metric module). The region metric is equal to region area times aggregate contrast times contrast efficiency. The contrast efficiency is the ratio of aggregate contrast to aggregate clutter. Thus the region metric is equal to area times contrast squared, divided by clutter.

The vehicle metric is the maximum of the three surface region metrics. This is consistent with the observation that detection is a function of the dominant cue: Suppressing secondary cues has no effect if the dominant cue is not treated, but suppressing the dominant cue reduces detectability until it is reduced to the point where it is no longer dominant.
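In code form, the evidence-accumulation step reduces to a few lines. This sketch follows the region-metric and max-rule definitions given above; the function and variable names are ours, not the model's.

```python
def region_metric(area, agg_contrast, agg_clutter):
    # Region metric = area x contrast x contrast efficiency, where
    # efficiency = contrast / clutter, i.e., area * contrast^2 / clutter.
    return area * agg_contrast * (agg_contrast / agg_clutter)

def vehicle_metric(front_metric, side_metric, top_metric):
    # The vehicle metric is the maximum of the three surface-region
    # metrics: detection follows the dominant cue.
    return max(front_metric, side_metric, top_metric)
```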

Psychophysical Response Module

The vehicle metric is a criterion-independent measure of target signal. Observer detection response also depends on a number of other


factors unrelated to the image (e.g., perceived cost of a missed detection vs. perceived cost of a false alarm; expectations regarding the frequency or density of targets; nontarget objects that resemble targets). The effect of these factors is measured by the probability of false alarm (i.e., false alarm rate). The psychophysical function expresses the probability of hit (i.e., positive response in the presence of, but not necessarily in response to, a target) as a function of the criterion-independent measure of perceived signal and the probability of false alarm in the form of a receiver operating characteristic curve. Several different psychophysical models have been used successfully in visual search and detection (Palmer, Verghese, & Pavel, 2000). VDM2000 uses a two-stage psychophysical model. Pr(D), report of "detection" in response to a target, is computed as a function of the vehicle metric (VM), the probability of false alarm, Pr(FA), and the calibration parameter, a. The value of a is scaled to the false alarm rate, and one value of a is used for all response levels. The false alarm rate is used as a measure of bias, or the willingness of an observer to call some signal a target.

At any given response level characterized by Pr(FA), Pr(D) is computed as the vehicle metric divided by the vehicle metric plus a constant, where the constant is divided by the false alarm rate:

Pr(D) = VM/[VM + a/Pr(FA)]. (1)

The equation for Pr(H) (probability of a "hit") as a function of Pr(D) and Pr(FA) is based on a simple theoretical model. The model assumes that positive response to the target signal and positive response to other signals in the image are processes that are parallel and independent. A "detection" is reported if there is either a positive perception of the target (at probability Pr[D]) or a positive misperception of nontarget stimuli (at probability Pr[FA]). The standard equation for the probability of the union of two events is used:

Pr(H) = 1 - [1 - Pr(D)][1 - Pr(FA)] (2)

- that is, no detection is reported only when there is no correct detection and no false alarm. These two equations can be combined into a single psychometric equation for Pr(H) as a function of the vehicle metric, Pr(FA), and the calibration parameter a:

Pr(H) = (VM + a)/[VM + a/Pr(FA)]. (3)
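Equations 1 to 3 are straightforward to implement. The following sketch evaluates them directly; the parameter values in the usage comment are illustrative, not calibrated values from the paper.

```python
def pr_detect(vm, pr_fa, a):
    """Equation 1: Pr(D) = VM / (VM + a / Pr(FA))."""
    return vm / (vm + a / pr_fa)

def pr_hit(vm, pr_fa, a):
    """Equations 2-3: Pr(H) = 1 - (1 - Pr(D))(1 - Pr(FA))
                            = (VM + a) / (VM + a / Pr(FA))."""
    return (vm + a) / (vm + a / pr_fa)

# Illustrative values only: vm = 2.0, a = 0.5, Pr(FA) = .06
# gives Pr(H) = 2.5 / (2.0 + 0.5/0.06) ~ .24.
```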

VDM2000 IMPLEMENTATION AND OPERATION

VDM2000 takes two images as inputs: an image of the vehicle in the scene and a mask image designating the front (or rear), side, and top surfaces of the vehicle. The specification of the projection of the front (rear), side, and top surfaces onto the image plane is based on the geometry of illumination, shadowing, and reflection. The mask image is a three-color image in which the projections of the front, side, and top are arbitrarily color coded red (255, 0, 0), green (0, 255, 0), and blue (0, 0, 255). The nontarget area is black (0, 0, 0). An example target image and accompanying mask is shown in Figure 2. Finally, the VDM2000 predicts Pr(H) and Pr(D) for several values of Pr(FA). The values of Pr(FA) for which the analyst wants to predict Pr(H) and Pr(D) are entered as well.
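A minimal sketch of how such a mask image might be decoded, assuming the exact color coding described above (function name and array conventions are ours):

```python
import numpy as np

def decode_surface_masks(mask_image):
    """Split the three-color mask image into boolean region masks.

    mask_image: H x W x 3 uint8 array where front/rear = (255, 0, 0),
    side = (0, 255, 0), top = (0, 0, 255), and nontarget = (0, 0, 0),
    per the coding convention described in the text.
    """
    front = np.all(mask_image == (255, 0, 0), axis=-1)
    side = np.all(mask_image == (0, 255, 0), axis=-1)
    top = np.all(mask_image == (0, 0, 255), axis=-1)
    return front, side, top
```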


Figure 2. Original image and "2.5-D" mask of the three characteristic surfaces.


A fully detailed discussion of the technical requirements for images (e.g., resolution, blur), calibration requirements (e.g., computing the RGB-to-XYZ transform), construction of masks, and other aspects of operating the model is beyond the scope of this paper but appears in Witus (2001).

VALIDATION DATA: METHODS, PROCEDURE, AND RESULTS

Participants

The model validation experiments were conducted using paid participants recruited from the general population. Participants (18-45 years of age) had vision corrected to 20/20 and normal color vision (screened with a Bausch and Lomb Orthorater). Of the 30 participants, there were 19 men and 11 women.

Stimulus Material

The image set consisted of 1150 distinct images, 800 images with vehicles in the scene and 350 without. The images were created from 44 color slide images taken during the 1995 Distributed Interactive Simulation, Search, and Target Acquisition Fidelity field test held by the U.S. Army Communications-Electronics Command Night Vision and Electronic Sensors Directorate (Toet, Bijl, Kooi, & Valeton, 1998). The color slides were digitized at high resolution to produce a digital image of 6144 x 4096 pixels. The original 44 images contained nine different types of tracked and wheeled U.S. and foreign military vehicles in a variety of locations, aspects, postures (including partially obscured), and lighting conditions. Ranges were from 500 to 5000 m. The original 44 images did not produce wide variation in probability of detection in a search and detection test (Toet et al., 1998). The 1150 images used for model validation in this project were down sampled with low-pass filtering, cropped to 1080 x 720 pixels, and then digitally manipulated to produce a wider range of expected observer response. Details of the stimulus image manipulation process are given in Witus (2001). Image manipulations were chosen so that the image set would vary with respect to factors that are known to influence visual perception, such as brightness, contrast, chromaticity, and spatial scale. Table 2 briefly describes the image manipulations.

Apparatus

Stimuli were presented and responses collected via custom-made visual search experimentation software written in Visual Basic 6. All responses were collected through a two-button mouse. The experimental software ran on a Gateway 2000 Pentium II PC using a 17-inch (43-cm) EV700 monitor with an ATI Rage Pro 128 graphics card set to 24-bit color at 1280 x 1024 pixel resolution.

TABLE 2: Important Dimensions of Variation in the Image Set

Dimension: Source of Variation
Vehicle type: 9 types of military vehicles
Viewing azimuth: 360°
Viewing elevation: 0° to 45°
Range: 13x variation in linear scale, resulting from a combination of original image target size and the down-sampling scale factor (2:1 and 3:1)
Lighting: direct, diffuse, or in shadow
Lighting: from front, back, side, or top
Overall illumination: light or dark
Contrast attenuation: clear or haze
Color: full color or gray scale
Vehicle exposure: fully exposed, partially exposed, or foreground foliage
Clutter: in the open or amid clutter
Signature: baseline, reduced contrast, or suppressed cue features
Vehicle shadows: present or absent
Camouflage nets: present or absent
Camouflage paint patterns: present or absent


The monitor was calibrated prior to testing by displaying red, green, and blue images at a staircase of intensities and measuring the CIE xyY coordinates with a Minolta CS-100 colorimeter. We also measured the display xyY with only the low level of ambient illumination maintained for the test. These data were used to estimate the parameters of the RGB-to-XYZ transfer function in the computer model (the monitor bias, the gamma exponent, and a linear matrix). Ambient illumination from the wall behind the monitor varied somewhat but was approximately 30 cd/m². Peak monitor output was 110 cd/m². Reflection from the dark screen was 5 cd/m². The participants were seated 60 inches (152 cm) from the monitor and behind a table to prevent them from moving forward. At this distance, the angular resolution of the monitor was 110 pixels/°, roughly equal to the limit of visual acuity for high-contrast signals at the central fovea.
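The following sketch shows one plausible parameterization of such a monitor transfer function (per-gun bias and gamma nonlinearity followed by a linear phosphor matrix). The paper does not give the fitted parameter values or the exact functional form, so treat this as an assumption-laden illustration rather than the calibration actually used.

```python
import numpy as np

def monitor_rgb_to_xyz(rgb, bias, gamma, matrix):
    """Hypothetical forward model of a calibrated monitor.

    rgb: digital RGB values scaled to [0, 1]; bias and gamma are per-gun
    arrays; matrix is the 3 x 3 phosphor matrix estimated from the
    colorimeter measurements.
    """
    linear = np.clip(rgb - bias, 0.0, None) ** gamma  # gun nonlinearity
    return linear @ matrix.T                          # mix to CIE XYZ
```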

Experimental Procedure

Instructions and pretest training. Observers were individually tested in a self-paced manner. Prior to the experiment, the observers were presented with a set of 27 closeup images of the various vehicles in the natural terrain. The closeup images were presented in a brief training session to familiarize the participants with the procedure. The results from the familiarization trials were not included in the experiment results, and the training images were not used in the experiment.

Block procedure. The test was organized into four experimental blocks based on systematic differences in the overall scene appearance: baseline images, darkened images, lightened images with contrast attenuated, and gray-scale images. Within each block, image order was randomized without replacement across trials.

Trial procedure. The testing was self-paced. Before a scene was presented, a random spatial noise pattern was displayed for 750 to 2000 ms. Target location was cued with a red circle 300 pixels (approximately 3°) in diameter, centered on the target location. When a trial used an image that did not contain a target, the cuing circle was centered at a location where it was physically possible for a vehicle to be. This cue oriented the participant to the vehicle location

without distracting from or interfering with vehicle perception. The participant would click the mouse to display the image and then click again when he or she had decided whether or not a vehicle was in the scene. For trials in which the participants identified a vehicle (or thought that they did), they were instructed to click on the vehicle itself to indicate its location in the image. The scene was then masked with the noise image again, and a response menu appeared. The observer selected from one of the following four responses: (a) "definitely no vehicle was present," (b) "unsure whether or not a vehicle was present," (c) "confident that a vehicle was present," and (d) "certain that a vehicle was present."

In addition to the menu choice, the response time between image display and mouse click was recorded. There was a maximum response time of 60 s, at which time the four-choice response menu was displayed. Following menu selection, a dialog box appeared that provided feedback on target presence, response time, and the number of trials remaining in the block. Clicking "OK" on this dialog box started the next experimental trial.

Data Treatment

For the specific purposes of model validation, each trial response provided a single data point: the rating of target-present confidence. The first step in data reduction and analysis was coding the observers' target-present response level (on a 1-4 scale) into correct and incorrect decisions. Three different response levels were used: liberal, moderate, and conservative. For target-present images, hit rate (HR) at the liberal level was the proportion of responses that had a rating of 2 (maybe) or higher. HR at the moderate and conservative levels were the proportion of responses that had a rating of 3 (probably) or higher or 4 (definitely), respectively. The false alarm rate (FAR) at each response level was calculated from the 350 images without targets. Thus, for each of the images, we obtained three estimates of HR as a function of FAR.
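As a sketch, this coding step might look like the following; the 1-4 rating convention follows the text, and the function name is ours. The false alarm rate is computed the same way over the 350 target-absent images.

```python
import numpy as np

def hit_rates(ratings):
    """Code 1-4 confidence ratings into hit rates at three response levels.

    ratings: per-trial ratings for target-present images
    (1 = definitely absent ... 4 = certain present).
    Liberal counts ratings >= 2, moderate >= 3, conservative == 4,
    following the coding scheme described in the text.
    """
    ratings = np.asarray(ratings)
    return {"liberal": np.mean(ratings >= 2),
            "moderate": np.mean(ratings >= 3),
            "conservative": np.mean(ratings >= 4)}
```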

Aggregate Results

Empirical estimates of Pr(FA) were calculated at each response level by aggregating over all


participants and over all images without targets. Empirical estimates of Pr(H) were calculated at each response level for each image with a vehicle present by aggregating over all participants. Figure 3 shows a plot of Pr(H) versus Pr(FA) at each of the three response levels. Pr(H) is aggregated over all vehicles, and Pr(FA) is aggregated over all vehicle-absent scenes. We computed Pr(H) for an image by simply pooling the participants' responses and dividing the number of positive responses by the total number of responses. Table 3 shows the mean Pr(H) for different partitions of the image set.

Figure 3. Aggregate Pr(H) versus Pr(FA). [Recovered plot annotations: certain, Pr(FA) = .06, Pr(H) = .43; confident, Pr(FA) = .13, Pr(H) = .58.]

Table 4 shows the proportion of variance in Pr(H) explained by each of the major factors over all images with vehicles, calculated as η².

The base scene accounts for 49% to 66% of the variance in Pr(H). Interaction effects account for 25% to 39% of the variance in Pr(H). Individually, the variations in vehicle signature modification, scene modification, and scale modification had only small effects. Large vehicles in the open were still large vehicles in the open. In combination, however, and in combination with the variation in the base scene, these factors had significant effects.


TABLE 3: Mean Value of Pr(H) by Level of Factors in the Experimental Design

Factor / Image Set Partition: Certain / Confident / Unsure
Vehicle - Baseline vehicles: .51 / .67 / .86
Vehicle - Low-contrast vehicles: .34 / .50 / .74
Vehicle - Special vehicle variations: .39 / .54 / .76
Scene - Unmodified scene: .46 / .61 / .81
Scene - 3:1 scale factor: .39 / .53 / .77
Scene - 2:1 scale factor: .46 / .62 / .83
Scene - Darkened scene: .42 / .58 / .80
Scene - Hazy scene: .38 / .52 / .76
Scene - Gray-scale scene: .43 / .59 / .81


TABLE 4: Pr(H) η² Results for All Images with Vehicles

Proportion of Variance Explained (η²)

Factor: Certain / Confident / Unsure
Base scene: .66 / .60 / .49
Vehicle modification: .07 / .07 / .08
Scene modification: .01 / .01 / .01
Scale modification: .01 / .02 / .03
Total main effects: .75 / .70 / .61

Table 5 shows the average Pr(H) for each of the partitions of the major factors in the experimental design. Comparing the unmodified vehicle aggregates with the reduced contrast vehicle aggregates shows that the vehicle contrast reduction lowered Pr(H) by .17 at the "certain" criterion, .13 at the "confident" criterion, and .09 at the "unsure" criterion. Comparing the unmodified scene aggregates with the darkened scene aggregates shows that darkening the scene lowered Pr(H) by .04 at the "certain" criterion, .04 at the "confident" criterion, and .01 at the "unsure" criterion. Comparing the unmodified scene aggregates with the reduced contrast scene aggregates shows that reducing contrast over the entire scene lowered Pr(H) by .08 at the "certain" criterion, .09 at the "confident" criterion, and .06 at the "unsure" criterion. Comparing the unmodified scene aggregates with the gray-scale scene aggregates shows that removing color from the scene lowered Pr(H) by .04 at the "certain" criterion, .02 at the "confident" criterion, and .00 at the "unsure" criterion. These are only aggregate effects. For specific scenes, the effects will be more or less depending on the interactions in the specific scene.

MODEL EVALUATION

Our evaluation goal was to determine if the model is accurate in aggregate and if the model has systematic biases with respect to identifiable properties or characteristics of the input image. The factors and levels of the experiment were designed to enable us to assess biases with respect to a variety of characteristics known to influence target detection (e.g., size, contrast, luminance).

Outliers. The validation data set was examined to determine whether or not all the images were appropriate for applying the model. Of the 800 images with vehicles present in the validation data set used, 118 were not appropriate for applying the model. Of these 118 inappropriate images, 95 were derived from 4 of the 44 base scenes. In these scenes, one edge of the vehicle is aligned with a linear terrain feature (such as a ridge), and on the remaining three sides the vehicle had low contrast with its surroundings. When this combination of conditions occurred, the observers tended to interpret the contrast at the target edge as a continuation of the terrain feature. The model, which does

TABLE 5: Mean Pr(H) by Data Partition over All Scenes with Vehicles

Data Partition: Certain / Confident / Unsure
All images with vehicles: .43 / .58 / .80
Unmodified vehicle: .52 / .68 / .87
Reduced contrast vehicle: .35 / .51 / .75
Special variation vehicle: .40 / .55 / .78
Unmodified scene: .47 / .62 / .82
Darkened scene: .43 / .58 / .81
Reduced contrast scene: .39 / .53 / .76
Gray-scale scene: .43 / .60 / .82
3:1 scale factor: .40 / .54 / .77
2:1 scale factor: .46 / .63 / .83


not analyze terrain features, interpreted the contrast across the edge as visual evidence for detection. The other 23 inappropriate images were derived from 1 of the 44 base scenes: In this scene there was unusual lighting and shadowing such that a patch of glint from an elevation-angle view of the side was all that was visible. Observers could see the patch of glint but did not interpret it as evidence of a vehicle, whereas the model did. In both of these situations, the images violate the assumption of the nature of the top-down processing involved in detecting a 3-D object in a natural scene. The model validation proceeded with 682 images and their associated observer responses.

Accuracy and Explanatory Power

The direct measure of accuracy is the root-mean-square (RMS) prediction error. Over a set of N images, the RMS prediction error, E, is

E = {Σ_i [Pr(H)_predicted(i) - Pr(H)_observed(i)]² / N}^(1/2). (4)

A related measure of performance is the explanatory power of the model. It is equal to the fraction of the variance in experimental Pr(H) explained by the model. The fraction of variance explained by the model, φ, is 1 minus the ratio of the RMS prediction error squared to the variance in Pr(H)_observed over the image set:

φ = 1 - E²/variance[Pr(H)_observed]. (5)

Figure 4. Predicted versus observed Pr(H) at the "certain" response level. [Scatter plot of Pr(Hit) observed against Pr(Hit) predicted; 89% of base scenes, 682 images.]

The term explanatory power is used rather than r² to maintain the distinction that linear regression allows two free parameters, whereas VDM2000 has only one free parameter. The measures computed in Equations 4 and 5 understate the performance of the model because they include sampling error in empirical Pr(H) as part of the prediction error.
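Equations 4 and 5 can be implemented directly; this sketch assumes predicted and observed Pr(H) are given as equal-length arrays over the image set.

```python
import numpy as np

def rms_error(pred, obs):
    """Equation 4: root-mean-square prediction error over N images."""
    pred, obs = np.asarray(pred), np.asarray(obs)
    return np.sqrt(np.mean((pred - obs) ** 2))

def explanatory_power(pred, obs):
    """Equation 5: fraction of variance in observed Pr(H) explained."""
    return 1.0 - rms_error(pred, obs) ** 2 / np.var(np.asarray(obs))
```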

Figures 4 to 6 show scatter plots of observed versus predicted Pr(H) at the three response levels. The scatter between predicted Pr(H) and observed Pr(H) includes sampling error in observed Pr(H) in addition to error between predicted Pr(H) and the underlying true Pr(H). Table 6 summarizes the accuracy and explanatory power of the model and the mean error in observed Pr(H).

The VDM2000 accounts for more than 80% of the variance in Pr(H). The model remains accurate, although with somewhat reduced explanatory power at the lower confidence response levels. The accuracy of the model is actually higher at the lower confidence response levels (i.e., the RMS prediction error is lower; see Table 6). The RMS error is lower because the increased false alarm rate compresses the range of responses. The explanatory power of the model - that is, the fraction of variance in

Model Fit - "Confident" Response(89% Base Scenes, 682 Images)

1.00

' 0.80

0)U)

.0 0.600*

I 0.40

0.20

0.000. 00 0.20 0.40 0.60 0.80

Pr(Hit) Predicted1.00

Figure 4. Predicted versus observed Pr(H) at the Figure 5. Predicted versus observed Pr(H) at the'certain" response level. "confident" response level.



Model Fit - Confident" Response(89% Base Scenes, 682 Images)

1.00 .n nal m CZ EDME

nmcor

'0.2> D.u . D-.3

Q 0.60 -_D__0 060 D _ a J30 [D

0.40 MD c

0.20

0.00

0.00 0.20 0.40 0.60 0.80 1.00

Pr(Hit) Predicted

Figure 6. Predicted versus observed Pr(H) at the"unsure" response level.

Pr(H) that is explained - is a better measure of performance because it normalizes to the variance in the phenomena to be explained. The explanatory power of the model is lower at the lower confidence response levels because of the contribution of false alarms to Pr(H). The average rate of false alarms is an input to the model, but false positive responses to images add variance to Pr(H).

Bias

The term model bias refers to a situation in which the model's prediction errors are not zero-mean normally distributed (i.e., the model underpredicts or overpredicts actual behavior in a reliable manner). Bias in a model can be evidence that a model is improperly or incompletely specified. The prediction bias with respect to subset S, B(S), of the images is

B(S) = E_S[Pr(H)_pred(i) - Pr(H)_obs(i)] - E_All[Pr(H)_pred(i) - Pr(H)_obs(i)]. (6)

The bias of the model is shown in Table 7. The net bias is the average prediction error for the partition (i.e., factor of interest) minus the average prediction error over all the cases. A negative bias means that the model's prediction of observer hit rate was, on average, less than the empirical Pr(H) - that is, that the model underestimated Pr(H). A positive bias means that the model overestimated Pr(H). For comparison, the expected error magnitude - the sampling error in empirical Pr(H) (at the "certain" response level) divided by the square root of the number of cases - is also shown.
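Equation 6 likewise reduces to a difference of mean errors; in this sketch the partition of interest is given as a boolean index array.

```python
import numpy as np

def net_bias(pred, obs, subset):
    """Equation 6: mean prediction error on a subset minus the mean
    prediction error over all images.

    subset: boolean array selecting the partition of interest.
    """
    err = np.asarray(pred) - np.asarray(obs)
    return err[subset].mean() - err.mean()
```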

There are several important observations to make with regard to the data in Table 7. There are no large biases. Except for the special variations at the "unsure" response level, all of the biases are less than 2%. Pr(H) is systematically underestimated for the baseline vehicles and overestimated for the reduced signature variants, but the bias is not large compared with the adjusted sampling error.

CONCLUSIONS

The vehicle metric as computed by the VDM2000 does a good job of accounting for variance in foveal detection performance. The VDM2000 represents a substantial contribution to both the state of the art in vision modeling and the state of the art in developmental test and evaluation tools. As previously noted, in some visual target acquisition situations, the model is not completely applicable. These are situations in which observers are deceived or misinterpret the visual signals. VDM2000 implicitly assumes that all of the perceived elements of vehicle

TABLE 6: VDM2000 Performance in Terms of Accuracy (RMS Prediction Error), Explanatory Power (Fraction of Variance Explained), and Mean Error

Rating: RMS Prediction Error / Explanatory Power / Mean Error
Certain: .136 / .803 / .017
Confident: .153 / .740 / .028
Unsure: .119 / .603 / .008


TABLE 7: Net Prediction Bias by Factor and Level in the Experimental Design

Data Partition: Bias at "Certain" / Bias at "Confident" / Bias at "Unsure" / Sampling Error/√N
High resolution (2:1): .0042 / -.0005 / .0035 / .0037
Reduced resolution (3:1): -.0028 / .0005 / -.0032 / .0034
Darkened scene: .0045 / .0030 / -.0031 / .0051
Lightened "haze" scene: -.0136 / -.0003 / .0020 / .0051
Original scene: .0044 / .0019 / .0069 / .0051
Baseline vehicle signature: -.0173 / -.0192 / -.0179 / .0039
Reduced contrast signature: .0153 / .0153 / .0100 / .0039
Special variation signature: .0051 / .0117 / .0228 / .0061

signature constitute evidence for vehicle detection. More development effort could be placed in a cognition/decision-making module to help disentangle some of these effects as well as in extending the model to applications requiring higher levels of target discrimination (e.g., recognition and identification). It is possible that a richer vehicle template may be needed (e.g., distinguishing turret and chassis regions or other characteristic features, such as a gun tube). This is a trivial extension to the software; however, there are no available data with which to test and evaluate the model.

The model could potentially be adapted to target detection tasks such as viewing thermal images. However, the characteristic vehicle regions for visual image detection will probably not be the same for thermal image detection. The characteristic visual regions (i.e., the image organization that accounts for variance over the vehicle image) are created by the vehicle surface geometry and angles with respect to illumination and observer, in concert with the observer's innate understanding of 3-D objects in a 3-D world. Thermal signatures are created by temperature gradients over the vehicle. The thermal regions (i.e., those characteristic regions that account for temperature variation) are created by various heat-generation processes and lags associated with different thermal mass. In thermal imaging applications, the vehicle image regions should reflect areas with different thermal mass (because they will heat and cool at different rates), regions corresponding to different heat sources (e.g., engine compartment, exhaust, and tracks), and surface geometry (reflection and shadowing of infrared illumination). It may be the case that the top-down processes involved in interpreting a thermal image of a scene are very different from those involved in the task of interpreting a naturally lit scene.

Computer models of vehicle detection are needed for early evaluation of design alternatives under a wide variety of terrain, lighting, and weather conditions. Field exercises have a number of key limitations that make the VDM2000 an efficient alternative: They are expensive, they require physical prototypes for each design alternative, they typically have a limited variety of terrain conditions, and their field atmospheric and environmental conditions are not controlled. The field of computer graphics is advancing rapidly in its ability to capture, represent, and render highly realistic imagery. The combination of these techniques with models of human performance such as the VDM2000 could conceivably give guidance to designers for any possible combination of design alternatives and operational conditions.

ACKNOWLEDGMENTS

This research was funded under Department of Defense Small Business Innovation Research Program Contract DAAE07-C-97-X101 with the U.S. Army Tank-Automotive and Armaments Command. The views and opinions expressed in this paper are those of the authors and do not represent a position of the sponsoring agency.

REFERENCES

Ahumada, A. J., Jr., & Beard, B. L. (1998). A simple vision model for inhomogeneous image quality assessment. In J. Morreale (Ed.), Society for Information Display international symposium digest of technical papers (Vol. 29, paper 40.1). Santa Ana, CA: Society for Information Display.

Boynton, R. M., & Whitten, D. N. (1970). Selective chromatic adaptation in primate photoreceptors. Vision Research, 12, 855-874.

Brainard, D. H. (1996). Appendix IV. Cone contrast and opponent modulation color spaces. In P. K. Kaiser & R. M. Boynton (Eds.), Human color vision (pp. 563-579). Washington, DC: Optical Society of America.

Commission Internationale de l'Eclairage. (1932). Commission Internationale de l'Eclairage Proceedings, 1931. Cambridge, UK: Cambridge University Press.

DeValois, R. L., & DeValois, K. K. (1990). Spatial vision. New York: Oxford University Press.

Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489-1506.

Kaiser, P. K., & Boynton, R. M. (1996). Human color vision. Washington, DC: Optical Society of America.

Matchko, R. M., & Gerhart, G. (2001). ABCs of foveal vision. Optical Engineering, 40, 2735-2745.

Mazz, J., Kistner, R., Bushra, A., & Pibil, W. (1997). Search and target acquisition model comparison: Unaided eye analysis (Division Note No. DN-CI-11). Aberdeen Proving Ground, MD: U.S. Army Materiel Systems Analysis Activity.

Moore, C., & Cavanagh, P. (1998). Recovery of 3D volume from 2-tone images of novel objects. Cognition, 67, 45-71.

O'Kane, B. L. (1995). Validation of prediction models for target acquisition with electro-optical sensors. In E. Peli (Ed.), Vision models for target detection and recognition (pp. 192-218). River Edge, NJ: World Scientific.

Palmer, J., Verghese, P., & Pavel, M. (2000). The psychophysics of visual search. Vision Research, 40, 1227-1268.

Peli, E. (1990). Contrast in complex images. Journal of the Optical Society of America A, 7, 2030-2040.

Peli, E. (1997). In search of a contrast metric: Matching the perceived contrast of Gabor patches at different phases and bandwidths. Vision Research, 37, 3217-3224.

Sun, J., & Perona, P. (1996). Preattentive perception of elementary three-dimensional shapes. Vision Research, 36, 2515-2529.

Tarr, M. J., Kersten, D., & Buelthoff, H. H. (1998). Why the visual recognition system might encode the effects of illumination. Vision Research, 38, 2259-2275.

Toet, A., Bijl, P., Kooi, F. L., & Valeton, M. (1998). A high-resolution image data set for testing search and detection models (TNO Report TM-98-A020). Soesterberg, Netherlands: TNO Human Factors.

Wandell, B. A. (1995). Foundations of vision. Sunderland, MA: Sinauer Associates.

Wilson, D. (2001). Image-based contrast-to-clutter modeling of detection. Optical Engineering, 40, 1852-1857.

Witus, G. (2001). Vehicle discrimination model, VDM2000, version 2.001, final results and documentation (Final Report to U.S. Army Tank-Automotive and Armaments Command, Contract Number DAAE07-97-C-X101, P00005). Ann Arbor, MI: Turing Associates.

Witus, G., & Gerhart, G. (2000). A contrast metric for 3-D vehicles in natural lighting. In RTO Meeting Proceedings 45, search and target acquisition (pp. 12.1-12.10). Neuilly-sur-Seine, France: North Atlantic Treaty Organization, Research and Technology Organization.

Witus, G., Gerhart, G. R., & Ellis, R. D. (2001). Contrast model for three dimensional vehicles in natural lighting and search performance analysis. Optical Engineering, 40, 1858-1868.

Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202-238.

Wolfe, J. M. (1998). What can 1 million trials tell us about visual search? Psychological Science, 9(1), 33-39.

Gary Witus is president of Turing Associates, Ann Arbor, Michigan. He received his Ph.D. in industrial engineering from Wayne State University in 2002.

R. Darin Ellis is an associate professor at Wayne State University, where he is on the faculty of the Institute of Gerontology, Department of Industrial and Manufacturing Engineering, and Department of Biomedical Engineering. He received his Ph.D. in industrial engineering from Pennsylvania State University in 1994.

Date received: October 17, 2001
Date accepted: January 15, 2003
