Lauwereyns & Wisnewski Reward-oriented bias 1
A Reaction-Time Paradigm to Measure
Reward-Oriented Bias in Rats
Johan Lauwereyns & Regan G. Wisnewski
Victoria University of Wellington
Running head: Reward-oriented bias
Journal of Experimental Psychology: Animal Behavior Processes
In press
Lines of text (main text + references): 414
Correspondence: Johan Lauwereyns
School of Psychology Victoria University of Wellington
P. O. Box 600 Wellington 6006
New Zealand Email: [email protected]
Phone ++6444635042 Fax: ++6444635402
Abstract
A nose-poke task with asymmetric position-reward mapping was devised to distinguish between
effects of bias and sensitivity in reaction times of rats. In all trials, the rats had to poke their nose
in the hole to the left or to the right of center, corresponding to the side where four lights were
illuminated, ignoring distracters on the other side. Reaction times were faster for large-reward
trials than for small-reward trials. In large-reward trials, there was no influence of the number of
distracters, whereas in small-reward trials, distracters produced an increase of reaction time.
Analysis of reaction-time distributions according to a linear model of decision making suggested
that most of the systematic variability was due to a reward-oriented bias.
(118 words)
Key words:
Nose poke, rat, reward, bias, reaction time
A Reaction-Time Paradigm to Measure Reward-Oriented Bias in Rats
Successful behavior requires the ability to predict and exploit opportunities that lead to desirable
outcomes (Dickinson & Balleine, 1994). Recent behavioral research has focused on the ability of
animals such as rats to detect reward rates (e.g., Gallistel, Mark, Adam, & Latham, 2001; Gharib,
Gade, & Roberts, 2004). Concurrently, there has been a veritable explosion of
electrophysiological research on the reward-related activities of single neurons in rats and
monkeys (e.g., Kobayashi et al., 2002; Lauwereyns et al., 2002a,b; Pan, Schmidt, Wickens, &
Hyland, 2005; Platt & Glimcher, 1999; Pratt & Mizumori, 2001; Schmitzer-Torbert & Redish,
2004; Schultz, Dayan, & Montague, 1997). Here, we present a new behavioral paradigm with
rats that will be particularly suited to examine the covariation between neurophysiological
measures and behavioral indices of reward-oriented perception and action.
Arguably the most promising approach to marry the two research fields would be to
capitalize on the information that can be read out of reaction-time distributions (Luce, 1986).
Indeed, this proposal has already been put forward by several researchers (Carpenter, 2004;
Smith & Ratcliff, 2004). One of the principal advantages of reaction times as a dependent
measure, rather than a more discontinuous measure such as spatial choice or percent correct, is
the increase in statistical power when comparing the variability of this behavioral measure with
variability in neuronal spike trains on a trial-by-trial basis. This increased statistical power is
crucial when one considers that optimal recording conditions for individual neurons are difficult
to maintain for even a single session with an experimental animal.
Quite apart from logistical considerations, however, analyses of reaction-time distributions
may also be suitable to distinguish between alternative mechanisms of decision making. Taking
the view that the decision-making process consists of a “decision signal” that linearly rises to a
“decision threshold,” one can model reaction-time data using several parameters reflecting the
slope of the decision signal, the starting point of the decision signal, and the level of the
threshold (i.e., the Linear Approach to Threshold with Ergodic Rate or LATER model;
Carpenter, 2004; Carpenter & Williams, 1995; Reddi & Carpenter, 2000; see Figure 1a).
Changes to the slope of the decision signal would then be distinguishable from changes to the
starting point of the decision signal in the shapes of reaction-time distributions (see Method for
LATER analysis). This approach gives researchers valuable computational tools for the
study of neuronal activity. Single-unit recordings in the frontal eye field of monkeys have
already shown slopes in the rise of neuronal activity that closely correlate with eye movement
latency, as if the activity builds up toward a fixed decision threshold for movement initiation
(Hanes & Schall, 1996; see also a comprehensive discussion of these ideas in Gold & Shadlen,
2001). Thus, analyses of reaction times and neurophysiological measures can be used as
convergent operations in the study of the mechanisms underlying decision making.
With respect to the reward factor in decision making, the typical reduction of reaction
times observed for large-reward trials as compared to small-reward trials (K. Watanabe et al.,
2003; M. Watanabe et al., 2001) may be underpinned by two different mechanisms: sensitivity
on the one hand, and bias on the other. Sensitivity refers to the quality of decision making as a
function of the ratio between signal and noise, and would correspond to the slope of the decision
signal. The prospect of reward may improve the signal-to-noise ratio (and lead to a steep rise of
the decision signal) for stimuli associated with a high reward value (see Figure 1a, left panel). In
contrast, bias refers to the a priori likelihood of making one response rather than another,
regardless of incoming perceptual information. The prospect of reward may create a bias by
increasing the likelihood of making a response with a high reward value, and would correspond
to moving the starting point of the decision signal closer to the decision threshold (see Figure 1a,
right panel). According to the LATER model, effects of reward-oriented sensitivity or bias
would leave different signatures in the reaction-time distributions. Our concrete aim in the
present study, then, was to develop a reaction-time paradigm with rats that would enable us to
examine these signatures.
Nose-poke paradigms may be the most appropriate for measuring reaction times in rats,
and have been used successfully with location-cueing tasks (Ward & Brown, 1996) and five-choice
serial reaction-time tasks (Robbins, 2002). For the present study, we developed a nose-poke
paradigm with a single spatial-choice task under an asymmetric reward schedule. Rats were
required to poke their nose in the hole adjacent to the center, corresponding to the side where
four lights were illuminated. To do so, they had to ignore distracters on the other side. For each
rat, one side was always associated with a large reward, whereas the other side was associated
with a small reward. We expected that reactions would be faster in large-reward than in
small-reward trials. If rats were biased to respond to the large-reward side, their reaction times should
be at maximum speed in that direction, regardless of the level of visual stimulation on the other
side. In contrast, if the behavior was mainly determined by the efficiency of visuospatial
processing, the reaction times in large-reward trials should be affected by the number of
distracters, with slower reaction times as the signal-to-noise ratio decreases (i.e., due to an
increase in the number of distracters). Reaction-time analyses according to the LATER model
would enable us to independently evaluate the same hypothesized mechanisms.
Method
Subjects, Housing and General Procedures
Subjects were 12 male Sprague-Dawley rats, weighing 190-275 g, approximately 3 months
old at the commencement of training. They were housed individually in home cages containing
untreated wood shavings, renewed on a regular basis. Water was available ad libitum. Subjects
were fed individually after completion of each testing session to maintain them at 85% of their
free-feeding body weight throughout the duration of the experiment. The housing room was maintained at an
ambient temperature of 22 °C and a humidity of 74%, with a reversed 12 hr light/dark cycle (7.30
a.m. to 7.30 p.m.) to ensure that experimental sessions were conducted in darkened conditions, as
this is when rats are mainly active. The experiments were performed in adherence to the legal
laboratory animal care principles of the Victoria University of Wellington Animal Breeding
Facility, and the Victoria University of Wellington Animal Ethics Committee.
Behavioral Apparatus for Nose Poking
Two 9-hole boxes (MEDNP9LB1; MED Associates, St Albans, VT) measuring
53.3 cm long x 34.9 cm wide x 26.0 cm high were used to conduct the experimental
procedure. Each chamber was fitted within a sound-attenuating box. All events were scheduled
and recorded by a Dell personal computer running MED-PC software (MED Associates, St
Albans, VT). The front and rear walls of each chamber were constructed of metal. The left and
right walls and the ceiling were constructed of transparent Plexiglas. The left wall also
functioned as the entrance to the chamber. The floor of the chamber was constructed of
horizontal metal rods spaced 1 cm apart. Both boxes contained an arc of 9 contiguous apertures
set into the curved front wall. Each aperture was 2.5 cm x 2.5 cm square and 2.2 cm deep.
Light-emitting diodes (LEDs) at the rear of each hole could be turned on and off automatically to
provide visual cues specific to each hole. Vertical infrared detectors at the front of each
nose-poke hole allowed the recording of the response latencies and locations. A 0.1 ml reinforcer
(20% sucrose solution; 400 g caster sugar: 1600 ml water) was delivered via a metal dipper
centered in the rear wall. The light in the food aperture was illuminated when the reinforcer was
delivered and was extinguished when the reinforcer was collected.
Training
Rats were assigned to a specific experimental chamber where they participated individually in all
sessions of 30 min duration. First, using a sequential “autoshaping” program over a period of 1-2
weeks, the rats were trained to respond by nose poking to visual stimuli. Then, task parameters
were changed gradually, including the number of visual stimuli, the nose-poke duration and the
reward schedule, until the rats were able to perform at least 80 correct trials of the complete
asymmetric reward paradigm (as described below) in a session of 30 min for at least 3
consecutive sessions. After 5 weeks of training, 2 rats had not yet met this criterion. At this
point, the data collection for the present study commenced with the remaining 10 subjects.
Asymmetric Reward Paradigm (ARP)
Sessions were conducted daily, for a maximum of 200 trials or until 30 min had elapsed within a
session. The ARP comprised the following sequence of events (see also Figure 1b).
Centering. A trial started when only the center hole light was illuminated. This light
signalled that the rat was required to make a nose-poke response immediately and sustain it for a
duration of 500 ms. This requirement ensured that a rat always started a trial from the same
position (i.e., centered at the front wall of the chamber). If the rat did not make a nose-poke
response within 10 s, or if it did not keep its nose in the central hole for 500 ms, the light was
extinguished. After a delay of 30 s, the light in the front center hole was re-illuminated to give
the rat a new opportunity to proceed with the trial.
Peripheral Stimulus Presentation. Once a nose-poke response had been sustained for 500
ms in the central hole, the central light was extinguished and peripheral stimuli were presented.
In each trial, the rat was required to respond with a nose poke to the hole adjacent to the center
hole in the direction where 4 LEDs were illuminated. The target side, then, was defined as the
side with 4 illuminated LEDs. On the other side, between 0 and 3 LEDs could be illuminated.
These were termed ‘distracters.’ The distracters were always arranged as a contiguous linear array
from center to periphery, with no gaps (i.e., the LEDs that were not
illuminated were always further in the periphery than the distracters). In this way there were 8
possible stimulus configurations, consisting of 2 possible target sides combined with 4 possible
distracter arrangements.
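The 8 stimulus configurations described above can be enumerated in a few lines. The following is a minimal sketch only: the hole indexing (0 to 8, left to right, with the center at index 4) and the function name led_pattern are assumptions made for illustration, not details taken from the apparatus software.

```python
from itertools import product

N_HOLES = 9            # arc of 9 apertures; indices 0..8 are a hypothetical convention
CENTER = N_HOLES // 2  # index 4

def led_pattern(target_side, n_distracters):
    """Return (sorted lit LED indices, index of the correct response hole)."""
    left = list(range(CENTER - 1, -1, -1))    # 3, 2, 1, 0: ordered center to periphery
    right = list(range(CENTER + 1, N_HOLES))  # 5, 6, 7, 8: ordered center to periphery
    target, other = (left, right) if target_side == "left" else (right, left)
    # All 4 target LEDs are lit; distracters fill the other side from the center outward.
    lit = set(target) | set(other[:n_distracters])
    return sorted(lit), target[0]  # correct hole is the one adjacent to the center

# 2 target sides x 4 distracter counts = 8 configurations
configurations = {
    (side, n): led_pattern(side, n)
    for side, n in product(("left", "right"), range(4))
}
```

Because the distracters are taken as a prefix of the center-to-periphery ordering, the no-gap constraint holds by construction.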
The moment at which the rat broke away from the central hole following the peripheral
stimulus presentation was registered as the break time. Reaction time (RT) was defined as the
time between the onset of peripheral stimulus presentation and the moment at which the rat
reached the correct hole on the target side. The rat had to sustain this nose poke for a duration of
at least 200 ms. Note that in this procedure, the rat is not punished for poking its nose in holes
other than the one defined as the correct hole. Effectively, then, the procedure cannot induce
erroneous choice trials, even though the rat might take a very long time (theoretically, until
infinity) to make the correct response. In this way, our experimental paradigm accommodates
one of the most controversial features of the LATER model (Smith & Ratcliff, 2004), in that the
model is not capable of producing errors.
On each trial, the stimulus configuration was determined by a quasi-random sequence
with the constraints that, for every block of 16 trials, there was an equal number of trials for each
magnitude of reward, and no more than 4 consecutive trials with the same reward value.
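One way to generate a sequence that satisfies these two constraints is rejection sampling within each block of 16 trials. The sketch below is illustrative and is not the MED-PC program used in the study; the function names are hypothetical, and drawing the number of distracters uniformly and independently on each trial is an assumption, since the text specifies constraints only for the reward value.

```python
import random

BLOCK_SIZE = 16  # trials per block, as stated in the text
MAX_RUN = 4      # maximum number of consecutive trials with the same reward value

def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = run = 0
    prev = None
    for item in seq:
        run = run + 1 if item == prev else 1
        prev = item
        best = max(best, run)
    return best

def reward_sequence(n_blocks, rng):
    seq = []
    for _ in range(n_blocks):
        block = ["large"] * (BLOCK_SIZE // 2) + ["small"] * (BLOCK_SIZE // 2)
        while True:
            rng.shuffle(block)
            # Include the tail of the previous block so that runs
            # crossing a block boundary are also limited.
            if longest_run(seq[-MAX_RUN:] + block) <= MAX_RUN:
                break
        seq.extend(block)
    return seq

def trial_sequence(n_blocks, seed=0):
    """List of (reward value, number of distracters) pairs."""
    rng = random.Random(seed)
    # Assumption: the distracter count (0-3) is drawn uniformly on every trial.
    return [(reward, rng.randrange(4)) for reward in reward_sequence(n_blocks, rng)]
```

Because each rat has a fixed position-reward mapping, the reward value of a trial also determines the target side for that rat.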
Asymmetric reward. To investigate the influence of incentive, an asymmetric reward
schedule was used. Once the rat completed the peripheral nose poke, all LEDs were
extinguished. A reinforcer was delivered in accordance with the particular reward schedule at the
rear of the chamber. In order to minimize temporal dynamics in the mechanisms of reward
expectation, rats were permanently assigned to a particular position-reward mapping condition
throughout training and all experimental sessions. Specifically, for 5 of 10 rats, the left target
side was always worth 0.3 ml of reinforcer (3 x 0.1 ml dipper: large reward condition) and the
right target side was always worth 0.1 ml of reinforcer (1 x 0.1 ml dipper: small reward
condition). For the remaining 5 rats, the reward schedule was reversed, with the right target side
always delivering the large reward and the left target side always delivering the small reward.
Thus, before the experimental sessions started, a rat had acquired a fixed position-reward
association for the ARP task, but during any experimental session, it was impossible for the rat to
predict on a particular trial whether the target side would actually correspond to the position
associated with the large reward.
Analysis of Variance and LATER Analysis
For each rat, mean RTs were computed for each of the 2 x 4 conditions on the basis of the data
from 3 consecutive sessions, immediately following the 5 weeks of training. As preliminary
analyses showed no effects from the order of the sessions, the data from the 3 sessions were
combined. The mean RTs for each rat were then submitted to a repeated measures analysis of
variance (ANOVA) with Reward (Large or Small) and Distracters (0, 1, 2, or 3) as
within-subject variables. The same analysis was also performed for break times, and for the number of
trials completed.
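As a small illustration of this design, the main effect of a two-level within-subject factor such as Reward can be computed directly from the per-rat cell means: its F(1, n - 1) equals the squared paired t statistic on the per-rat marginal means. The sketch below assumes a hypothetical array layout (rat x reward x distracters, with index 0 for large and 1 for small reward) and covers only this one main effect, not the full 2 x 4 ANOVA reported here.

```python
import numpy as np

def reward_main_effect(cell_means):
    """F test for a 2-level within-subject factor.

    cell_means: array of shape (n_rats, 2, 4), i.e. rat x reward x distracters
    (hypothetical layout; axis 1 index 0 = large reward, 1 = small reward).
    """
    cell_means = np.asarray(cell_means, dtype=float)
    per_reward = cell_means.mean(axis=2)        # average over the 4 distracter levels
    diff = per_reward[:, 1] - per_reward[:, 0]  # small minus large, one value per rat
    n = diff.size
    t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))  # paired t statistic
    return t ** 2, (1, n - 1)                   # F value and its degrees of freedom
```

With 10 rats this yields the F(1,9) form used for the Reward effect; the Distracters effect and the interaction need the full repeated measures decomposition.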
The same data were used for LATER analysis, following the framework proposed by
Carpenter and colleagues (Carpenter, 2004; Carpenter & Williams, 1995; Reddi & Carpenter,
2000). It is suggested that reaction times obey a simple stochastic law: The reciprocal of latency
follows a Gaussian distribution. Plotting cumulative latency distributions on a probit scale as a
function of reciprocal latency (a reciprobit plot) should therefore yield a straight line. The
LATER model postulates a decision signal $S$ associated with a particular response. When an
appropriate stimulus appears, $S$ starts to rise linearly from an initial level $S_0$ at a rate $r$; when it
reaches a prespecified threshold $S_T$, the response is triggered. If the variation of $r$ is Gaussian
with mean $\mu$ and variance $\sigma^2$, the reaction time is $(S_T - S_0)/r$ on any one trial and its distribution
will fall on a straight line on the reciprobit plot. This straight line will have a median of
$(S_T - S_0)/\mu$. It will intercept the infinite-time axis at $I = \mu/(\sigma\sqrt{2})$, a value that is independent of $S_T$ and $S_0$.
According to this model, varying the amount or quality of information for stimulus
discrimination would affect the rate $r$, as when perceptual sensitivity would lead to an improved
processing of stimuli associated with reward. On the other hand, a reward-oriented bias in this
scheme would be the same as a change in the initial level $S_0$, so that the distance to the threshold
level would be smaller when the action is associated with a large reward.
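The model just described is easy to simulate. The following sketch draws a Gaussian rate on each trial and converts it to a reaction time; all parameter values are hypothetical and in arbitrary units, chosen only so that negative rates are vanishingly rare, and the function name simulate_later is our own.

```python
import numpy as np

def simulate_later(n_trials, mu, sigma, s0, s_t, seed=0):
    """Simulate LATER reaction times: RT = (S_T - S_0) / r, with r ~ N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    rate = rng.normal(mu, sigma, size=n_trials)
    rate = rate[rate > 0]  # a non-positive rate never reaches the threshold
    return (s_t - s0) / rate

# Hypothetical parameters: same rate distribution, different starting points.
baseline = simulate_later(20000, mu=5.0, sigma=1.0, s0=0.0, s_t=1.0)
biased = simulate_later(20000, mu=5.0, sigma=1.0, s0=0.3, s_t=1.0)

# The medians should lie close to (S_T - S_0) / mu for each condition.
print(np.median(baseline), np.median(biased))
```

Raising the starting point shortens every simulated reaction time by the same factor, which is the bias mechanism in its simplest form.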
The appeal of the LATER model derives from its clear quantitative predictions about what
should happen in these two cases. If reward expectation leads to a reward-oriented bias, that
is, an elevation of $S_0$, the reciprobit plot should swivel about a fixed infinite-time intercept, $I$.
This follows from the LATER model since $I$ is determined by the parameters $\mu$ and $\sigma$, but not
$S_0$. In other words, the plot should show a shallower slope for the distribution of reaction times
in trials with a large reward than in trials with a small reward. In contrast, if the change in
reaction time is due to improved perceptual processing with a large reward as compared to a
small reward, the change should be reflected in $r$, and so the line on the reciprobit plot would
undergo a parallel shift, the slope remaining constant. Thus trials with a large reward would
merely be shifted to the left, toward shorter reaction times.
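The swivel and parallel-shift predictions can be made explicit with a short derivation. This is a sketch under the model's stated assumptions, writing $T$ for the reaction time and $\Phi$ for the standard normal distribution function, and ignoring the small probability that $r$ is negative.

```latex
\begin{align*}
T &= \frac{S_T - S_0}{r}, \qquad r \sim \mathcal{N}(\mu, \sigma^2)
  \quad\Longrightarrow\quad
  \frac{1}{T} \sim \mathcal{N}\!\left(\frac{\mu}{S_T - S_0},\;
  \frac{\sigma^2}{(S_T - S_0)^2}\right), \\
P(T \le t) &= P\!\left(r \ge \frac{S_T - S_0}{t}\right)
  = \Phi\!\left(\frac{\mu}{\sigma} - \frac{S_T - S_0}{\sigma}\cdot\frac{1}{t}\right), \\
z(t) &= \Phi^{-1}\!\big(P(T \le t)\big)
  = \frac{\mu}{\sigma} - \frac{S_T - S_0}{\sigma}\cdot\frac{1}{t}.
\end{align*}
```

On the probit scale the line is therefore linear in $1/t$, with a slope of magnitude $(S_T - S_0)/\sigma$ and a value of $\mu/\sigma$ as $t \to \infty$. Raising $S_0$ reduces the slope and leaves the infinite-time intercept fixed (a swivel), whereas raising $\mu$ moves the intercept and leaves the slope unchanged (a parallel shift). Since $\Phi(z) = \tfrac{1}{2}\,[1 + \operatorname{erf}(z/\sqrt{2})]$, the probit value $\mu/\sigma$ corresponds to $\mu/(\sigma\sqrt{2})$ as an error-function argument; reading the factor $\sqrt{2}$ in $I$ this way is an assumption on our part.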
The LATER analysis was conducted on the aggregated data from all rats, as well as on
the data from each rat individually. To evaluate the predictions from the LATER model
statistically, we computed the slope of each linear least-squares fit (i.e., the reciprobit line) for
each rat, and submitted these slopes to a repeated measures ANOVA with Reward (Large or
Small) and Distracters (0, 1, 2, or 3) as within-subject variables. The hypothesis of sensitivity
predicted no differences in the slopes, whereas the hypothesis of bias predicted shallower slopes
for responses associated with a large reward than for responses associated with a small reward.
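A reciprobit slope of this kind can be obtained with a few lines of code. The sketch below is one plausible implementation, not necessarily the procedure used for the analyses reported here: the plotting positions (i - 0.5)/n, the use of -1/RT as the abscissa, and the function name reciprobit_fit are all assumptions, and the demonstration data are synthetic.

```python
from statistics import NormalDist
import numpy as np

def reciprobit_fit(rts):
    """Least-squares line through probit(cumulative probability) versus -1/RT."""
    rts = np.sort(np.asarray(rts, dtype=float))
    n = rts.size
    # Plotting positions (i - 0.5)/n keep the probabilities strictly inside (0, 1).
    probs = (np.arange(1, n + 1) - 0.5) / n
    z = np.array([NormalDist().inv_cdf(p) for p in probs])  # probit transform
    x = -1.0 / rts  # reciprocal latency, negated so that x increases with RT
    slope, intercept = np.polyfit(x, z, 1)
    return slope, intercept

# Synthetic check with hypothetical parameters (arbitrary units, not the rats' data).
rng = np.random.default_rng(1)
rates = rng.normal(5.0, 1.0, size=5000)
rates = rates[rates > 0]
slope_small, icpt_small = reciprobit_fit(1.0 / rates)  # S_T - S_0 = 1.0
slope_large, icpt_large = reciprobit_fit(0.5 / rates)  # raised S_0: S_T - S_0 = 0.5
```

In this synthetic case, halving the distance to threshold halves the fitted slope and leaves the intercept unchanged, which is the swivel pattern that the bias hypothesis predicts.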
Results
The 10 rats completed a total of 4,624 trials, or an average of 154 trials per 30 min session. A
repeated measures ANOVA, with the factors of Reward and Distracters as within-subject
variables, on the number of trials completed showed no significant differences, all F values < 1.
This result confirms that the current paradigm maximizes variability in the domain of latency,
without any possibility for speed-accuracy tradeoff. The same repeated measures ANOVA was
performed with break time as dependent measure, that is, the time at which the rat breaks
fixation from the central hole following the onset of peripheral stimulation. This
ANOVA showed no significant effects, all F values < 1.8. The average break time for all 8 types
of trial was 206 ms, with a standard deviation of 72.5 ms. All remaining analyses were therefore
concentrated on reaction times.
ANOVA on Mean RT
The mean RTs and standard deviations are presented in Figure 2a. A repeated measures ANOVA
on RT showed that there was a highly significant effect of the factor Reward, F(1,9) = 229.08,
MSE = 183614, p < .001, with faster reaction times in the direction associated with a large
reward (636 ms) than in the direction associated with a small reward (2086 ms). There was also a
very reliable main effect of the factor Distracters, F(3,27) = 41.27, MSE = 43371, p < .001, with
slower reaction times as the number of distracters increased: 929 ms for 0 distracters, 1348 ms
for 1 distracter, 1548 ms for 2 distracters, and 1620 ms for 3 distracters. Finally, there was also a
highly significant interaction between Reward and Distracters, F(3,27) = 13.35, MSE = 53274, p
< .005.
To gain further insight into the nature of the interaction, we conducted repeated measures
ANOVAs on the data for small and large rewards separately, using the number of distracters as
the single factor. With the data from the large-reward conditions, the effect of Distracters was
not significant, F < 1. With the data from the small-reward conditions, the effect of Distracters
was statistically reliable, F(3,27) = 39.97, MSE = 86665, p < .001. Post-hoc Tukey HSD tests
with alpha at .05 showed that, in small-reward trials, reaction times were faster without
distracters (1277 ms) than in any of the 3 types of small-reward trial with distracters (2044 ms
for 1 distracter, 2414 ms for 2 distracters, and 2611 ms for 3 distracters). The small-reward
condition with 1 distracter also produced significantly faster reaction times than the conditions
with 2 or 3 distracters. There was no significant difference in the reaction times between the
conditions with 2 versus 3 distracters.
LATER Analysis
The aggregated reciprocal reaction time data from all 10 rats are plotted in the form of
cumulative percentage probability, on a probit scale, in Figure 2b. A total of 4,624 individual
trials are plotted separately for each of the 8 conditions, along with the linear least-squares fit.
The four distributions from conditions with a large reward (Figure 2b, data in black, indicated as
‘a’) appeared to have a shallower slope than the distributions from conditions with a small
reward (Figure 2b, data in gray, indicated as ‘b’ and ‘c’). Among the small-reward conditions,
the distribution from the condition without distracters (‘b’) appeared to have a shallower slope
than the three distributions from conditions with distracters (‘c’).
A repeated measures ANOVA on the slopes of the reciprobit lines for each condition, for
each rat, showed a significant main effect of the factor Reward, F(1,9) = 50.85, MSE = 1576694,
p < .001, with shallower slopes in the direction associated with a large reward (1424) than in the
direction associated with a small reward (3427). There was also a reliable main effect of the
factor Distracters, F(3,27) = 20.80, MSE = 333979, p < .001, with shallower slopes for conditions
with fewer than two distracters: 1723 for 0 distracters, 2156 for 1 distracter, 2946 for 2 distracters,
and 2877 for 3 distracters. Finally, there was also a significant interaction between Reward and
Distracters, F(3,27) = 31.87, MSE = 308667, p < .001.
To gain further insight into the nature of the interaction, we conducted repeated measures
ANOVAs on the data for small and large rewards separately, using the number of distracters as
the single factor. With the data from the large-reward conditions, the effect of Distracters was
not significant, F < 1. With the data from the small-reward conditions, the effect of Distracters
was statistically reliable, F(3,27) = 32.51, MSE = 502993, p < .001. Post-hoc Tukey HSD tests
with alpha at .05 showed that, in small-reward trials, slopes were shallower without distracters
(1747) than in any of the 3 types of small-reward trial with distracters (3119 for 1 distracter,
4502 for 2 distracters, and 4339 for 3 distracters). The small-reward condition with 1 distracter
also produced significantly shallower slopes than the conditions with 2 or 3 distracters. There
was no significant difference in the slopes between the conditions with 2 versus 3 distracters.
Discussion
Ten rats performed in the nose-poke paradigm with a single spatial-choice task under an
asymmetric reward schedule. In sessions of 30 min, the rats were able to complete an average of
more than 150 nose-poke responses, providing data for reaction-time analysis with sufficient
statistical power not only to observe significant differences between means of distributions, but
also to consider the shapes of distributions. From a logistical viewpoint, then, the current paradigm
may be particularly appealing for investigations such as those in the areas of neurophysiology
and psychopharmacology, which require the collection of the largest possible amount of data in
short time periods.
Over and above this practical merit, however, the current paradigm enables researchers to
address theoretical questions on the mechanisms that underlie reward-oriented behavior.
Replicating previous studies using spatial choice tasks under asymmetric reward schedules with
monkeys (K. Watanabe et al., 2003; M. Watanabe et al., 2001), we found that rats responded
faster in trials with a large reward than in trials with a small reward. In addition, by varying the
number of distracters, we obtained a conspicuous interaction effect between the level of reward
and the number of distracters: In trials with a large reward, reaction times were unaffected by the
number of distracters, whereas in trials with a small reward, reaction times increased with more
distracters. In particular, the absence of a distracter effect in large-reward trials is consistent with
the hypothesis that rats were biased to respond to the large-reward side. The result suggests that
the rats’ reaction times were at maximum speed in the direction associated with a large reward,
regardless of the level of visual stimulation on the other side.
The hypothesis of reward-oriented response bias was corroborated by the LATER
analysis. The slopes of the reciprobit lines were shallower for the distributions from large-reward
conditions than for those from small-reward conditions. Thus, the lines appeared to swivel, as
was predicted by a change in the starting point, $S_0$, of the decision signal, not a change in the rate
$r$ of the linear rise to the decision threshold. This result again suggests that the rats were biased
to respond to the large-reward side. Taken together, then, the effects of reward and distracters on
mean RT and the shape of RT distributions make a strong case for the operation of a
reward-oriented bias in the rats’ behavior.
As such, the current data may also shed new light on neurophysiological data that were
previously obtained with a similar asymmetric reward paradigm in monkeys (Lauwereyns et al.,
2002a). In that study, dorsal striatal (caudate nucleus) neurons increased their activity in advance
of a peripheral visual cue, but only when the contralateral side (i.e., the hemifield opposite to the
recording site) was associated with a large reward. It was argued that these neurons created a
reward-oriented spatial bias that was responsible for the reward effect as observed in the
monkeys’ spatial reaction times (i.e., eye movements). In terms of the LATER model, the
activity of the dorsal striatal neurons would represent the change in the starting point of the
decision signal. However, in the neurophysiological study no behavioral analysis was presented
to sustain the proposal that response bias produced the reward effect in reaction times. By contrast,
the current paradigm does provide such a behavioral analysis. Thus, the present data
raise the question whether similar dorsal striatal activity may be the basis for the reward-oriented
bias observed in rats. This line of reasoning illustrates that the combination of behavioral
analytic techniques on the basis of reaction times with single-unit recording may lead to a fuller
understanding of how reward expectation influences brain mechanisms for decision making and
voluntary control of action.
The fact that reaction times in small-reward trials were affected by the number of
distracters, however, may be due to processes in addition to reward-oriented bias. Particularly
interesting in this regard is the observation that RTs in small-reward trials without a distracter
were markedly faster than RTs in small-reward trials with one or more distracters. One
possibility is that in small-reward trials a complementary mechanism was needed to counteract
response bias and initiate a movement in the direction associated with a small reward. Such a
complementary mechanism has already been recorded with an asymmetric reward paradigm in
neurons of the centromedian nucleus in the thalamus (Minamimoto, Hori, & Kimura, 2005; for
discussion in relation to response bias, see Lauwereyns, 2006). In the present nose-poke
paradigm with rats, it seems plausible that the complementary mechanism would be activated
faster in small-reward trials without a distracter than in small-reward trials with one or more
distracters. In the no-distracter case, the visual stimulation on the small-reward side would
suffice to activate the complementary mechanism. When there is at least one distracter on the
large-reward side, however, an additional perceptual-decision mechanism may be required to
confirm that the visual stimulation on the large-reward side does not fit the profile of the target
side (i.e., four illuminated LEDs).
Since rats were not punished for poking their nose in holes other than the one defined
as the correct hole in the present paradigm, it is possible that, on a proportion of small-reward
trials, they first poked their nose into the hole adjacent to the center that is associated with a large
reward, particularly in small-reward trials with distracters. In this regard, it is also interesting to
note that, in small-reward conditions, the faster portions of the RT distributions appeared to
deviate from the reciprobit lines, consistent with previous observations using the LATER
analysis (Carpenter & Williams, 1995; Reddi & Carpenter, 2000). This component forms only a
small proportion of the whole, but is made conspicuous by the reciprobit plot that exaggerates
the first and last 5% of the cumulative distribution. Similarly, in large-reward trials the shallower
slope of the main component reveals a third, long-latency component that may also be present in
small-reward trials. Again, all of these observations suggest that, in the present asymmetric
reward paradigm, there might be other processes at work in addition to reward-oriented bias. A
promising line for future research will be to elucidate the behavioral and neurophysiological
mechanisms that are complementary to, or counteract, response bias. This can be done, for
instance, by comparing the current version of the paradigm with one that introduces punishments
or more complicated reward schedules depending on the rats’ spatial choice. In doing so,
however, the LATER analysis may become problematic as it cannot produce erroneous
decisions.
Further limitations, inherent to the LATER model, should be noted. For instance, the
assumption of a linear rise to threshold may be a particularly vulnerable one in many real-life
and even laboratory settings. Also, in the LATER model as presented here, perceptual processes
are not dissociated from response processes. Thus, claims that sensitivity effects pertain to
perceptual processes, and that bias effects pertain to response processes, remain unchecked in the
present data. It will be a continuing task, then, to search for models that best fit reaction-time
distributions in different experimental situations that implicate different behavioral processes (for
an enlightening overview of existing RT models, see Smith & Ratcliff, 2004). Nevertheless, the
LATER model has an undeniable appeal because of its simplicity, and the relative ease with
which it can be translated into predictions with respect to underlying neural mechanisms. Thus, it
may be a fruitful starting point for such investigations.
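As an illustration of the linear-rise assumption discussed above, the following Python sketch simulates decision times under a LATER-style model. It is a minimal sketch, not the analysis code used for the present data: the function name, the parameter values, the arbitrary time units, and the choice to skip trials with a non-positive rate of rise are all assumptions made for illustration. The symbols m, b, and r follow Figure 1.

```python
import random
from statistics import median


def simulate_later(m, b, mu_r, sigma_r, toward_large, n=1000, seed=0):
    """Simulate decision times for a linear rise to threshold.

    m            -- threshold magnitude (+m large-reward side, -m small-reward side)
    b            -- starting point of the decision signal (b > 0 favors large reward)
    mu_r, sigma_r -- mean and SD of the rate of rise, drawn anew on each trial
    toward_large -- True if the target is on the large-reward side
    """
    rng = random.Random(seed)
    # Distance the signal must travel from its starting point to the threshold.
    distance = (m - b) if toward_large else (m + b)
    times = []
    while len(times) < n:
        r = rng.gauss(mu_r, sigma_r)
        if r > 0:  # assumption: a non-positive rate never reaches threshold; skip it
            times.append(distance / r)
    return times


# Hypothetical parameter values; the same seed gives both conditions the same rates,
# so any difference between them comes from the starting point alone.
large = simulate_later(m=1.0, b=0.2, mu_r=5.0, sigma_r=1.0, toward_large=True)
small = simulate_later(m=1.0, b=0.2, mu_r=5.0, sigma_r=1.0, toward_large=False)
print(median(large) < median(small))  # True
```

Because a single signal rises to a single threshold, every simulated trial ends in the intended response, which is why this kind of model cannot by itself account for errors.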
With the above caveats in mind, the current data with an asymmetric reward paradigm do
succeed in implicating a reward-oriented bias mechanism in the advantage of large-reward trials
over small-reward trials. Neurophysiological investigations may already benefit from the current
version of the paradigm, by correlating putative neural signals of response bias with RTs in
large-reward trials. More generally, the current data extend an invitation to researchers in the
field of behavioral analysis and neurophysiology to consider the statistical and computational, as
well as the logistical, advantages of studying reaction-time distributions in nose-poke tasks with
rats.
Acknowledgments
We thank Dave Harper, Debbie Whare, Doug Drysdale and Richard Moore for technical
assistance. The research was supported by grant 0313PG of the Neurological Foundation of
New Zealand and grant 04VUW052 of the Royal Society of New Zealand Marsden Fund.
Correspondence concerning this article may be sent to J. Lauwereyns, School of Psychology,
Victoria University of Wellington, P.O. Box 600, Wellington 6006, New Zealand.
References
Carpenter, R. H. S. (2004). Contrast, probability, and saccadic latency: Evidence for
independence of detection and decision. Current Biology, 14, 1576-1580.
Carpenter, R. H. S., & Williams, M. L. L. (1995). Neural computation of log likelihood
in control of saccadic eye movements. Nature, 377, 59-62.
Dickinson, A., & Balleine, B. (1994). Motivational control of goal-directed action.
Animal Learning & Behavior, 22, 1-18.
Gallistel, C. R., Mark, T. A., Adam, P., & Latham, P. E. (2001). The rat approximates an
ideal detector of changes in rates of reward: Implications for the law of effect. Journal of
Experimental Psychology: Animal Behavior Processes, 27, 354-372.
Gharib, A., Gade, C., & Roberts, S. (2004). Control of variation by reward probability.
Journal of Experimental Psychology: Animal Behavior Processes, 30, 271-282.
Gold, J. I., & Shadlen, M. N. (2001). Neural computations that underlie decisions about
sensory stimuli. Trends in Cognitive Sciences, 5, 10-16.
Hanes, D. P., & Schall, J. D. (1996). Neural control of voluntary movement initiation.
Science, 274, 427-430.
Kobayashi, S., Lauwereyns, J., Koizumi, M., Sakagami, M., & Hikosaka, O. (2002).
Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex.
Journal of Neurophysiology, 87, 1488-1498.
Lauwereyns, J. (2006). Voluntary control of unavoidable action. Trends in Cognitive
Sciences, 10, 47-49.
Lauwereyns, J., Watanabe, K., Coe, B., & Hikosaka, O. (2002a). A neural correlate of
response bias in monkey caudate nucleus. Nature, 418, 413-417.
Lauwereyns, J., Takikawa, Y., Kawagoe, R., Kobayashi, S., Koizumi, M., Coe, B.,
Sakagami, M., & Hikosaka, O. (2002b). Feature-based anticipation of cues that predict reward in
monkey caudate neurons. Neuron, 33, 463-473.
Luce, R. D. (1986). Response Times: Their Role in Inferring Elementary Mental
Organization. London, UK: Oxford University Press.
Minamimoto, T., Hori, Y., & Kimura, M. (2005). Complementary process to response
bias in the centromedian nucleus of the thalamus. Science, 308, 1798-1801.
Pan, W.-X., Schmidt, R., Wickens, J. R., & Hyland, B. I. (2005). Dopamine cells respond
to predicted events during classical conditioning: Evidence for eligibility traces in the reward
learning network. Journal of Neuroscience, 25, 6235-6242.
Platt, M. L., & Glimcher, P. W. (1999). Neuronal correlates of decision variables in
parietal cortex. Nature, 400, 233-238.
Pratt, W. E., & Mizumori, S. J. (2001). Neurons in rat medial prefrontal cortex show
anticipatory rate changes to predictable differential rewards in a spatial memory task.
Behavioural Brain Research, 123, 165-183.
Reddi, B. A. J., & Carpenter, R. H. S. (2000). The influence of urgency on decision time.
Nature Neuroscience, 3, 827-830.
Robbins, T. W. (2002). The 5-choice serial reaction time task: Behavioural pharmacology
and neurochemistry. Psychopharmacology, 163, 362-380.
Schmitzer-Torbert, N., & Redish, A. D. (2004). Neuronal activity in the rodent dorsal
striatum in sequential navigation: Separation of spatial and reward responses on the multiple-T
task. Journal of Neurophysiology, 91, 2259-2272.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and
reward. Science, 275, 1593-1599.
Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions.
Trends in Neurosciences, 27, 161-168.
Ward, N. M., & Brown, V. J. (1996). Covert orienting of attention in the rat and the role of
striatal dopamine. Journal of Neuroscience, 16, 3082-3088.
Watanabe, K., Lauwereyns, J., & Hikosaka, O. (2003). Effects of motivational conflicts
on visually elicited saccades in monkeys. Experimental Brain Research, 152, 361-367.
Watanabe, M., Cromwell, H. C., Tremblay, L., Hollerman, J. R., Hikosaka, K., &
Schultz, W. (2001). Behavioral reactions reflecting different reward expectations in monkeys.
Experimental Brain Research, 140, 511-518.
Figure legends
Figure 1. A reaction-time paradigm for rats. a) Two alternative hypotheses for the
mechanism that underlies the effect of reward magnitude on reaction time. Decision making is
conceived as a linear rise of a “decision signal” (indicated as a gray line) to a “decision
threshold” (m, for a movement to the position associated with a large reward; –m, for a
movement to the position associated with a small reward). According to the hypothesis of
“sensitivity” (left panel), faster reaction times for positions associated with a large reward as
compared to a small reward would be due to changes in the slope of the decision signal (r, for a
decision associated with a large reward; r’, for a decision associated with a small reward).
According to the hypothesis of “bias” (right panel), faster reaction times for positions associated
with a large reward as compared to a small reward would be due to a positive bias (b > 0), which
brings the decision signal closer to the decision threshold m, but further away from the decision
threshold –m, even before the onset of the peripheral target. b) Schematic representation of the
sequence of events in a single trial. The trial started with the onset of the center LED. The rat
was required to poke its nose in the corresponding hole, and stay in this position for 500 ms. At
this time, the peripheral stimulation was presented and the center LED was extinguished. The
trial ended when the rat poked its nose and stayed for 200 ms in the hole adjacent to the center
hole, corresponding to the side where four LEDs were illuminated. Break time was defined as
the time duration between onset of peripheral stimulation and the moment when the rat broke
away from fixation in the center hole. Reaction time was defined as the time duration between
onset of peripheral stimulation and the moment when the rat poked its nose in the correct
response hole, provided that it stayed there for 200 ms.
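The two timing measures defined in this legend can be written out as a short Python sketch. The record layout and field names below are hypothetical (the legend does not describe how events were logged); only the definitions of break time, reaction time, and the 200-ms stay requirement come from the text.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_MS = 200.0  # required stay in the response hole, as stated in the legend


@dataclass
class TrialEvents:
    """Hypothetical event timestamps (ms) for a single trial."""
    stimulus_onset: float           # onset of peripheral stimulation
    center_exit: float              # rat breaks away from the center hole
    response_entry: float           # rat pokes its nose in the correct response hole
    response_exit: Optional[float]  # rat leaves that hole; None if it never left


def break_time(t: TrialEvents) -> float:
    """Time from stimulus onset until the rat leaves the center hole."""
    return t.center_exit - t.stimulus_onset


def reaction_time(t: TrialEvents) -> Optional[float]:
    """Time from stimulus onset to the response poke, if the rat stayed 200 ms."""
    stayed = (t.response_exit is None
              or t.response_exit - t.response_entry >= HOLD_MS)
    return t.response_entry - t.stimulus_onset if stayed else None


valid = TrialEvents(1000.0, 1180.0, 1420.0, 1700.0)  # stayed 280 ms
brief = TrialEvents(1000.0, 1180.0, 1420.0, 1500.0)  # stayed only 80 ms
print(break_time(valid), reaction_time(valid))  # 180.0 420.0
print(reaction_time(brief))                     # None
```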
Figure 2. Data obtained with the reaction-time paradigm for rats. a) Mean reaction times
(ms) and standard deviations as a function of the number of distracters (abscissa) and the reward
magnitude (large reward, black line; small reward, gray line). RTs were faster for large reward
than for small reward. In large-reward trials, RTs were unaffected by distracters, whereas
distracters produced an increase in RT in small-reward trials. b) Reciprobit plot of reaction
times according to the LATER model (i.e., plotting the cumulative RT distributions on a probit
scale as a function of reciprocal RT). For each of the 8 conditions (2 Reward levels x 4 Distracter
levels), the actual data points (marked as small “+” symbols) are superimposed on the least-squares
fit line. The large-reward conditions are shown in black; the small-reward conditions are
shown in gray. Three groups of data distributions can be distinguished. Group a consists of the
four large-reward conditions (i.e., with 0, 1, 2, or 3 distracters), and thus shows four least-squares
fit lines (and their actual data points). These four data distributions are not significantly different
from each other. Group b contains only the small-reward condition without distracters; it shows
just one least-squares fit line and its actual data points. This data distribution is significantly
different from all other small-reward conditions. Finally, Group c consists of the three remaining
small-reward conditions (i.e., with 1, 2, or 3 distracters); shown in this group are three least-squares
fit lines and their actual data points. The differences between the distributions are
consistent with swiveling rather than parallel shifts. This suggests that the effects in RT are due
to changes in the starting point (or b) rather than the slope (r or r’) of the linear rise to threshold.
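The distinction between swiveling and parallel shifts can be illustrated with a small Python sketch of the reciprobit transform. It is a sketch under stated assumptions, not the procedure used to produce Figure 2: the function names, the sign convention x = -1/RT, the plotting positions (i + 0.5)/n, and the noise-free synthetic RTs with hypothetical parameter values are all introduced here for illustration. If the rate of rise is normally distributed with mean mu and SD sigma, and the signal must travel a distance d to threshold, the reciprobit line is z = (d/sigma)x + mu/sigma, so a change in d alters the slope while leaving the intercept fixed, and a change in mu alters the intercept while leaving the slope fixed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reciprobit_points(rts):
    """Map RTs to reciprobit coordinates (x = -1/RT, z = probit of cumulative proportion)."""
    ordered = sorted(rts)
    n = len(ordered)
    xs = [-1.0 / rt for rt in ordered]  # assumed sign: longer RTs lie to the right
    # (i + 0.5) / n keeps every proportion strictly inside (0, 1)
    zs = [STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]
    return xs, zs


def fit_line(xs, zs):
    """Ordinary least-squares fit; returns (slope, intercept) of z = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return slope, mean_z - slope * mean_x


def synthetic_rts(distance, mu_r, sigma_r, n=199):
    """Hypothetical noise-free RTs: rates of rise placed at normal quantiles."""
    rates = [mu_r + sigma_r * STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]
    return [distance / r for r in rates]


baseline = fit_line(*reciprobit_points(synthetic_rts(1.0, 5.0, 1.0)))
biased = fit_line(*reciprobit_points(synthetic_rts(0.8, 5.0, 1.0)))  # shorter distance
slower = fit_line(*reciprobit_points(synthetic_rts(1.0, 4.0, 1.0)))  # lower mean rate
for name, (slope, intercept) in [("baseline", baseline), ("biased", biased), ("slower", slower)]:
    print(f"{name}: slope={slope:.3f}, intercept={intercept:.3f}")
```

In this sketch the "biased" line shares its intercept with the baseline but has a shallower slope (a swivel), whereas the "slower" line keeps the baseline slope and has a different intercept (a parallel shift).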