Date post: 13-Nov-2023
Saving Performance and Cognitive Abilities

by

T. Parker Ballinger, Eric Hudson, Leonie Karkoviata and Nathaniel T. Wilcox*

Abstract Experiments on saving behavior reveal substantial heterogeneity of performance. We show that this heterogeneity is reliable and examine several potential sources of it, including cognitive ability and personality measures. The strongest predictors of performance are two cognitive ability measures. We conclude that complete explanations of heterogeneity in dynamic decision making require attention to complexity and individual differences in cognitive constraints.

First Draft: June 2005

This Draft: February 2006

JEL Classification Codes: C91, D91, E21.

Keywords: Bounded Rationality, Cognitive Ability, Heterogeneity, Performance, Saving

*Ballinger—Department of Economics and Finance, Stephen F. Austin State University, Nacogdoches, TX 75962-3009. Hudson—School of Law, Loyola University New Orleans, New Orleans, LA 70118. Karkoviata and Wilcox—Department of Economics, University of Houston, Houston, TX 77204-5019. This research was supported by grants from the National Science Foundation with award numbers 0350748 (Ballinger) and 0350565 (Wilcox), as well as by the University of Houston Research Council. We are grateful to Randall Engle and Richard Heitz of the Attention and Working Memory Lab at Georgia Tech for providing us with their “automatic operation span” software (and advice from D. Stephen Lindsay that led us to them). Bram Cadsby, Glenn Harrison, Ondrej Rydval, Tomomi Tanaka and Mark Thompson provided help or useful commentary, and Sharon O’Donnell and Rick Wilson provided valuable advice on software programming. None of these people are responsible for any errors that remain.

Introduction

Experiments on dynamic saving behavior with income uncertainty reveal apparently

marked departures from rational behavior. But they also reveal enormous heterogeneity of

behavior (Hey and Dardanoni 1988; Ballinger, Palumbo and Wilcox 2003; Carbone 2004).1 We

ask whether differences in cognitive abilities explain this variation, while controlling for several

other motivational and personality differences. We find that measures of cognitive abilities are

the strongest factor in explaining individual differences in experimental saving task performance.

Two decades ago, the predominant psychological perspective on decision making was the

one originally called “cognitive.” This view held that constraints on the information processing

capacities of humans, rather than their emotional and motivational machinery or preferential

dispositions, could explain most human deviations from canonical theories of rational reasoning,

statistical inference and decision making (Nisbett and Ross 1980). This was the psychology of

heuristic decision and judgment algorithms, mental models and perceptual hard-wiring. Older

psychological traditions, though, placed an emphasis on emotional and motivational causes, and

these causes are enjoying a resurgence not only in psychology but in behavioral economics as

well (Loewenstein 1996; Slovic et al. 2002; Loewenstein and Lerner 2003).

Behavioral or neoclassical, many economists reveal a preference for preference-based

explanations of seemingly anomalous behavior. By and large, theories of anomalies in risky

and/or intertemporal choice, as well as in games, reformulate preferences (e.g., Quiggin 1982;

Loewenstein and Prelec 1992; Fehr and Schmidt 1999), rather than model failures of

optimization, systematic judgment errors or limited depth of reasoning stemming from cognitive

constraints. To be sure, the latter approach does appear in the work of behavioral and

1 The only exception we know of is Camerer and Chua (2005), who use subjects drawn from the student populations of two of the most selective universities on Earth. As will be clear, this matches our expectations.

experimental economists (e.g., Friedman 1989; Stahl and Wilson 1995; Gabaix and Laibson

2000), but we believe that judging simply by numbers of studies, it is fair to say that

motivational and emotional reasons (usually in the form of new preference specifications) are the

most widely studied explanations of anomalous behavior even among these scholars.

Experiments in both economics and psychology tend to use very simple decision

situations to study choice under risk and over time. This has obvious merits. For instance, subject

confusion is kept to a minimum; and relatively simple lab decision tasks might be ideal for

revealing motivational, emotional and preferential differences across subjects precisely because

complexity has been “designed out” as a causal factor. Of course, this is not obvious: Decision

making skill may be primarily adapted to handle field environments with specific complexities

and/or institutional features, and may function poorly in lab situations stripped of these (Winkler

and Murphy 1973; Dyer and Kagel 1996; Harrison and List 2004). But suppose for the sake of

argument that this is a minor issue. It is still true that if cognitive abilities vary substantially, and

field environments are characterized by relatively complex decision problems, a significant part

of the variance of field behavior and performance could be due to variation in cognitive

constraints—even if (as is true) preference-based explanations for anomalous behavior and its

variance across subjects frequently seem successful in simple laboratory tasks.

We have no doubt that preference-based explanations for seemingly anomalous decision

making are useful, but we smell danger in relying on them to the near exclusion of computational

ones. The danger is most pronounced for certain varieties of economic decision making, such as

consumption and saving, where doing descriptive justice to the field situation of agents requires

famously complex and computationally burdensome models. Yet as far as sources of behavioral

heterogeneity go, increased task complexity may produce ironic results. It may cause an

increased propensity to give up on conscious analysis and deliberation and fall back on affective

and emotional reactions or other largely automatic and unconscious processes. Or, it might

encourage more agents to employ similar simplification procedures. In either case, increased

complexity could actually decrease the explanatory force of differences in cognitive abilities and

enhance the predictive force of emotional, affective and preferential differences.

Our specific theoretical perspective is that people have (at any instant) a set of “behavior

rules” (Conlisk 1980), an “algorithm set” (Wilcox 1993) or stock of “cognitive capital” (Camerer

and Hogarth 1999) which places constraints on what they can achieve in decision making.2 We

refer collectively to such limits on cognition as cognitive constraints, and think of them as

(perhaps imperfectly) measured by various tests of cognitive abilities.3 Our initial empirical

inspiration came from Stanovich and West (2000), who argued that relationships between task

performance and individual differences in cognitive ability (for them, SAT scores) inform

debates about rationality in unique ways; it ended with the extensive contemporary literature on

“working memory span” in cognitive psychology (Kane et al. 2004).

We focus on performance in a structurally simple but computationally challenging saving

task, and ask whether cognitive abilities explain its variance. Because cognitive abilities may be

correlated with various motivational, emotional and preferential differences that may also

account for heterogeneity in saving behavior, we simultaneously attend to several potentially

relevant personality characteristics, to guard against the possibility that cognitive ability

measures are just instruments for these kinds of differences. Even controlling for these, however,

2 For a survey, see Conlisk (1996); for a discussion of outstanding research questions, see Rydval (2003).

3 There are many different mental capacities, “modules” and so forth that interact in complex ways, so cognitive constraints are best regarded as a vector. Some elements of this vector may resemble augmentable stocks of capital, and hence may be time-varying; others could resemble more or less fixed endowments. We assume that the vector varies across individuals at any instant, but take no position on the source of this variation (e.g., “nature versus nurture” issues) since this is mostly unnecessary here, though we return to the possibility of augmentable stocks of skill in our conclusions.

we find a robustly significant effect of measured cognitive abilities on saving performance. This

suggests that complete explanations of heterogeneity in saving behavior will need to consider the

interaction between the complexity of saving and heterogeneous cognitive abilities. We conclude

with some remarks on potential policy implications and examples of other experimental and

behavioral research where measures of working memory may be empirically useful.

I. The Saving Task and Experimental Design.

We examine a 20-period version of the “standard additive model” (Browning and Lusardi

1996) of the life-cycle consumption-and-saving task. Such models assume that agents choose

real consumption ct to maximize their expected discounted utility, subject to an intertemporal

budget constraint, in every period t of the 20-period task, written as follows:

$$\max_{c_t \ge 0} \; u(c_t) + E_t \sum_{j=t+1}^{20} u(c_j)$$

subject to $A_{j+1} = X_j - c_j$ for all $j = t, t+1, \ldots, 19$, given $X_t = A_t + y_t$, (1)

where u(ct) is the utility of real consumption in period t, At is the real value of assets accumulated

prior to period t (A1 given exogenously), and yt is real exogenous income, which is imperfectly

predictable in earlier periods and is realized at the start of each period t (prior to choosing ct), and

Xt = At + yt is period t “cash-on-hand.”

A saving task based on the maximand in equation (1) involves no discounting of future

utility nor any intertemporal growth of assets, which are both usually part of such models. This

makes the basic task features and demands relatively transparent and easy to explain to subjects

(though good performance can still be made challenging; more on this shortly). It also puts the

focus on “precautionary” motives for saving, which have attracted much theoretical attention for

various empirical reasons (Browning and Lusardi 1996). In precautionary saving tasks, optimal

asset accumulation depends monotonically (ceteris paribus) on future income variability. Utility

functions u(ct) with a convex marginal utility of income (Kimball 1990) and/or “strict borrowing

constraints” At ≥ 0 ∀ t (Carroll 1997; Deaton 1992) yield precautionary motives; our task has

(mostly) both features.4 We use a simple i.i.d. binomial income process for yt such that yt is either

zero or six ECUs (experimental currency units), with equal probability; again, this very simple

process enhances the surface transparency of the task and simplifies instruction.
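The task mechanics described above can be sketched in a few lines of Python. This is only an illustration of the accounting in equation (1) and the binomial income process, not the software used in the experiment. The names `draw_income_stream`, `run_round` and `spend_everything` are hypothetical, and starting assets of zero are an assumption (the text says only that A1 is given exogenously).

```python
import random

INCOME_VALUES = (0, 6)  # ECUs; each value is drawn with probability 1/2
PERIODS = 20


def draw_income_stream(rng, periods=PERIODS):
    """Draw an i.i.d. income stream: each y_t is 0 or 6 with equal probability."""
    return [rng.choice(INCOME_VALUES) for _ in range(periods)]


def run_round(policy, incomes, initial_assets=0):
    """Apply policy(t, cash_on_hand) -> consumption to one income stream.

    Enforces 0 <= c_t <= X_t, so assets never become negative (the strict
    borrowing constraint). initial_assets=0 is an assumption, not a value
    taken from the paper. Returns a list of (t, X_t, c_t, A_{t+1}) records.
    """
    assets = initial_assets
    history = []
    for t, income in enumerate(incomes, start=1):
        cash = assets + income          # X_t = A_t + y_t
        spend = policy(t, cash)
        if not 0 <= spend <= cash:
            raise ValueError(f"infeasible consumption {spend} with cash {cash}")
        assets = cash - spend           # A_{t+1} = X_t - c_t
        history.append((t, cash, spend, assets))
    return history


def spend_everything(t, cash):
    """The no-saving benchmark: consume all cash-on-hand in every period."""
    return cash


rng = random.Random(0)
stream = draw_income_stream(rng)
history = run_round(spend_everything, stream)
```

Any other consumption rule can be passed in place of `spend_everything`, as long as it maps the period and cash-on-hand to a feasible integer consumption level.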

In spite of its surface simplicity, it is well-known that optimal solutions to such problems

generally have no closed-form analytic solution and computational methods are required to find

optimal solutions. Deaton (1992) provides a useful summary of those methods, which we use to

perform analyses described below. Since finding optimal solutions to these problems is so

computationally burdensome, we think it unlikely that subjects will find optimal consumption

policies. Subjects do well by constructing rules of thumb based on suitable but decidedly

heuristic reasoning about the general nature of the task, by avoiding typical biases of

probabilistic judgment (e.g., the gambler’s fallacy), and perhaps by controlling the influence of

automatic affective reactions to the task environment—not by solving it optimally, which they

almost certainly cannot do. In other words, we do not expect optimal performance. In fact, since

we want to measure performance and see what explains its variability, we do not want the

potential performance of the most able subjects to be censored by a task that is too easy for them.

Therefore we choose the utility function u(ct) and income process so that we expect it to

challenge almost all subjects, given prior knowledge of subject behavior in similar tasks. This

also creates substantial opportunity costs from employing policies similar to what has been

observed in past studies to make certain that motivation for improvement is strong. To explain

this, we introduce some notation and concepts. Let Y = {y1, y2,…y20} denote any particular 20-

4 In fact, our experimentally induced marginal utility function is not uniformly convex, but in the large is mainly so. This and the strict borrowing constraints combine to create a substantial precautionary motive for saving.

period income sequence, and define U(Y|τ) as the ex post total utility earned by a policy that is ex

ante optimal for a τ-period saving task when that policy is applied to Y, where τ ≤ 20. Thus

U(Y|20) is the ex post value of the sum of u(ct) across all 20 periods t, given that the ex ante

optimal policy is employed and the income stream is Y. Notice that when τ < 20, the optimal

policy for a τ-period task is not optimal for the 20-period task we use here. But because it is

simpler to plan for fewer periods than actually remain, such “myopically optimal” planning

characterizes a class of boundedly rational consumption policies. Ballinger, Palumbo and Wilcox

(2003) found that subjects who had no benefit of previous experience seemed to plan ahead

about two periods (which is optimal for a 3-period game). Thus we regard U(Y|3) as a forecast of

the ex post total utility achieved by an inexperienced subject drawn from the same subject

population5 in this task, given income sequence Y. Also, note that U(Y|1) is the ex-post total

utility of a policy that saves nothing, spending everything it has in every period.

Let Us(Y) denote the total utility achieved by subject s given income sequence Y. Our

reward scheme for subjects is based on this performance measure (in ways described shortly):

$$P_s(Y) = \frac{U_s(Y) - U(Y|1)}{U(Y|20) - U(Y|1)}. \qquad (2)$$

Noting again that U(Y|3) is a forecast of Us(Y) for average inexperienced subjects in Ballinger,

Palumbo and Wilcox (2003), our performance forecast for average inexperienced subjects is:

$$P_F(Y) = \frac{U(Y|3) - U(Y|1)}{U(Y|20) - U(Y|1)}. \qquad (3)$$

5 The subject pool used in Ballinger, Palumbo and Wilcox (2003) is identical to the one we use for first and third samples. The second sample consists of students at a somewhat less selective university, so U(Y|3) might perhaps be somewhat above the expectation of Us(Y) there, but this also suits our purposes.

Our income process and utility function u(ct) were chosen (by numerical simulation methods) to

make the median value of this performance forecast (across all possible income streams Y)

relatively small (other things equal). This was meant to ensure that most subjects will have

something to learn, and an appreciable monetary incentive to learn it, as they repeat the saving

task and gain experience performing it. Here is the resulting utility function:6

ct       0    1    2    3    4    5    6    7    8    9   10   11   12   13   14  ≥15

u(ct)    0   26   51   65   77   89  101  104  107  110  111  112  113  114  115  116
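To make U(Y|τ) and the performance measures in equations (2) and (3) concrete, here is a minimal backward-induction sketch in Python that uses the utility table above. It is not the authors' code, and it rests on several assumptions: initial assets are zero; a τ-period planner acts in each period as if min(τ, periods remaining) periods were left; and ties between equally good consumption levels go to the smallest one. All function names are hypothetical.

```python
from functools import lru_cache

# Induced utility table from the text; u(c) = 116 for every c >= 15.
UTILITY = [0, 26, 51, 65, 77, 89, 101, 104, 107, 110, 111, 112, 113, 114, 115, 116]
INCOME_VALUES = (0, 6)  # equally likely income draws, in ECUs


def u(c):
    """Utility of consuming c ECUs in one period."""
    return UTILITY[min(c, 15)]


def continuation(k, cash, c):
    """Utility of consuming c now plus expected optimal utility of the k-1 later periods."""
    saved = cash - c
    future = sum(value(k - 1, saved + y) for y in INCOME_VALUES) / len(INCOME_VALUES)
    return u(c) + future


@lru_cache(maxsize=None)
def value(k, cash):
    """Expected total utility with k periods left and the given cash-on-hand."""
    if k == 1:
        return float(u(cash))  # last period: spend everything
    return max(continuation(k, cash, c) for c in range(cash + 1))


def best_consumption(k, cash):
    """Optimal consumption with k periods left (smallest maximizer on ties)."""
    if k == 1:
        return cash
    return max(range(cash + 1), key=lambda c: continuation(k, cash, c))


def total_utility(incomes, tau, initial_assets=0):
    """Ex post utility U(Y|tau) under the assumed reading of a tau-period planner."""
    assets, total = initial_assets, 0
    n = len(incomes)
    for t, y in enumerate(incomes, start=1):
        cash = assets + y
        k = min(tau, n - t + 1)       # assumed planning horizon in period t
        c = best_consumption(k, cash)
        total += u(c)
        assets = cash - c
    return total


def performance(u_subject, u_spend_all, u_optimal):
    """Equation (2): normalized performance between U(Y|1) and U(Y|20)."""
    return (u_subject - u_spend_all) / (u_optimal - u_spend_all)


def performance_forecast(incomes):
    """Equation (3): forecast based on a 3-period planner."""
    u1 = total_utility(incomes, 1)
    return performance(total_utility(incomes, 3), u1, total_utility(incomes, 20))
```

Under these assumptions `total_utility(Y, 1)` reproduces the spend-everything benchmark, and `performance_forecast(Y)` gives one value of PF(Y) per income stream. Whether the resulting numbers match those reported in the paper depends on details (initial assets, tie-breaking) that this section does not state.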

We calculated the performance forecast PF(Y) for each of 10^4 randomly generated

income streams Y, given this utility function and the income process described earlier. The

median value of PF(Y) across all these income streams is about 0.585. We selected a set M of 16

income streams from this simulated distribution with two properties: (1) all values of PF(Y) are

very close to this median simulated value, and (2) the distribution of total income ∑Y = ∑yt

across the chosen streams in the set resembles the true distribution of total income across all

income streams.7 We call these “moderate” (difficulty) income streams. Similarly, E and H are

sets of 16 “easy” and “hard” income streams, with the properties that (1) all values of PF(Y) are

very close to the 90th and 10th percentiles of the simulated distribution of PF(Y) (about 0.73 for Y

6 Ballinger, Palumbo and Wilcox (2003) used a discrete approximation of a normalized CRRA-like utility function with σ = 3, well within the range of prior estimates based on survey data. The utility function used here most resembles a CRRA function with σ ≈ 2, though only roughly. In spite of this difference, PF(Y) (which is based on Ballinger, Palumbo and Wilcox’s data) predicts median difficulty income stream performance in this experiment very well (see footnote 23 below). Experimentalists who are used to seeing estimates of σ ≈ 0.5 in their risky choice data should remember that those estimates are based on a utility function over monetary gains, rather than over absolute consumption levels (put differently, field estimates generally assume full asset integration). Greater curvature of the estimated function will be required in the latter case (since the scale of consumption levels is much larger than the scale of monetary gains in experiments) to generate local degrees of absolute risk aversion similar to what is observed in experiments. We are most concerned that our induced utility function resembles what has been estimated in field studies of consumption and saving, rather than what has been estimated in experiments.

7 The distribution of total income across streams in each set is: 1 stream with ΣY = 42; 2 streams with ΣY = 48; 3 streams with ΣY = 54; 4 streams with ΣY = 60; 3 streams with ΣY = 66; 2 streams with ΣY = 72; and 1 stream with ΣY = 78. The set size (16 total income streams) was actually selected because this seemed to be the minimum size needed to even roughly approximate the true sampling distribution of total income in sequences of 20 income draws.

∈ E and 0.475 for Y ∈ H) and (2) total income in the streams is again chosen to roughly match

its distribution among all income streams. Note that these criteria for selecting income streams,

as well as the performance measure itself, are (in part) meant to mostly eliminate any likely

influence of ∑Y on measured performance. But these differences may have unanticipated

effects, which is why we balance them both within and between subjects by design.

Each subject s encounters five consecutive rounds of the saving task with four different

income streams—two different median income streams Ysm and Ysm25 from M, one hard income

stream Ysh from H, and one easy income stream Yse from E. The median income stream Ysm25 is

encountered by subject s in both her second and fifth rounds (subjects are not informed that an

income stream will be repeated). Let Psr(Y) denote the performance of subject s in round r, given

that income stream Y was encountered in round r. The difference Ls = Ps5(Ysm25) − Ps2(Ysm25) is

thus an unambiguous measure of learning by subject s, in terms of performance, between the

second and fifth rounds. Repeating the same income stream in the second and fifth rounds within

each subject, and using only income streams in M for those rounds across subjects, is meant to

reduce differences in income stream difficulty as a source of variance in the learning measure Ls.

The other three income streams Ysm, Ysh and Yse are encountered in the first, third and

fourth rounds by each subject s, with their order varied in one of the six possible ways (balanced

across subjects). This allows us to examine how income stream difficulty (as represented by

performance forecasts PF(Y)) and its sequencing affects performance levels and learning. We

also select income streams for each subject so that total income received across all five rounds is

300 (the expected value of the 100 draws of yt) to reduce total income history differences as a

source of variance in overall performance and learning across subjects. Across subjects, we

select the income streams so that over each sequence of 16 subjects, each income stream in M is

used exactly twice, and each income stream in H and E is used exactly once. To balance income

stream difficulty orders (of which there are 6) and use such complete sequences of 16 subjects,

we require 48 total subjects. We call each such balanced group of 48 subjects a “sample.”
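The counting behind this design can be checked with a short Python sketch. It assumes one reading of the text: that the sample size of 48 is the smallest number of subjects that is a multiple of both the 6 difficulty orders and the 16-subject stream cycle. The per-set distribution of total income is taken from footnote 7; the variable names are hypothetical.

```python
from itertools import permutations
from math import gcd

# The moderate, hard and easy streams fill rounds 1, 3 and 4 in some order.
difficulty_orders = list(permutations(["moderate", "hard", "easy"]))
n_orders = len(difficulty_orders)

# Smallest group size divisible by both the number of orders and the
# 16-subject cycle over which every income stream is used (least common multiple).
subjects_per_stream_cycle = 16
sample_size = n_orders * subjects_per_stream_cycle // gcd(n_orders, subjects_per_stream_cycle)

# Footnote 7: total income of a 20-period stream -> number of streams in each set.
streams_by_total = {42: 1, 48: 2, 54: 3, 60: 4, 66: 3, 72: 2, 78: 1}
n_streams = sum(streams_by_total.values())
mean_total = sum(total * count for total, count in streams_by_total.items()) / n_streams

# Expected income: 5 rounds of 20 draws, each 0 or 6 with probability 1/2.
expected_income_per_draw = 0.5 * 0 + 0.5 * 6
expected_total_five_rounds = 5 * 20 * expected_income_per_draw
```

The mean total income per stream in each set (60 ECUs) equals the expected income of 20 draws, which is consistent with the stated target of 300 ECUs across five rounds.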

We do not use first round performance as a dependent measure since subjects may still be

developing a full understanding of task features and requirements during the first round (that is,

some first round learning may simply be the establishment of experimental task salience), rather

than learning how to do the saving task better once it is fully understood (later, we will present

evidence suggesting this may be so). This is why the design repeats an income stream in the

second and fifth rounds, rather than the first and fifth rounds—we are purposely cautious about

measuring the magnitude of substantive learning. For pooled bivariate analysis of performance

levels, we use average performance across rounds two through five, Ps = [Ps2(Y) + Ps3(Y) +

Ps4(Y) + Ps5(Y)]/4, as “overall saving performance level” of subject s.8

We also use the performance measure Psr(Y) to reward subject s for her performance in

round r. We compare two ways to do this in our first sample of subjects. Let D be some dollar

amount available in each round, and let Psr*(Y) = min{1,max[0,Psr(Y)]}.9 One simple way to

reward subject s is to pay her Psr*(Y) ⋅ D for round r, and we call this the direct money method.

Another way is to pay her D with probability Psr*(Y) and zero otherwise, which is the well-

known binary lottery method. Theoretically, the latter method should be more incentive-

8 Because there is systematic variation across subjects in the difficulty of income streams presented in the third and fourth rounds, this is not a minimum variance measure of overall performance levels. As will be clear in panel analysis later, though, income stream difficulty (though highly significant) explains a relatively small amount of the variance in performance; so this does not turn out to be an empirically important matter here.

9 Although the expectation of Psr(Y) across all income streams is definitionally no greater than unity, suboptimal policies can by luck outperform the optimal policy for specific streams. For instance, consider a suboptimal policy that happens by luck to be “optimal with clairvoyance” for some particular stream Y′ (that is, it is optimal given perfect foresight of Y′): Obviously such a policy must outperform the ex ante optimal policy for stream Y′. This does happen (though rarely) in our sample, so truncation above unity is occasionally necessary. Truncation below zero occurs when Usr(Y) < U(Y|1). On the basis of past results we thought this would be very rare. In this study it occurs more than once for about 1 in 20 subjects and, as explained later, we omit such subjects from our statistical analysis.

compatible10 for any subject whose risk preferences are linear in probabilities (like expected

utility preferences) regardless of the subject’s risk attitude toward real currency (Berg et al.

1986). However, the binary lottery method is more complex (and so less transparent to subjects)

and also produces a higher variance of payoffs (which most subjects dislike) than the direct

money method. Moreover, the advertised incentive-compatibility of the binary lottery method is

empirically suspect (Millner and Pratt 1992; Selten, Sadrieh and Abbink 1999). We checked for

significant differences between the two methods in our first sample of subjects and, finding none,

used the direct money method in the three later samples.11
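The two reward schemes compared above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the random number generator stands in for whatever lottery device was actually used in the sessions.

```python
import random


def truncate(p):
    """P* = min{1, max[0, P]}: clip raw performance to the unit interval."""
    return min(1.0, max(0.0, p))


def direct_money_payment(p, stake):
    """Direct money method: pay the fraction P* of the stake D with certainty."""
    return truncate(p) * stake


def binary_lottery_payment(p, stake, rng):
    """Binary lottery method: pay the full stake D with probability P*, else nothing."""
    return stake if rng.random() < truncate(p) else 0.0


# Both methods have the same expected payment, but the lottery has higher variance.
rng = random.Random(1)
draws = [binary_lottery_payment(0.5, 8.0, rng) for _ in range(100_000)]
lottery_mean = sum(draws) / len(draws)
certain_payment = direct_money_payment(0.5, 8.0)
```

With a performance of 0.5 and a stake of $8.00, the direct money method pays $4.00 for sure, while the binary lottery pays either $8.00 or nothing and averages about $4.00 over many draws.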

All subjects were run in individualized sessions that began with various cognitive tests

and/or survey item responses used to construct various personality scales. These potential

predictors of heterogeneity vary across samples (as described shortly): We used the first two

samples to select amongst promising measures, and then validated those we selected in the third

and fourth samples. All subjects received a flat $5.00 payment for showing up. Subjects in the

first (second, third and fourth) samples received an additional flat $15.00 ($10.00) payment for

completing the measurement part of the protocol, which lasted about 90 to 120 (60 to 75)

minutes, and then took a short break if desired. Saving task instructions were then presented (see

the Appendix), and subjects then completed five rounds of the task (all taking about 40-50

minutes). In the first sample, the available reward D per saving task was $8.00 ($7.00 in the

second, third and fourth samples). The implied opportunity cost of the behavior observed by

Ballinger, Palumbo and Wilcox (2003) is not trivial in this design. Subjects who perform as if

they plan optimally only two periods ahead will only earn about 58.5% of the available $35 ($40

10 By this, we mean that subjects should behave as if their utility function for ECUs spent on consumption is precisely the utility function we use, and hence should solve the maximization problem in equation (1).

11 Additionally, Ballinger, Palumbo and Wilcox (2003) use the binary lottery method and, as will be clear later (see footnote 23), median performance results in our study are remarkably similar to theirs.

in the first sample) across the five saving tasks, implying that they would be leaving $14.53

($16.60 in the first sample) on the table by persisting in such behavior. Upon finishing the saving

tasks, subjects wrote advice for a hypothetical future subject,12 were paid and then dismissed.
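The opportunity-cost figures quoted above follow from the median performance forecast of 0.585 and the total stakes of five rounds at $7.00 (or $8.00 in the first sample) per round:

```latex
\begin{align*}
5 \times \$7.00 &= \$35, & (1 - 0.585) \times \$35 &= \$14.525 \approx \$14.53, \\
5 \times \$8.00 &= \$40, & (1 - 0.585) \times \$40 &= \$16.60 .
\end{align*}
```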

While our saving task implies substantial opportunity costs of median behavior observed

in similar experiments in the past, there are easily stated heuristic policies that perform quite well

in the saving task. For instance, consider this heuristic policy (written below as a computer

program) distilled from a single paragraph of advice written by a subject who did rather well:

If Xt ≤ 2 or t = 20, ct = Xt;
Else do;
    If t ≤ 15, then do;
        If 3 ≤ Xt < 12, ct = 2;
        Else ct = 6;
    End;
    Else ct = Ceiling(Xt/2);
End.
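For readers who want to run this rule, here is a direct Python transcription of the policy above. The function names are hypothetical, and the helper that traces a consumption path assumes zero initial assets, which this section does not state.

```python
import math


def heuristic_consumption(t, cash):
    """Consumption c_t chosen by the subject's rule, given period t and cash-on-hand X_t."""
    if cash <= 2 or t == 20:
        return cash                       # spend everything when poor or in the last period
    if t <= 15:
        return 2 if 3 <= cash < 12 else 6  # build a buffer early, spend 6 once it is large
    return math.ceil(cash / 2)             # periods 16 to 19: spend half, rounded up


def consumption_path(incomes, initial_assets=0):
    """Trace the rule over an income stream (initial_assets=0 is an assumption)."""
    assets, path = initial_assets, []
    for t, income in enumerate(incomes, start=1):
        cash = assets + income
        spend = heuristic_consumption(t, cash)
        path.append(spend)
        assets = cash - spend
    return path
```

Because consumption never exceeds cash-on-hand under this rule, the borrowing constraint is always satisfied.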

Across all income streams used in our experimental design, this policy has an average

performance of about 0.915—well above the median performance forecast of 0.585 based on the

results of Ballinger, Palumbo and Wilcox (2003). That is, some subjects can and do verbally

state relatively simple heuristic policies that are surprisingly good (cf. Thaler 1994), so good

performance in our saving task is not impossible and can be achieved by easily stated heuristics.

Table 1 lists the cognitive and personality measures collected in each sample. The “Beta

III” (Kellogg and Morten 1999) is the third generation of a general test of nonverbal cognitive

abilities originally developed during the First World War to screen literate and illiterate United

States military recruits on a more equal footing. In the first sample, we administered the entire

Beta III test. Results there suggested that the sum of its two analytical subtests (“picture

12 The advice-writing is in service of future studies planned by us. Except for the example discussed in the next paragraph, we do not analyze the written advice here.

absurdities” and “matrix reasoning”) was the best predictor of saving performance, so we

continued using these two subtests in the third and fourth samples.

The second test is the “Porteus Maze” (Porteus 1965), which requires subjects to thread a

pencil through mazes of increasing complexity without taking wrong turns. Some believe this

test measures both planning ability and impulse control, both of which may be relevant to saving

behavior. This test was a marginally significant predictor of saving performance in the first

sample, but less significant than the Beta III subtests; so we omitted it from later samples.

The third test is “Raven’s Standard Progressive Matrices Plus,” a visual pattern induction

test (Raven, Raven and Court 1998). The family of Raven matrix tests are widely used in

research on cognitive performance; they are regarded as measures of “fluid intelligence” or the

ability to learn about and adapt to novel situations or tasks. The “Plus” version of the test has an

extended sensitivity for distinguishing abilities in the upper 20% of the distribution of ability.

This test was a poor predictor of saving performance in our first sample, but it appeared that

many of these subjects “gave up” on the last third of the problems in this test—suggesting that

the “Plus” version was too difficult for most of those subjects. In the second sample, we

administered the fourth test—the “Raven SPM,” a simpler and briefer version of the Raven

“SPM-Plus” test. This test significantly explained variance in saving performance, but less

effectively than a “working memory span” test also administered in the second sample.

Therefore, we stopped using Raven tests after the second sample.

The fifth test—our last test of cognitive abilities—is a member of the class of tests

collectively known as “working memory span” or “WM span” tests. Specifically, it is a recently

developed computer-administered version of the “operation span” test of Turner and Engle

(1989). All working memory span tests have a common structure. The subject must remember

some collection of items (such as a sequence of briefly presented letters), while performing some

other “distractor” task involving conscious processing (such as ascertaining the truth value of

simple arithmetic equations appearing between each letter presented).13 These tests are regarded

as measuring the capacity for controlled attention and thought (Conway et al. 2005).

A large number of studies over the last two decades establish that WM span is a robust,

domain-general predictor of intelligent performance.14 WM span tests correlate more strongly

with performance in many different reasoning tests with widely varying surface features, such

as the Raven family of tests, the Tower of Hanoi puzzle (e.g., McDaniel and Rutström 2001), the

Beta III test and so forth, than performances in these same tests correlate with one another

(Engle et al. 1999; Engle and Kane 2004). WM span is also implicated in problem-solving

(Hambrick and Engle 2003) and some forms of learning (e.g., Reber and Kotovsky 1997). Based

on promising results in the second sample, we continued measuring WM span in later samples.

In addition to measures of cognitive abilities, we added an item-response survey to the

second, third and fourth sample protocols to measure several potentially relevant personality

characteristics. Each survey item is a statement: A subject decides whether the statement is

descriptive of himself or herself and selects one of four responses (completely false, mostly false,

mostly true or completely true). These are numerically coded with the integers 1, 2, 3 and 4 in the

manner that reflects concordance with the characteristic being measured, and these integers are

13 The operation span test we use is in fact the test described parenthetically in this sentence. Sequences of letters are briefly presented, interleaved with simple arithmetic equations. Subjects must ascertain the truth value of each equation as it appears. At the conclusion of each sequence of letters, subjects are asked to recall the presented sequence of letters in correct order. This basic task is repeated many times for letter sequences of varying length; overall test scores are based on the number of letters correctly remembered in their correct serial locations after each basic task. For more details, see Conway et al. (2005).

14 Here, the word “domain” refers to specific features of the tasks that make up some test. In the Raven SPM+, for instance, the domain is visual pattern induction. In the Porteus Maze test, the domain is planning and executing paths through mazes. Part of test performance is almost always domain-specific because different subjects have different prior levels of ability and comfort with the specific features of different test domains. Cognitive

then summed across the relevant items to produce scales measuring the characteristic. We briefly

describe each scale and our reasons for measuring it below.
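The coding scheme just described (responses coded 1 to 4 in the direction of the characteristic, then summed) can be sketched as below. This is an illustrative reconstruction rather than the scoring code actually used; the item identifiers and the choice of which items are reverse-keyed are assumptions based on the example statements given in the footnotes.

```python
def score_scale(responses, reverse_keyed):
    """Sum 1-4 item responses, flipping reverse-keyed items (1<->4, 2<->3).

    responses:      dict mapping item id -> response in {1, 2, 3, 4}
    reverse_keyed:  set of item ids where agreement indicates LESS of the trait
    """
    total = 0
    for item, value in responses.items():
        if value not in (1, 2, 3, 4):
            raise ValueError(f"response for {item!r} must be 1-4, got {value}")
        total += (5 - value) if item in reverse_keyed else value
    return total


# Hypothetical need-for-cognition answers (4 = "completely true").
responses = {
    "prefer_complex_problems": 4,
    "thinking_not_my_idea_of_fun": 4,  # reverse-keyed: coded as 1
    "satisfaction_in_long_deliberation": 2,
}
reverse_keyed = {"thinking_not_my_idea_of_fun"}
nfc_score = score_scale(responses, reverse_keyed)
```

Flipping reverse-keyed items before summing keeps a higher total meaning more of the characteristic, whichever way an individual statement is worded.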

Subjects may vary in their intrinsic motivation to perform well in experimental tasks,

whether or not extrinsic motivators (usually, performance-contingent cash payment) are used.

This gives rise to interesting methodological questions examined elsewhere;15 for our purposes,

variation in intrinsic motivation could be a source of variance both in task performance and

measured cognitive abilities (perhaps especially the latter, since we do not provide any extrinsic

motivation for performance in them). Because of this, our first personality measure is an item-

response-based measure of the intrinsic motivation to engage in effortful thought called “need

for cognition” (Cacioppo et al. 1996), which is measured in the second, third and fourth samples.

Need for cognition is not cognitive ability. Some studies suggest that need for cognition

correlates only modestly, if at all, with cognitive abilities (e.g., Cacioppo, Petty and Morris

1983), but we do not want to confuse cognitive abilities with the intrinsic motivation to engage

in cognitively challenging tasks. Therefore, we selected twelve of the eighteen items

recommended by Cacioppo, Petty and Kao (1984) for the short version of their need for

cognition scale to measure this personality characteristic in the second, third and fourth

samples.16

psychologists are particularly interested in WM span precisely because it seems to explain performance in a domain-general manner, that is, across many different domains.

15 For instance, one may ask whether subjects’ ideas about what it means to “perform well,” and hence their goals, are the same as those of the experimenter, and whether extrinsic incentives are needed to better align their goals with the experimenter’s meaning of performance. This is an old, respectable view (Smith 1982) and many experimental tests of incentive effects are at least partially motivated by it (Camerer and Hogarth 1999). Even when the subject aims to do what the experimenter desires, there may be nontrivial interactions between extrinsic and intrinsic motivations that produce ironic results (Gneezy and Rustichini, 2000; McDaniel and Rutström 2001). Rydval (2003) discusses unanswered questions regarding interactions between cognitive capital, intrinsic motivation and extrinsic motivation in producing observed decisions.

16 Examples of the twelve items used are “I would prefer complex to simple problems,” “Thinking is not my idea of fun” and “I find satisfaction in deliberating hard and for long hours.” The response “completely true” would be numerically coded as “4” (high Need for Cognition) for the first and third statements, and as “1” (low Need for

Personality psychologists and clinicians have long regarded tendencies toward

procrastination and impulsiveness as potentially interesting personality characteristics, and scales

meant to measure these have a long history. At the same time, psychologists and behavioral

economists argue that procrastination and impulsiveness are outcomes of fundamental properties

of time preferences and/or the manner in which people weigh the present against the future when

making choices over time (Ainslie 1975; Thaler and Shefrin 1981; O’Donoghue and Rabin

1999). Procrastination scales are known to correlate negatively with need for cognition (Ferrari

1992), so a significant relationship between need for cognition and saving performance might

occur simply because need for cognition is an instrument for procrastination. Therefore, our

survey includes twelve items from a contemporary procrastination scale (Tuckman 1991).17

Whiteside and Lynam (2001) review past measures of impulsiveness and argue that

tendencies toward impulsive behavior actually arise from the interplay of several distinct

personality characteristics. On the basis of a new study examining a very large number of items

used to construct existing scales of impulsiveness, Whiteside and Lynam offer a new

“impulsiveness inventory” comprised of four subscales they call premeditation, sensation-

seeking, perseverance and urgency. In the second sample, we included the items used to measure

the first two of these subscales which, according to Whiteside and Lynam, are both highly

reliable, nearly orthogonal to one another, and somewhat correlated with the other two measures

(urgency and perseverance). Premeditation explained no variance in saving performance in the

second sample, but sensation-seeking did (though weakly). But to give these components of

Cognition) for the second statement. We deliberately omit items used by Cacioppo, Petty and Kao (1984) that seem to involve other personality characteristics of interest to us, using only twelve of their eighteen total items.

17 Examples of the twelve procrastination items are “I manage to find an excuse for not doing something,” “I put the necessary time into even boring tasks, like studying” and “I am an incurable time waster.” The response “completely true” would be numerically coded as “4” (high procrastination tendency) for the first and third statements, and as “1” (low procrastination tendency) for the second statement.

impulsiveness a good shot, we continued to measure them in the third and fourth samples, and

also added the perseverance and urgency scale items to our survey for good measure.18

Sensation-seeking is potentially interesting for reasons going beyond its contribution to

impulsiveness. It is positively correlated with the willingness to take many kinds of risks

(Zuckerman 1994) and has been found to explain risk-taking variance in some economics

experiments (e.g., Eckel and Wilson 2004). Additionally, sensation-seeking is known to be

positively correlated with need for cognition (Olson, Camp and Fuller 1984; Crowley and Hoyer

1989); therefore, including sensation-seeking in multivariate analyses could clarify the meaning

of any significant effect of need for cognition in those analyses.

Let us remove any ambiguities about the role these personality scales play in our

experimental design. The design uses an “induced” utility function—points are purchased with

experimental currency units according to a utility function we choose, and these points ultimately

pay the subject cash. Subjects who want more money should not (and mostly will not) act

directly on the basis of their own “true” intertemporal preferences to make decisions in our saving

task. As a result, self-reported tendencies toward procrastination and impulsiveness may well

explain nothing in our experiment, even from the perspective of a behavioral economic theorist

who believes these things are caused by true intertemporal preferences. As a result, we do not

regard the procrastination and impulsiveness scales as a way to test behavioral theories of

intertemporal choice in this experiment.

18 Examples of the eleven premeditation items are “I like to stop and think things over before I do them,” “I don't like to start a project until I know exactly how to proceed” and “I tend to value and follow a rational, ‘sensible’ approach to things.” Examples of the twelve sensation-seeking items are “I’ll try anything once,” “I quite enjoy taking risks” and “I sometimes like doing things that are a bit frightening.” Examples of the ten perseverance items are “I generally like to see things through to the end,” “Unfinished tasks really bother me” and “I am a productive person who always gets the job done.” Examples of the twelve urgency items are “I have trouble controlling my impulses,” “When I am upset I often act without thinking” and “In the heat of an argument, I will often say things that I later regret.”

However, subjects might apply intertemporal decision making heuristics they use in real

life to our saving task. If those heuristics are adapted to serve their “true” intertemporal

preferences; and if those same preferences condition their tendencies toward procrastination and

impulsiveness in the manner suggested by various behavioral theories; then tendencies toward

procrastination and impulsiveness could well explain some differences in saving behavior in our

experiment. Our intent here is to take this possibility seriously since, as mentioned above, such

tendencies are known to be correlated with need for cognition, and could also be correlated with

cognitive abilities. We do not wish to confuse these potential, possibly correlated sources of

heterogeneity in saving performance by omitting attention to any one of them. The personality

scales are not included to test emotional or preferential theories of anomalies in intertemporal

choice (those theories are about “true” preferences, not our induced laboratory preferences).

Rather (as with need for cognition) we include them as a precaution, so that any significant

explanatory power of cognitive abilities can be given a clearer interpretation.

The samples were collected at two sites between August 2003 and April 2005. The first

and third samples were collected at the University of Houston, while the second and fourth

samples were collected at Stephen F. Austin State University. At both sites, volunteer subjects

were recruited from the general undergraduate populations.

II. Results.

In both the bivariate and multivariate analyses which follow, we pool observations across

samples containing the same cognitive and personality measures. In the analyses, we ignore

measures that were discarded at various points in our sequence of samples (either because of

insignificance, or because other measures performed better). While no cognitive or personality

measure is common to all four samples, we begin with an examination of the data pooled across

all four samples to establish certain overall facts and treatment effects, as well as to examine

potential demographic effects such as gender, age and experimental site. The other three

“poolings” we examine are: (1) The first, third and fourth samples, the broadest pooling where

the analytical component of the Beta III test was measured; (2) The second, third and fourth

samples, the broadest pooling where WM span was measured as well as need for cognition,

procrastination and two of the impulsiveness subscales (premeditation and sensation-seeking);

and (3) The third and fourth (validation) samples, where both the analytical component of the

Beta III test and WM span were measured, as well as need for cognition, procrastination and all

of the impulsiveness scales (premeditation, sensation-seeking, perseverance and urgency).

Across the four samples, there were nine subjects (out of 192 total subjects) with negative

values of the average performance measure Ps. This reveals a failure of imagination on our part:

On the basis of earlier results (Ballinger, Palumbo and Wilcox 2003), we did not expect any

performance levels to be this poor.19 These subjects are a methodological problem: Almost all of

them failed to earn any money in two or more of the saving tasks. This implies that as often as

not, we did not have control over marginal motivations of these subjects, so methodological

reasons exist for excluding them from our analyses. Moreover, their round-to-round performance is so

erratic that, for instance, adding these nine subjects to the panel regressions described below

triples the estimated residual variance in those regressions. Therefore, we omit these subjects

(roughly 1 in 20 subjects) from all of our statistical analyses below.

19 These unexpectedly poor-performing subjects frequently used “save-and-binge” strategies that perform worse than simply spending all available assets in every period. Such subjects have not figured out that the concavity of the utility function u(ct) makes some effort at consumption smoothing a central feature of good saving strategies (“save and binge” strategies actually make consumption less smooth than it is if nothing is ever saved). This is a very deep and fundamental failure to grasp the essence of even minimally sensible saving policies, much less optimal ones.
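The claim in this footnote can be checked numerically under simple assumptions. The sketch below is illustrative only: it assumes a square-root period utility with u(0) = 0 and a constant income of 3 per period over 20 periods, none of which is taken from the experiment's actual utility function or income process.

```python
import math


def total_utility(consumption, u=math.sqrt):
    """Sum of period utilities for a consumption path (u is assumed concave)."""
    return sum(u(c) for c in consumption)


def spend_all(income):
    """Consume each period's income as it arrives; never save."""
    return list(income)


def save_and_binge(income, cycle):
    """Consume nothing for cycle-1 periods, then spend all accumulated assets."""
    plan, assets = [], 0.0
    for t, y in enumerate(income, start=1):
        assets += y
        if t % cycle == 0:
            plan.append(assets)  # binge: spend everything saved so far
            assets = 0.0
        else:
            plan.append(0.0)     # save: consume nothing this period
    return plan


income = [3.0] * 20  # hypothetical constant income stream, total of 60
u_spend = total_utility(spend_all(income))
u_binge = total_utility(save_and_binge(income, cycle=4))
```

Both plans spend the same total, but the binge plan concentrates consumption in a few periods; with a concave utility function that concentration lowers total utility, so `u_binge` falls below `u_spend`.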

Table 2 shows Pearson correlations between performance measures Psr in different

rounds r, illustrating two points. First, the correlations are quite strong across rounds two through

five (all significant at α = 0.0001), showing that between-subject performance differences are

reliable and, therefore, that there is plenty of potentially predictable variance of performance

across subjects. Second, correlations between first round performance and performance in later

rounds are noticeably weaker (though all significant at α = 0.01). First round performance is also

noticeably more variable than in later rounds. This suggests that subjects are still developing an

understanding of task features and demands during the first round, as we anticipated. So, as

originally planned, we omit first round data from all further analyses.

Mean performance Psr in rounds r = 2, 3, 4 and 5 is 0.59, 0.60, 0.65 and 0.64,

respectively. The mean and standard deviation of overall performance Ps (average performance

across all four rounds) are 0.62 and 0.22, respectively; its minimum and maximum are 0.060 and

0.996. Thus there is great variability in subject performance, as expected on the basis of past

work (Ballinger, Palumbo and Wilcox 2003) and as hoped for our purposes. The mean of the

learning measure Ls (change in performance between the second and fifth rounds) is 0.052. This

is significantly different from zero (p = 0.0009 by a t-test and p = 0.0016 by a signed ranks test),

but it is a small effect (about a quarter of the standard deviation of Ps) and Ls is almost as

variable across subjects (standard deviation of 0.21) as are average performance levels.
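The summary measures in this paragraph can be sketched as follows: overall performance as the mean over rounds 2 through 5, learning as the round 5 value minus the round 2 value, and a one-sample t statistic for mean learning. The per-round values are hypothetical and the function names are illustrative, not the authors' code.

```python
import math
from statistics import mean, stdev


def overall_performance(rounds):
    """Average performance across rounds 2 through 5."""
    return mean(rounds[r] for r in (2, 3, 4, 5))


def learning(rounds):
    """Change in performance between the second and fifth rounds."""
    return rounds[5] - rounds[2]


def one_sample_t(values):
    """t statistic for the hypothesis that the population mean is zero."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))


# Hypothetical per-round performance for three subjects (round -> value).
subjects = [
    {2: 0.50, 3: 0.55, 4: 0.60, 5: 0.62},
    {2: 0.70, 3: 0.68, 4: 0.75, 5: 0.71},
    {2: 0.40, 3: 0.45, 4: 0.50, 5: 0.52},
]
P = [overall_performance(s) for s in subjects]
L = [learning(s) for s in subjects]
t_stat = one_sample_t(L)
```

Dividing mean learning by the standard deviation of overall performance gives the kind of effect-size comparison made in the text (0.052 against 0.22, or about a quarter).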

Table 3 presents evidence on bivariate relationships between average performance Ps,

demographic variables and our measures of cognitive abilities and personality scales, in the form

of Spearman correlations. In all sample poolings, women perform significantly worse than men,

but we caution that this significance vanishes in our multivariate analyses (to come shortly).

Under the broadest pooling containing most of the personality scales (the second, third

and fourth samples), need for cognition and sensation-seeking are significantly related to

performance. The positive relationship between need for cognition and performance is expected

if intrinsic motivation plays a role in performance. However, the positive correlation of

performance with sensation-seeking is not the expected one if sensation-seeking contributes to

impulsiveness and impulsiveness is a negative influence on saving performance. Note, however,

that these significant correlations vanish in the third and fourth samples alone. Moreover, neither

need for cognition nor sensation-seeking is significant in the multivariate analyses that follow.

Finally, the cognitive scales (the analytical part of the Beta III, and WM span) are significant in

every pooling where they are available; and, as shown shortly, this significance survives in

multivariate analyses, suggesting that they are robust predictors of saving performance.20

We now turn to our multivariate analyses. For this purpose, we treat the experimental

data as a panel with four repeated measurements of performance (in each of rounds 2 through 5).

We use a random effects estimator to estimate any potentially predictable variance of

performance levels that remains after controlling for observed differences between the subjects

(demographic, cognitive and personality differences) and treatment variation within and between

subjects (the difficulty of, and total income in, the current income stream and the previous

income stream). Besides accounting for the lack of independence between the repeated

measurements on each subject, this also allows us to say how much of the potentially predictable

variance in performance is explained by the various measures, treatments and learning. All

standard errors and tests are computed using the heteroscedasticity-robust “sandwich”

20 As noted earlier, income stream difficulty in rounds 3 and 4 varies across subjects, so this is an experimentally induced part of the total variance of Ps. However, a similar bivariate analysis in which income stream difficulty effects are first partialled out of Ps is essentially identical in all respects to what we discuss here.

estimator.21 Degrees of freedom are adjusted downward for tests concerning all effects that vary

strictly between subjects, such as demographic, cognitive and personality measures.
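One way to write the panel model described above is given below. The notation is assumed here for illustration and is not taken from the paper: x_s collects the subject-level measures (demographic, cognitive and personality), z_sr collects the round-level terms (current and lagged performance forecasts and total incomes, and any round effects), u_s is the subject random effect and ε_sr is the residual.

```latex
P_{sr} = \alpha + \mathbf{x}_s^{\prime}\boldsymbol{\beta}
       + \mathbf{z}_{sr}^{\prime}\boldsymbol{\gamma}
       + u_s + \varepsilon_{sr},
\qquad r = 2, \dots, 5,
\qquad u_s \sim (0, \sigma_u^2), \quad \varepsilon_{sr} \sim (0, \sigma_\varepsilon^2)
```

In this form, σ_u² is the between-subjects variance that remains after the observed measures are controlled for, which is the sense in which the text speaks of potentially predictable variance.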

Table 4 shows most of the results of the panel analyses. The parameterization of the

model and linear transformations of its regressors make the intercept interpretable as the mean

round 2 performance of the average male subject at Stephen F. Austin State University, when

facing a moderate difficulty income stream with total income of 60.22 Recall that the forecast

performance for such income streams, on the basis of previous work (Ballinger, Palumbo and

Wilcox 2003), is about 0.585; estimated intercepts are quite similar to this.23 Moreover, the

performance forecast PF(Yr) is a highly significant predictor of performance, indicating that our

scheme for rating income stream difficulty has some aggregate value. Still, if the performance

forecast were an unbiased predictor, its coefficient would be unity; and this hypothesis is easily

rejected. We believe this occurs (in part) because the performance forecast assumes a common

model for all subjects (planning in a myopically optimal way, just two periods ahead—the

median inexperienced behavior suggested by the results of Ballinger, Palumbo and Wilcox). As a

result, its predictive value is probably attenuated by policy heterogeneity across subjects.
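The regressor transformations described in the footnote on the model's parameterization can be sketched as below: performance forecasts are differenced from 0.585, total income from 60, and scales are standardized with the sample mean and sample standard deviation. The data values and function names are hypothetical.

```python
from statistics import mean, stdev

PF_MODERATE = 0.585  # approximate forecast for moderate-difficulty streams
INCOME_MEAN = 60     # average total income in the sample


def center(values, reference):
    """Difference each value from a fixed reference point."""
    return [v - reference for v in values]


def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical raw regressors for three observations.
pf = [0.55, 0.585, 0.62]
income = [50, 60, 70]
wm_span = [30, 45, 60]

pf_c = center(pf, PF_MODERATE)
income_c = center(income, INCOME_MEAN)
wm_z = standardize(wm_span)
```

With every regressor equal to zero at its reference point, the intercept is the predicted performance of a subject with average scale scores facing a moderate-difficulty stream with total income of 60.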

When a subject completes a saving task round with a relatively hard income stream (or

one with relatively low total income), it is possible that he learns more from that experience, or

21 We find no evidence of significant autocorrelation of the residuals from these models. There is some weak evidence of heteroscedasticity conditioned on the cognitive ability measures though (as one might expect, lower variance with higher cognitive ability), so the precaution of the sandwich estimator seems appropriate.

22 Specifically, variables are transformed in the following ways. Performance forecasts PF (higher means an “easier” income stream) are differenced from 0.585, the approximate difficulty of all moderate difficulty income streams. Total income ∑Y is differenced from 60, its average sample value; and age, and all cognitive and personality scales, are standardized using the sample mean and sample standard deviation calculated within each pooling of samples.

23 This is actually quite remarkable. The forecast of the intercept is based on data from the earlier experiment of Ballinger, Palumbo and Wilcox (2003) which differs from this experiment in these ways: (a) Subject population; (b) motivational mechanism (binary lottery method in the earlier experiment, almost always direct money method here); (c) utility function (one with both losses and gains and a different shape in the earlier experiment, gains only here); (d) length of the saving task (60 periods in the earlier experiment, 20 periods here); and (e) software interface and instructional protocol. In spite of these differences, the one-parameter model estimated on the basis of the earlier experiment predicts mean subject performance in this experiment extremely well.

otherwise exercises more due caution, in the next round than does a subject who faced a

relatively easy income stream (or relatively high total income). Yet we find no evidence of this.

Once-lagged performance forecasts PF(Yr−1) and total incomes ∑Yr−1 are insignificant in all

regressions. Put differently, there is no evidence that performance depends on treatment history

(order effects). Finally, as we hoped would be true given the way we measure performance,

current round total income ∑Yr has no significant effect on current round performance either.

The significant gender effect found in bivariate analyses vanishes in the multivariate

analysis and, as a group, the demographic variables (gender, age and site) are jointly

insignificant. We note that this is not because women and men in these samples differ

systematically in their measured cognitive abilities: There is no significant difference in the WM

span or Beta III scores of women and men in any pooling of the samples.24 Additionally, as

mentioned earlier, the significance of personality scales mostly vanishes in the multivariate

analysis. While perseverance is marginally significant in the third and fourth samples where it is

measured, its negative sign is the opposite of what one would expect if perseverance inhibits

impulsiveness and impulsiveness is a negative influence on saving performance.

Finally, it is very clear that, wherever they are found, the cognitive measures are highly

significant predictors of saving performance. Even in the small pooling (just the third and fourth

samples) where both measures are available and hence in the regression together, they are both

individually significant (and so not mutually redundant, even though positively correlated).

24 Both gender and sensation-seeking sometimes correlate significantly with risk aversion (Eckel and Grossman 2003; Zuckerman 1994; Eckel and Wilson 2004), and their signs in the multivariate analyses are consistent with those findings if risk aversion manifests itself in some counterproductive fashion in the saving task. In fact an F-test rejects their joint insignificance (p < 0.029) in the pooling of the second, third and fourth samples (but not in the “validation” pooling of the third and fourth samples alone), which may explain why gender’s individual significance vanishes in the multivariate analysis. While future work should include a measure of risk aversion, we doubt this would alter our results on cognitive abilities and performance. In the two studies we know of where both task-specific ability and risk aversion are measured, no significant relationship between them has been found (Dohmen and Falk 2005, and personal communication from Bram Cadsby based on data from Cadsby, Song and Tapon 2006).

In terms of explained variance, the Beta III and WM span alone explain about 19% of the

potentially predictable between-subjects variance (estimated as the variance of random effects in

the model with no demographic, cognitive or personality measures). This is not as large as one

might like, but it is by no means trivial: Their significance is practical as well as statistical.
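One reading of this calculation is that the share explained is the proportional reduction in the estimated random-effects variance when the cognitive measures are added to the baseline model. The sketch below follows that assumed reading; the variance values are hypothetical numbers chosen only to show the arithmetic and are not estimates from the paper.

```python
def share_explained(var_u_baseline, var_u_with_measures):
    """Proportional reduction in between-subjects (random-effects) variance.

    var_u_baseline:       random-effects variance with no subject-level measures
    var_u_with_measures:  random-effects variance after adding the measures
    """
    if var_u_baseline <= 0:
        raise ValueError("baseline variance must be positive")
    return 1.0 - var_u_with_measures / var_u_baseline


# Hypothetical variance estimates, for illustration only.
share = share_explained(var_u_baseline=0.040, var_u_with_measures=0.0324)
```

With these made-up inputs the function returns a share of about 0.19, the order of magnitude reported in the text.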

III. Discussion and Conclusions.

It seems clear that differences in cognitive abilities are the best predictors of saving

performance in our task. This agrees with the great bulk of the evidence reviewed by Stanovich

and West (2000) showing that substantial variance of most (though not all) classic experimental

failures of reasoning are explained by measures of cognitive abilities. It also agrees with recent

suggestive results of Rydval and Ortmann (2004). By contrast, a collection of demographic

variables and scales meant to measure intrinsic motivation, procrastination or impulsiveness have

little consistent explanatory value, especially after controlling for cognitive ability differences.25

Many behavioral economists suspect that there are fundamental differences between the

way subjects behave in induced value experiments (like the imaginary “ECU” consumption

expenditures that yield “points” and ultimately money in our saving task), and the way they

behave toward real goods with “homegrown” utilities and their real money expenditures on those

real goods (Kahneman, Knetsch and Thaler 1990). The preferences we induce in our experiment

do not contain any discounting at all, much less the hyperbolic or quasi-hyperbolic discounting

that many behavioral economists believe is characteristic of actual preferences and that they

25 We deliberately refrain from multi-equation modeling here since our main purpose is to show that cognitive ability measures significantly explain variance even without attention to such possibilities. Nevertheless, note that need for cognition significantly explains some of the variance in WM span, as we thought might be true since WM span is measured without extrinsic rewards. When the residual of WM span (after regression on need for cognition) is substituted for WM span itself in the multivariate model, that residual is an even stronger predictor of saving performance, and need for cognition itself becomes a significant predictor of saving performance as well. This

implicate in phenomena such as procrastination, impulsiveness and poor saving behavior

(O’Donoghue and Rabin 1999; Laibson 1997, 1998). From this perspective, perhaps we should

expect no relationship between measures of personal tendencies toward procrastination and

impulsiveness and saving performance in our tasks. We accept this line of argument. As the

reader should recall, our intent was to avoid confusing effects of cognitive ability with other

potentially correlated things, such as intrinsic motivation, procrastination and impulsiveness.

Nevertheless, past experimental evidence strongly suggests that existing preference-based

explanations of saving behavior are necessarily incomplete. Ballinger, Palumbo and Wilcox

(2003) used laboratory saving tasks virtually identical to the ones we examine here: The induced

preferences use separable utility, no discounting and a finite horizon, and there are strict

borrowing constraints.26 They find under-saving early in the life-cycle and excess sensitivity of

consumption to lagged income changes (Flavin 1981) that is far, far greater than what is

predicted by the optimal precautionary policy. Thus hyperbolic discounting and precommitment

devices (Laibson 1997) are unnecessary to produce the under-saving and excess sensitivity

observed in saving experiments; moreover, optimization subject to prudent preferences (Kimball

1990) and borrowing constraints (Deaton 1992; Carroll 1997) is insufficient to explain the

amount of excess sensitivity observed in the same experiments. Carbone (2004) confirms the gist

of Ballinger, Palumbo and Wilcox’s results with a larger population-representative sample,

showing that models with simple computational interpretations (specifically, inappropriate

geometric discounting, and myopic maximization) describe the behavior of most subjects better

than (inappropriate) hyperbolic discounting models.

illustrates how it may sometimes be important to control for intrinsic motivation levels in economics experiments, particularly those that use covariates measured without any extrinsic incentive mechanism.

26 In fact the only differences are that the task lasted 60 periods (versus the 20 periods used here) and the period utility function u(ct) is different (though it has essentially similar properties).

We do not argue that preferential, institutional or demographic explanations of field

saving anomalies lack merit; lab evidence based on induced values simply cannot prove this.

Rather, we maintain that lab evidence strongly suggests that strictly computational factors

contribute to those anomalies. In particular, the sheer complexity of saving problems, interacting

with heterogeneous cognitive constraints of savers, is a very likely determinant of differences in

saving behavior across individuals. This conclusion is supported by the results of this paper.

To the extent that cognitive constraints cannot be relaxed, policy options tend toward

regulations designed to have little deleterious effect on less constrained individuals but

substantial benefits for more constrained ones; a good example of this in the context of saving is

legislated default options for retirement plans, meant to overcome status quo bias (Camerer et al.

2003). Yet the cognitive phenomena examined here do not clearly resemble simple preferential

or emotional phenomena like status quo bias. Rather, we argue they stem from heterogeneous

abilities to construct high-performing heuristic policies that are determined by the interaction of

task complexity and heterogeneous cognitive constraints.27 There is good evidence that pedagogy

can relax such constraints. Ballinger, Palumbo and Wilcox (2003) and Camerer and Chua (2005)

show that subjects can learn better saving policies through the examples and/or advice provided

by other subjects’ experiences; similarly, Lusardi (2003) finds that people learn much about

retirement planning from older siblings. Put differently, the worst effects of these constraints are

known to be mitigated by informal education based on observation and advice. Perhaps a formal

27 We have heard it argued that if problem complexity is a major part of the problem, individuals will just buy readily available advice from the private sector (financial counselors). This is a non sequitur. Cognitively constrained individuals cannot know (nor would they necessarily be able to produce unbiased estimates of) the gains associated with improved policies; therefore, they cannot necessarily know their own demand for them. Moreover, sellers of financial information have very different motives from potential buyers. Potential buyers are well aware of that and may sensibly distrust seller-provided promotional information about financial planning services. Public education may be particularly effective in this instance: The curriculum can make gains to improved policies vivid and salient with concrete examples, and the issue of trust is a different and (we would argue) a less stark one in the context of the teacher-student relationship in a public school.

26

“financial health” class could be a highly useful part of public education. Pedagogy may have as

much of a role to play as paternalism in correcting poor saving habits.

To close, we think that measures of cognitive abilities in general, and working memory

span in particular, are likely to be useful in many areas of behavioral and experimental

economics. We discuss WM span since we are relatively familiar with it and because of its

ubiquity in contemporary psychological research on cognitive functioning. WM span is

frequently thought to primarily reflect a domain-general capacity for controlled attention (Kane

et al. 2004). Psychologists use managerial metaphors in discussing this capacity, speaking of its

“supervisory” or “executive” functions. It predicts performance on intelligence tests and in

problem-solving. Shah and Miyake (1999) remark that much complex cognition “involve[s]

multiple steps with intermediate results that need to be kept in mind temporarily to accomplish

the task at hand successfully” and identify the capacity to handle this, while processing

information, as working memory. Seen in this manner, decision performance may be mediated in

basic and deep ways by working memory capacity. For instance, the apparent failure of asset

integration (Kahneman and Tversky 1979), an important normative requirement in dynamic

decisions under risk, might be mediated by working memory capacity, since asset integration

(and more generally, normatively desirable “broad decision bracketing” of all kinds) requires

simultaneous attention to outcomes of old decisions and features of current and future ones

(Read, Loewenstein and Rabin 1999). It is also easy to see that WM capacity may predict depth

of reasoning in k-step reasoning and cognitive hierarchy theories of game play (Nagel 1995;

Stahl and Wilson 1995; Camerer, Ho and Chong 2004). In fact, WM capacity might well interact

with game complexity to predict how individual subjects change their depth of reasoning across

games with increasingly complex structural features.
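To make “depth of reasoning” concrete, the following is a minimal sketch of k-step reasoning in a guessing game of the kind studied by Nagel (1995), in which players try to guess p times the mean guess. The function name, the level-0 anchor of 50 and the value p = 2/3 are illustrative assumptions and are not taken from this study.

```python
def level_k_guess(k: int, p: float = 2 / 3, level0_guess: float = 50.0) -> float:
    """Guess of a level-k player in a "guess p times the mean" game.

    Assumption (illustrative only): a level-0 player guesses `level0_guess`,
    and a level-k player best responds to a population of level-(k-1) players,
    ignoring the small effect of their own guess on the mean.
    """
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    return level0_guess * p ** k


# Guesses for reasoning depths 0 through 3 under the assumed parameters.
guesses = [level_k_guess(k) for k in range(4)]
for depth, guess in enumerate(guesses):
    print(f"depth {depth}: guess {guess:.2f}")
```

Each additional step requires holding the previous step's result in mind while computing the next one, which is the kind of intermediate bookkeeping that WM capacity is thought to support.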

WM capacity also plays a role in the ability to ignore distractions, differentially enhance

or inhibit outputs of automatic processes, and focus on what is important. Because of this, it has

been argued that WM span may play an important mediating role between effortful, conscious

processes and automatic processes in “dual process” theories of the mind (Feldman-Barrett,

Tugade and Engle 2004). This mediating role suggests several interesting possibilities. For

behavioral theorists who stress the role of outputs of automatic processes, such as immediate

affective reactions to alternatives and events, differences in WM span may be important in

determining their ultimate impact on the final behavior of individuals. For instance, consider the

potential role of immediate emotional reactions to “unfair” actions in some repeated game with

random rematching to new partners in each period. From a rational viewpoint, it can frequently

be sensible to inhibit the force of those emotions on immediate future actions since (due to

rematching) the subject is about to meet a new partner. “Low spans” may find this more difficult

than “high spans” because their attentional resources are so constrained that immediate

emotional reactions cannot be effectively inhibited. As a result, WM capacity might mediate the

frequency of inappropriately retaliatory actions (as well as other actions based on automatic

emotional responses) in games.

The discussion above is meant to be illustrative rather than exhaustive. There are many

phenomena of interest to behavioral and experimental economists that may be mediated in an

important way by executive control of attention. If so, WM span measures could be an extremely

useful “cognitive capital” measure for research in those areas. We believe the present study

illustrates its promise.

References

Ainslie, G. 1975. Specious reward: A behavioral theory of impulsiveness and impulse control. Psychological Bulletin 82:463-496.

Ballinger, T. Parker, Michael G. Palumbo and Nathaniel T. Wilcox. 2003. Precautionary saving and social learning across generations: An experiment. Economic Journal 113:920-947.

Berg, Joyce E., Lane A. Daley, John W. Dickhaut and John R. O’Brien. 1986. Controlling preferences for lotteries on units of experimental exchange. Quarterly Journal of Economics 101:281-306.

Browning, M. and A. Lusardi. 1996. Household saving: Micro theories and micro facts. Journal of Economic Literature 34:1797-1855.

Cacioppo, John T., Richard E. Petty and C. F. Kao. 1984. The efficient assessment of need for cognition. Journal of Personality Assessment 48:306-307.

Cacioppo, John T., Richard E. Petty, Jeffrey A. Feinstein and W. Blair G. Jarvis. 1996. Dispositional differences in cognitive motivation: The life and times of individuals varying in need for cognition. Psychological Bulletin 119(2):197-253.

Cacioppo, J. T., R. E. Petty and K. J. Morris. 1983. Effects of need for cognition on message evaluation, recall, and persuasion. Journal of Personality and Social Psychology 45:805-818.

Cadsby, C. Bram, Fei Song and Francis Tapon. 2006. Sorting and incentive effects of pay-for-performance: An experimental investigation. University of Guelph (Ontario, Canada) Department of Economics Working Paper.

Camerer, Colin F. and Zhikang Chua. 2005. Experiments on intertemporal consumption with habit formation and social learning. California Institute of Technology Working Paper.

Camerer, Colin F., Teck-Hua Ho and Juin-Kuan Chong. 2004. A cognitive hierarchy model of behavior in games. Quarterly Journal of Economics 119(3):861-898.

Camerer, Colin F. and Robin M. Hogarth. 1999. The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty 19(1-3):7-42.

Camerer, Colin F., Samuel Issacharoff, George Loewenstein, Ted O’Donoghue and Matthew Rabin. 2003. Regulation for conservatives: Behavioral economics and the case for “asymmetric paternalism.” University of Pennsylvania Law Review 151:1211-1254.

Carbone, Enrica. 2004. Understanding intertemporal choices. Paper presented at the June 2004 meeting of the Economic Science Association in Amsterdam.

Carroll, C. 1997. Buffer-stock saving and the life cycle/permanent income hypothesis. Quarterly Journal of Economics 112:1-56.

Conlisk, John. 1980. Costly optimization versus cheap imitators. Journal of Economic Behavior and Organization 1:275-93.

________. 1996. Why bounded rationality? Journal of Economic Literature 34:669-700.

Conway, Andrew R. A., Michael J. Kane, Michael F. Bunting, D. Zach Hambrick, Oliver Wilhelm and Randall W. Engle. 2005. Working memory span tasks: A methodological review and user’s guide. Psychonomic Bulletin and Review (forthcoming).

Crowley, A. E. and W. D. Hoyer. 1989. The relationship between need for cognition and other individual difference variables: A two-dimensional framework. Advances in Consumer Research 16:37-43.

Deaton, Angus S. 1992. Understanding Consumption. Oxford: Oxford University Press.

Dohmen, Thomas and Armin Falk. 2005. Sorting incentives and performance. Paper delivered at the Economic Science Association meeting in Montreal, Quebec, CA, June 2005.

Dyer, Douglas and John H. Kagel. 1996. Bidding in common value auctions: How the commercial construction industry corrects for the winner’s curse. Management Science 42:1463-75.

Eckel, Catherine C. and Phillip J. Grossman. 2003. Forecasting risk attitudes: An experimental study of actual and forecast risk attitudes of women and men. Virginia Polytechnic Institute Department of Economics Working Paper.

Eckel, Catherine and Rick Wilson. 2004. Is trust a risky decision? Journal of Economic Behavior and Organization 55(4):447-65.

Engle, R. W. and M. J. Kane. 2004. Executive attention, working memory capacity, and a two-factor theory of cognitive control. In B. Ross, ed., The Psychology of Learning and Motivation Vol. 44 (pp. 145-199). NY: Elsevier.

Engle, R. W., S. W. Tuholski, J. E. Laughlin and A. R. A. Conway. 1999. Working memory, short-term memory, and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General 128:309-331.

Fehr, Ernst and Klaus M. Schmidt. 1999. A theory of fairness, competition and cooperation. Quarterly Journal of Economics 114(3):817-868.

Feldman-Barrett, Lisa, Michele M. Tugade and Randall W. Engle. 2004. Individual differences in working memory capacity and dual-process theories of the mind. Psychological Bulletin 130:553-73.

Ferrari, J. R. 1992. Psychometric validation of two procrastination inventories for adults: Arousal and avoidance measures. Journal of Psychopathology and Behavior Assessment 14:97-110.

Flavin, M. 1981. The adjustment of consumption to changing expectations about future income. Journal of Political Economy 89:974-1009.

Friedman, Daniel. 1989. The s-shaped value function as a constrained optimum. American Economic Review 1243-48.

Gabaix, Xavier and David Laibson. 2000. A boundedly rational decision algorithm. American Economic Review 90:433-438.

Gneezy, Uri and Aldo Rustichini. 2000. Pay enough or don’t pay at all. Quarterly Journal of Economics 115:791-810.

Hambrick, D. Z. and R. W. Engle. 2003. The role of working memory in problem solving. In J. E. Davidson and R. J. Sternberg, eds., The Psychology of Problem Solving (pp. 176-206). London: Cambridge Press.

Harrison, Glenn and John A. List. 2004. Field experiments. Journal of Economic Literature 42(4):1009-55.

Hey, J. and V. Dardanoni. 1988. Optimal consumption under uncertainty: An experimental investigation. Economic Journal 98(390):105-16 (supplement).

Kahneman, D., J. Knetsch and R. Thaler. 1990. Experimental tests of the endowment effect and the Coase theorem. Journal of Political Economy 98:1325-1348.

Kane, Michael J., David Z. Hambrick, Stephen W. Tuholski, Oliver Wilhelm, Tabitha W. Payne and Randall W. Engle. 2004. The generality of working-memory capacity: A latent-variable approach to verbal and visuo-spatial memory span and reasoning. Journal of Experimental Psychology: General 133(2):189-217.

Kellogg, C. E. and N. W. Morton. 1999. Beta III Manual. San Antonio, TX: The Psychological Corporation.

Kimball, Miles S. 1990. Precautionary saving in the small and in the large. Econometrica 58:53-73.

Loewenstein, George. 1996. Out of control: Visceral influences on behavior. Organizational Behavior and Human Decision Processes 65:272-292.

Loewenstein, George and Jennifer S. Lerner. 2003. The role of affect in decision making. In R. J. Davidson, K. R. Scherer and H. H. Goldsmith, eds., Handbook of Affective Sciences. Oxford, U.K.: Oxford University Press.

Loewenstein, George and Drazen Prelec. 1992. Anomalies in intertemporal choice: Evidence and an interpretation. Quarterly Journal of Economics 107:573-597.

McDaniel, Tanga M. and E. Elisabet Rutström. 2001. Decision making costs and problem solving performance. Experimental Economics 4:145-61.

Millner, Edward and Michael Pratt. 1992. A test of risk inducement: Is inducement of risk-neutrality neutral? Virginia Commonwealth University Department of Economics Working Paper.

Nagel, Rosemarie. 1995. Unraveling in guessing games: An experimental study. American Economic Review 85(5):1313-26.

Nisbett, R. E. and L. D. Ross. 1980. Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice-Hall.

O’Donoghue, Ted and Matthew Rabin. 1999. Doing it now or later. American Economic Review 89:103-124.

Olson, K., C. Camp and D. Fuller. 1984. Curiosity and need for cognition. Psychological Reports 54:71-74.

Porteus, Stanley D. 1965. Porteus Maze Tests: Fifty Years’ Application. Palo Alto: Pacific Book Publishers.

Quiggin, J. 1982. A theory of anticipated utility. Journal of Economic Behavior and Organization 3:323-343.

Raven, J., J. C. Raven and J. H. Court. 1998. Manual for Raven’s Progressive Matrices and Vocabulary Scales. San Antonio, TX: The Psychological Corporation.

Reber, Paul J. and Kenneth Kotovsky. 1997. Implicit learning in problem solving: The role of working memory capacity. Journal of Experimental Psychology: General 126:178-203.

Read, Daniel, George Loewenstein and Matthew Rabin. 1999. Choice bracketing. Journal of Risk and Uncertainty 19:171-197.

Rydval, Ondrej. 2003. The impact of financial incentives on task performance: The role of cognitive abilities and intrinsic motivation. Prague, CZ: CERGE-EI Discussion Paper 112.

Rydval, Ondrej and Andreas Ortmann. 2004. How financial incentives and cognitive abilities affect task performance in laboratory settings: An illustration. Economics Letters 85(3):315-320.

Selten, Reinhard, Abdolkarim Sadrieh and Klaus Abbink. 1999. Money does not induce risk neutral behavior, but binary lotteries do even worse. Theory and Decision 46:211-249.

Shah, P. and A. Miyake. 1999. Models of working memory: An introduction. In A. Miyake and P. Shah, eds., Models of Working Memory: Mechanisms of Active Maintenance and Executive Control (pp. 1-26). New York: Cambridge University Press.

Slovic, Paul, Melissa Finucane, Ellen Peters and Donald G. MacGregor. 2002. The affect heuristic. In T. Gilovich, D. Griffin and D. Kahneman, eds., Intuitive Judgment: Heuristics and Biases. Cambridge, U.K.: Cambridge University Press.

Smith, Vernon L. 1982. Microeconomic systems as an experimental science. American Economic Review 72:923-55.

Stahl, D. and P. Wilson. 1995. On players’ models of other players: Theory and experimental evidence. Games and Economic Behavior 7:218-254.

Stanovich, Keith E. and Richard F. West. 2000. Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences 23:645-665.

Thaler, R. H. 1994. Psychology and savings policies. American Economic Review 84:186-92.

Thaler, R. H. and H. M. Shefrin. 1981. An economic theory of self-control. Journal of Political Economy 89:392-406.

Tuckman, B. W. 1991. The development and concurrent validity of the procrastination scale. Educational and Psychological Measurement 51:473-480.

Turner, M. L. and Randall W. Engle. 1989. Is working memory capacity task-dependent? Journal of Memory and Language 28:127-154.

Wilcox, N. 1993. Lottery choice: Incentives, complexity and decision time. Economic Journal 103:1397-1417.

Whiteside, Stephen P. and Donald R. Lynam. 2001. The five factor model and impulsivity: Using a structural model of personality to understand impulsivity. Personality and Individual Differences 30:669-689.

Winkler, R. L. and A. M. Murphy. 1973. Experiments in the laboratory and the real world. Organizational Behavior and Human Performance 20:252-270.

Zuckerman, M. 1979. Sensation Seeking. Hillsdale, N.J.: Erlbaum.

Zuckerman, M. 1994. Behavioural Expressions and Biosocial Bases of Sensation Seeking. Cambridge: Cambridge University Press.

TABLE 1

Summary of Cognitive, Motivational and Personality Measures, and Which Samples Have Them

Each entry lists the measure, its chief reference work and a brief description, followed by the marks from the sample columns (1st, 2nd, 3rd & 4th).

Cognitive Scales

Beta III. Kellogg and Morton (1999). Tests of nonverbal reasoning and cognitive functions. Used for nearly a century. We use reasoning part only (combination of scores on tests 2 and 5). Samples: X X

Porteus Maze. Porteus (1965). Maze-threading with pencil. Claimed to measure both planning ability and impulse control. Samples: X

Raven SPM+. Raven, Raven and Court (1998). Nonverbal pattern induction or “matrix reasoning” test. Widely used in contemporary research on cognitive functioning and intelligence. Samples: X

Raven SPM. Same as above, but somewhat abbreviated and simpler version (without the added upper-tail sensitivity of the “+” version). Samples: X

Working Memory Span. Conway et al. (2005). The “operation span” test. The ability to control attention and thought in the face of processing load. A central measurement in contemporary cognitive psychology. Samples: X X

Intrinsic Motivation

Need for Cognition. Cacioppo et al. (1996). Item-response-based measure of intrinsic motivation to engage in effortful thought. Samples: X X

Personality Scales

Procrastination. Tuckman (1991). Item-response-based measure of the tendency to procrastinate. Samples: X X

Four Dimensions of Impulsivity. Whiteside and Lynam (2001). Four item-response-based measures of personality characteristics, each thought to either contribute to, or inhibit, impulsive behavior.

Premeditation. Samples: X X

Sensation-seeking. Samples: X X

Perseverance. Samples: X

Urgency. Samples: X

TABLE 2

Pearson Correlations Between Performance P_sr in Different Rounds r.

            round 2   round 3   round 4   round 5
round 1     0.366     0.280     0.250     0.235
round 2               0.589     0.619     0.661
round 3                         0.624     0.597
round 4                                   0.674

Notes: Results are for all four samples, subjects = 183. All correlations are highly significant.
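As an illustration of how a between-round correlation matrix of the kind shown in Table 2 can be computed, the following minimal sketch uses only the Python standard library. The function names and the small data set are hypothetical; they are not the study's code or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def round_correlations(performance):
    """Upper-triangular correlations; performance[s][r] is subject s in round r."""
    n_rounds = len(performance[0])
    columns = [[row[r] for row in performance] for r in range(n_rounds)]
    return {
        (i + 1, j + 1): pearson(columns[i], columns[j])
        for i in range(n_rounds)
        for j in range(i + 1, n_rounds)
    }


# Hypothetical performance scores: four subjects by three rounds.
toy = [
    [0.40, 0.55, 0.60],
    [0.70, 0.80, 0.85],
    [0.50, 0.50, 0.65],
    [0.90, 0.85, 0.95],
]
corr = round_correlations(toy)
for (r1, r2), value in sorted(corr.items()):
    print(f"round {r1} vs round {r2}: {value:.3f}")
```

Only the pairs with the first round index below the second are stored, which mirrors the upper-triangular layout of the table.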

TABLE 3

Spearman Correlations Between Average Performance Levels and Demographic, Cognitive and Personality Variables in Various Poolings of Samples

Sample poolings: (1) All samples, subjects=183; (2) Samples 1, 3 & 4, subjects=139; (3) Samples 2, 3 & 4, subjects=135; (4) Samples 3 & 4, subjects=91.

Demographic Variables

Variable            (1)                 (2)                 (3)                 (4)
Female              −0.14 (p=0.059)     −0.14 (p=0.088)     −0.19 (p=0.024)     −0.22 (p=0.033)
Age                 0.031 (p=0.67)      −0.0058 (p=0.12)    0.084 (p=0.33)      0.073 (p=0.49)
Univ. of Houston    0.089 (p=0.23)      0.12 (p=0.16)       0.13 (p=0.12)       0.17 (p=0.099)

Cognitive Scales

Beta III analytical: 0.28 (p=0.0009); 0.24 (p=0.021)
WM span: 0.31 (p=0.0002); 0.22 (p=0.033)

Personality Scales

Need for cognition: 0.16 (p=0.061); 0.11 (p=0.29)
Procrastination: 0.075 (p=0.39); 0.021 (p=0.84)
Premeditation: −0.094 (p=0.28); −0.12 (p=0.26)
Sensation-seeking: 0.19 (p=0.031); 0.14 (p=0.19)
Perseverance: −0.13 (p=0.23)
Urgency: −0.060 (p=0.57)
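For readers unfamiliar with the statistic in Table 3, the following is a minimal sketch of a Spearman rank correlation, computed as the Pearson correlation of average ranks so that ties are handled. The function names and the five data points are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical span scores and average performance levels for five subjects.
span_scores = [12, 30, 45, 45, 60]
avg_performance = [0.41, 0.55, 0.52, 0.70, 0.88]
rho = spearman(span_scores, avg_performance)
print(f"Spearman correlation: {rho:.3f}")
```

Because the statistic depends only on ranks, it is unaffected by monotone rescaling of either variable, which is one reason a rank correlation suits scales measured in different units.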

TABLE 4

Random Effects Panel Regressions of Performance on Trials, Treatments and Subject Characteristics in Various Combinations of Samples: Estimates and Significance Tests

Sample poolings: (1) All samples, subjects=183; (2) Samples 1, 3 & 4, subjects=139; (3) Samples 2, 3 & 4, subjects=135; (4) Samples 3 & 4, subjects=91. Cells show the estimate with its standard error in parentheses; each regressor class ends with its joint test.

Regressor             (1)                 (2)                 (3)                 (4)

Intercept
Intercept             0.55*** (0.061)     0.56*** (0.039)     0.63*** (0.034)     0.60*** (0.047)

Learning
round 3 difference    0.0062 (0.018)      0.021 (0.021)       −0.0040 (0.020)     0.014 (0.025)
round 4 difference    0.055*** (0.016)    0.083*** (0.019)    0.036** (0.018)     0.070*** (0.022)
round 5 difference    0.052*** (0.015)    0.067*** (0.018)    0.035* (0.018)      0.052** (0.023)
Joint test            Yes (***)           Yes (***)           Yes (*)             Yes (***)

Treatments (properties of current and once-lagged income streams)
PF(Y_r)               0.37*** (0.086)     0.41*** (0.097)     0.35*** (0.098)     0.39*** (0.11)
PF(Y_{r−1})           0.014 (0.065)       0.028 (0.075)       −0.0055 (0.074)     0.0050 (0.093)
∑Y_r                  0.000027 (0.00082)  −0.00047 (0.00094)  0.00038 (0.00092)   −0.00017 (0.0011)
∑Y_{r−1}              0.00079 (0.00075)   0.00098 (0.00084)   0.00093 (0.00092)   0.0013 (0.0011)
Joint test            Yes (***)           Yes (***)           Yes (***)           Yes (**)

Demographic Variables
Female                −0.053 (0.033)      −0.037 (0.036)      −0.065 (0.043)      −0.064 (0.054)
Age                   0.0023 (0.0023)     −0.016 (0.012)      0.020 (0.016)       −0.0024 (0.018)
Univ. of Houston      0.024 (0.033)       0.047 (0.037)       0.026 (0.045)       0.027 (0.052)
Joint test            No                  No                  No                  No

Cognitive Scales
Beta III analytical: 0.075*** (0.018); 0.046** (0.021)
WM span: 0.067*** (0.017); 0.043** (0.021)
Joint test: Yes (***)

Personality Scales
Need for cognition: 0.026 (0.016); 0.0082 (0.021)
Procrastination: 0.025 (0.019); −0.0063 (0.027)
Premeditation: −0.0031 (0.022); −0.0016 (0.028)
Sensation-seeking: 0.028 (0.023); 0.011 (0.029)
Perseverance: −0.040* (0.023)
Urgency: −0.020 (0.023)
Joint test: No; No

Notes: *, **, and *** indicate significance at α = 10%, 5% and 1%, respectively.

