INTEGRATING OVERCONFIDENCE AND OVERCLAIMING:
EXAGGERATION HARMS PERFORMANCE
by
PATRICK J. DUBOIS
M.A., The University of British Columbia, 2015
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Psychology)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
August 2021
© Patrick J. Dubois, 2021
The following individuals certify that they have read, and recommend to the Faculty of
Graduate and Postdoctoral Studies for acceptance, the dissertation titled:
Integrating Overconfidence and Overclaiming:
Exaggeration Harms Performance
submitted by Patrick J. Dubois in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Psychology.
Examining Committee
Jeremy Biesanz, Associate Professor, Psychology, UBC
Supervisor
Peter Graf, Professor, Psychology, UBC
Supervisory Committee Member
Steven Heine, Professor, Psychology, UBC
Supervisory Committee Member
Kristin Laurin, Associate Professor, Psychology, UBC
University Examiner
Katherine White, Professor, Marketing and Behavioural Science, UBC
University Examiner
Joachim Krueger, Professor, Psychology, Brown University
External Examiner
Abstract
Some people have an exaggerated self-image: They imagine their abilities to be
greater than they are. This discrepancy between self-perception and reality has been
studied for at least a century under the names of overstatement, overestimation,
overconfidence, and overclaiming, yet considering this research altogether reveals some
contradictions. By introducing a unified approach (the Residualized Exaggeration Index, or
RExI), the present research rectifies past oversights and shows that exaggeration reliably
predicts narcissism, entitlement, and impatience, as well as lower academic performance
regardless of cognitive ability. As a more precise operationalization of what is connoted by
“overconfidence”, the RExI approach can also easily be incorporated into common
educational practice to provide more accurate and wholistic learner assessment, and
perhaps provide a foundation for improving self-awareness and critical thinking skills.
Lay Summary
How well do you know what you know? Our tendency to overstate, overestimate,
overclaim or otherwise exaggerate or be overconfident about our abilities has been studied
for at least a century, but curiously, has lacked a consistent analytic framework. By
introducing a unified approach, this research resolves previous inconsistencies and
demonstrates that knowledge exaggeration in university students undermines their
academic performance, regardless of their intelligence. This approach could also provide
more accurate and wholistic assessment in almost any educational context.
Preface
This dissertation is original, unpublished, independent work by the author, Patrick J.
Dubois. Data collection for Study 1 was covered by UBC Ethics Certificate number
H16-00753; for Studies 2 and 4, Certificate H18-02505.
Contents
Abstract
Lay Summary
Preface
Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Introduction: Defining Exaggeration
Framing Exaggeration
A Brief History
Overstatement
Overestimation
Overclaiming: Foils Among Reals
Theoretical Causes of Exaggeration
Exaggeration as Self-Enhancement
Exaggeration as Cognitive Bias
Exaggeration as Carelessness
All of the Above
None of the Above
Measuring Exaggeration
Item Design
Analytic Issues
A Unified Approach to Assessing Exaggeration
Current Research
Validation Criteria
Study 1: Proof of Concept
Study 2: Validating the Residualized Exaggeration Index (RExI)
Study 3: Developing Better Measures
Study 4: Robustness of the RExI
Reporting Conventions
Study 1: Proof of Concept
Method
Participants
Measures
Results
Predictive Validity
Incremental Validity
Discussion
Study 2: Validating the RExI
Method
Participants
Measures
Results
Predictive Validity
Incremental Validity
Discussion
Study 3: Developing Better Measures
Overclaiming Instrument Design
Overstatement Instrument Design
Method
Participants
Measures
Results
Discussion
Study 4: Robustness of the RExI
Method
Participants
Measures
Results
Predictive Validity
Foil Delay
Incremental Validity
Discussion
General Discussion
Research Outcomes
Validation
Limitations
Future Research
What is this thing called Exaggeration?
Potential Applications
Standardized Testing
Human Resources
Education
Conclusion
References
Appendix: The Overclaiming Technique (OCT)
List of Tables
Table 1 Correlations between Study 1 measures
Table 2 Regression Model Predicting Course Grades from RExI in Study 1
Table 3 Correlations between Study 2 measures
Table 4 Regression Model Predicting GPA from OCQ RExI in Study 2
Table 5 Correlations between Study 3 measures
Table 6 Correlations between Study 4 measures
Table 7 Study 4 RExI Correlates, Selected from Table 6
Table 8 Study 4 Overclaiming Response Times Predicting RExI.
Table 9 Regression Model Predicting GPA from VoKE RExI in Study 4
Table 10 Regression Model Predicting GPA from VST RExI in Study 4
List of Figures
Figure 1 Influence of Competence and Self-Image on Performance.
List of Abbreviations
AEQ Academic Entitlement Questionnaire: An 8-item measure of academic
entitlement developed by Kopp et al. (2011).
BIDR Balanced Inventory of Desirable Responding: A 3 × 20 = 60-item set of measures
of Impression Management (IM), Self-Deceptive Enhancement (SDE), and
Self-Deceptive Denial (SDD) by Paulhus (1998); these scales are “balanced” in
that they contain equal numbers of forward- and reverse-scored items.
CIHS Comprehensive Intellectual Humility Scale: A four-factor measure of intellectual
humility developed by Krumrei-Mancuso and Rouse (2016).
CRT Cognitive Reflection Test: A set of mathematical reasoning word problems which
suggest intuitive but incorrect answers, so correct answers require more reflection.
Originated by Frederick (2005), then later expanded by Toplak et al. (2014) and
Thomson and Oppenheimer (2016).
ELP English Lexicon Project: A “multiuniversity effort to provide a standardized
behavioral and descriptive data set for 40,481 words and 40,481 nonwords . . .
available via the Internet at elexicon.wustl.edu” (Balota et al., 2007).
GPA Grade Point Average: In this context, the average grade achieved by participating
students across all their courses taken at the University of British Columbia.
IM Impression Management: Part of the BIDR.
LDT Lexical Decision Task: A test commonly used in psycholinguistics: participants
must decide as quickly and accurately as possible whether a word is real or not.
MSLQ Motivated Strategies for Learning Questionnaire: An inventory of Likert-style
items for measuring various attitudes relevant to academic performance (Pintrich,
1991).
MTurk Amazon Mechanical Turk: A service where humans volunteer to complete simple
tasks online for financial compensation. See Buhrmester et al. (2011).
NFCS Need for Cognition Scale: An 18-item measure developed by Cacioppo et al.
(1984).
NPI Narcissistic Personality Inventory: A popular measure of non-clinical narcissism
(Raskin & Terry, 1988); both 40-item and 16-item versions are used in this paper.
OCQ Overclaiming Questionnaire: A set of reals and foils, based on Hirsch Jr et al.
(1988), introduced by Paulhus et al. (2003) to demonstrate the OCT.
OCT Overclaiming Technique: An application of SDT to overclaiming introduced by
Paulhus et al. (2003).
OLD20 Average Orthographic Levenshtein Distance of the 20 Closest Neighbors: A
technique for measuring (un)wordlikeness by averaging the edit distances of a
letter string from the 20 most similar in a reference corpus of words.
PES Psychological Sense of Entitlement: A 9-item measure by Campbell, Bonacci,
et al. (2004).
RExI Residualized Exaggeration Index: A general technique for isolating exaggeration
in self-image of competence, computed as the residual of incompetence evidence
after controlling for competence evidence.
SDD Self-Deceptive Denial: Part of the BIDR.
SDE Self-Deceptive Enhancement: Part of the BIDR.
SDT Signal Detection Theory: A well-established theoretical framework with analytic
techniques for distinguishing accuracy from response bias when discriminating
ambiguous signals (Macmillan, 2002).
TIPI Ten-Item Personality Inventory: A popular brief measure of the five-factor model
of personality by Gosling et al. (2003).
UBC University of British Columbia: The location where all studies presented here
took place, using their enrolled undergraduates.
VoKE Vocabulary Knowledge Exaggeration: An English vocabulary overclaiming
inventory (set of reals and foils) developed for this paper, with items selected
using theory from psycholinguistics and cognitive psychology, and empirical
testing.
VST Vocabulary Size Test: A multiple-choice test for assessing size of one’s English
vocabulary (Beglar, 2010).
Acknowledgements
I would never have known about overclaiming, its limitations or potential, unless
Delroy Paulhus had asked about my familiarity with a list of jazz musicians. Unfortunately, I did
not overclaim, sparking the suspicion that drove this research.
It was the important work of Steven Heine and Ara Norenzayan that showed me how
incredibly inappropriate it was to rely on convenience samples of undergraduates, unless, of
course, one makes that the population of interest.
I could not have completed this PhD had not Jeremy Biesanz patiently helped me
reformulate my heretical, unsupervised research into a coherent thesis.
I will be eternally grateful for the many kindnesses offered by so many of the faculty,
staff and students of the UBC department of psychology as I impostered my way through
grad school. If you’re reading this, I made it out alive.
I could not have afforded my time at UBC without substantial funding from The
Social Sciences and Humanities Research Council (SSHRC) of Canada; your tax dollars at
work.
Thank you.
Introduction: Defining Exaggeration
“I know words. I have the best words.” — Trump (2015).
Donald Trump has famously displayed exaggerated self-assessment, claiming greater
ability than he demonstrates. Anyone not born yesterday will have encountered other
people who imagine themselves overly positively, and experience teaches us to not always
trust self-presentation of ability.
Such a person may be labeled braggart, boaster, or blusterer, and we might describe
such behavior as overstating, overestimating, or overclaiming ability, being overconfident
about performance, or having a self-image exceeding genuine competence. Why do we have
so many synonymous descriptions? Probably because such behavior is socially noteworthy.
According to the lexical hypothesis, the basis for much of personality psychology, “Those
individual differences that are of most significance in the daily transactions of persons with
each other will eventually become encoded into their language. The more important is such
a difference, the more people will notice it and wish to talk of it” (Goldberg, 1981, pp.
141-142). We use such labels to warn others about exaggerated self-report.
Merriam-Webster (2020a) defines the verb exaggerate as “to enlarge beyond bounds
or the truth : overstate”. Central to this definition is disparity from reality. Trump’s
monosyllabic boast above might be plausible if he had demonstrated a superior vocabulary,
but he did not. A language analysis by the Boston Globe of candidates’ 2015 campaign
announcements rated Bernie Sanders at school grade 10, Hillary Clinton near grade 8, and
Donald Trump as the lowest of all at grade 4 (Schumacher & Eskenazi, 2016).
Exaggeration is about excess.
While a cartoon or caricature might have exaggerated movements or expressions,
exaggeration here refers only to a person’s excessive self-image of their ability. This paper
examines exaggeration as a psychological phenomenon, how it differs among individuals,
and what those differences might mean. To that end, I define exaggeration as individual
differences in discrepancy between imagined and actual competence, unrelated to
competence.
I begin with a conceptual model of how exaggeration may arise, then review how
this individual difference has been measured in the past, identifying some oversights and
contradictions in existing literature. In response to that, a new approach to exaggeration is
proposed, then implemented in four empirical, quantitative studies. The end result
validates a methodology for more clearly understanding this well-known but misunderstood
phenomenon, and establishes a foundation for future research.
Framing Exaggeration
Figure 1. Influence of Competence and Self-Image on Performance.
Exaggeration is an example of a latent construct, a theoretical conception of a hidden,
unobservable psychological phenomenon. When Forrest Gump famously noted that “Stupid
is as stupid does” (Zemeckis, 1994), he was wisely noting that we can only infer a latent
construct (e.g. stupidity) through its expression in observable behavior. Similarly, we can
explore what exaggeration is by looking at what it does to performance of the exaggerated
ability.
The model shown in Figure 1 is based on existing understanding of how abilities are
manifested: “Performance is conceived as the observable solution behavior of a person on a
set of domain-specific problems. Competence (ability, skills) is understood as a theoretical
construct accounting for the performance.” (Korossy, 1999, p. 103, original emphasis).
Competence is positively correlated with performance (the ‘+’); they increase (or decrease)
together. Humans have been fascinated with comparing competencies through
performance, as the long history of the Olympics or other competitions shows. Our abilities
are likewise tested throughout our education with examinations and other performance
tests. We objectively evaluate such performances because we know that self-report is not
always an accurate indicator of genuine competence: We don’t just ask who is fastest or
smartest, we test people.
Nonetheless, the raw potential of competence is not the only determinant of
performance. Our beliefs about our competence, and how we will meet a challenge, are also
relevant. Some of our beliefs will be based on feelings of confidence, our internal sense of
competence, but other beliefs may interfere with the accuracy of this sensing. If I believe
running a marathon makes me a good person, the need to affirm that belief may overshadow
accurate competence assessment, leaving me collapsed half-way through the race.
Our self-image is our mental construction of who we are, and this will include beliefs
about what we can do, should be able to do, and what effort is required for success. Ideally,
self-image of competence should positively correlate with genuine competence, even if
imperfectly. The question mark in Figure 1 indicates that self-image may contribute
positively or negatively to performance, depending on the harmony between self-image and
competence. Performance is thus shaped by both what we are capable of (competence),
and how we imagine that capability (self-image). Aesop’s fable of the tortoise and the hare
(in which the hare, far more competent yet arrogant, loses a race to the tortoise) eloquently
demonstrates how self-image can interfere with expression of competence.
Exaggeration can thus be seen as a way in which distorted self-image impairs
performance.¹ This is similar to, but distinct from, typical conceptions of confidence, where
overconfidence implies an extension of a linear, unidimensional construct beyond some
optimal point. Instead, exaggeration allows for various aspects of self-image to interfere
with competence expression in undermining performance. Note that the model does not
necessarily imply any mediation or moderation relationship. A central goal of the current
research is to distinguish the influence excessive self-image has on performance, separate
from the influence of competence.
¹ While exaggeration considers excessive perception of competence, performance may also be impaired by
inadequate perception of competence. Such “underconfidence” is not considered here because it is
apparently rare and involves the methodological challenge of measuring competence that is not expressed.
Additionally, situational factors may alter our self-image, or its impact on
performance, such as an audience boosting or shriveling our confidence, but for the
purposes of the current research, those many, complex situational factors are set aside in
order to address issues with isolating effects from self-image.
To capture exaggeration, we will need behavioral indications of what a person’s
genuine competence is, and what they mistakenly imagine it is. This can be done by
soliciting optional expressions of ability that provide evidence of competence or falsely
imagined competence: active incompetence.² By making the expressions optional, one need
only express competence where one imagines competence, e.g. one can admit “I can’t do
that”, rather than pretend they can. In such a situation, active incompetence (e.g. failing
an optional task) indicates error in imagined competence, suggesting exaggerated
self-image: The person thought they could do something they could not.
To minimize confounding influences, all evidence should be collected at the same time
under similar circumstances. To maximize reliability, several ability expressions should be
solicited and aggregated. In other words, to gather evidence of exaggeration, get people to
repeatedly volunteer evidence of competence or incompetence, in comparable proportions.
Finally, because competence is a strong predictor of successful outcomes, care should be
taken that measurement of exaggeration is demonstrably distinct from evidence of
competence. Altogether, this framework presents three requirements for measuring
exaggeration: a) active competence, b) active incompetence, and c) isolation of self-image
error from competence as evidence of exaggeration.
² In contrast to the passive incompetence of not responding to a question.
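As a rough illustration of requirement (c), one way to isolate self-image error from competence is to regress the incompetence evidence on the competence evidence and keep the residuals. The Python sketch below assumes simple per-person counts and ordinary least squares with a single predictor. The data, the function name residualize, and the modelling choices are illustrative assumptions, not the procedure used in the studies reported here.

```python
from statistics import mean

def residualize(incompetence, competence):
    """Residuals of incompetence evidence after a simple linear
    (ordinary least squares) regression on competence evidence."""
    mx, my = mean(competence), mean(incompetence)
    sxx = sum((x - mx) ** 2 for x in competence)
    sxy = sum((x - mx) * (y - my) for x, y in zip(competence, incompetence))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(competence, incompetence)]

# Hypothetical data: per-person counts of successful (active competence)
# and failed (active incompetence) optional attempts.
active_competence = [12, 15, 9, 18, 14]
active_incompetence = [3, 2, 6, 4, 1]

# Positive values suggest more active incompetence than competence predicts.
exaggeration = residualize(active_incompetence, active_competence)
```

By construction, these residuals average zero and are uncorrelated with the competence scores, which is one concrete sense in which such an index is distinct from evidence of competence.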
Because exaggeration of one’s abilities to others may yield rewards (e.g. winning an
election) and involves several complicated contextual factors, for simplicity, all the research
considered here minimizes social or situational influences, or obvious opportunities for gain
from manipulation or deceit. The goal is to understand exaggeration as an intrapersonal
(within self), not interpersonal (between people) phenomenon.
Conceptually, exaggeration can be considered synonymous with the terms
overstatement, overestimation, overclaiming and overconfidence, yet all those terms have
been used for distinct methodological approaches to measuring differences in one’s imagined
and actual abilities. This may be an example of the jangle fallacy: “the use of two separate
words or expressions covering in fact the same basic situation” (Kelley, 1927, p. 64). The
present research aims to integrate those approaches into one unified methodology.
A Brief History
Broadly speaking, there have been two approaches to simultaneously gathering
evidence of competence and imagined competence. One approach (overstatement and
overestimation tests) combines objective tests with (prior or post) estimates of success.
The number of correct answers (objectively scored) serves as evidence of competence, while
falsely imagined correctness (subjective statement or estimation) suggests unacknowledged
incompetence. The number of answers not claimed correct is ignored but allows a degree of
freedom between the other two scores.
Another approach (overclaiming) uses only ability claims, but embeds the competence
distinction in the items themselves. All items involve claiming ability, but some of the
items are fictitious, so claiming them requires active incompetence. This also yields two
scores: the rate or amount of claiming genuine items (reals), and the rate or amount of
claiming fictitious items (foils).
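The two overclaiming scores described above can be sketched as simple proportions. The following Python example is a minimal illustration using hypothetical responses; the function name and the data layout are assumptions made for demonstration only.

```python
def claim_rates(responses, is_real):
    """Return (rate of claiming reals, rate of claiming foils).

    responses: list of bools, True if the respondent claimed the item.
    is_real:   parallel list of bools, True for genuine items, False for foils.
    """
    real_claims = [r for r, real in zip(responses, is_real) if real]
    foil_claims = [r for r, real in zip(responses, is_real) if not real]
    return (sum(real_claims) / len(real_claims),
            sum(foil_claims) / len(foil_claims))

# Hypothetical respondent: six reals followed by four foils.
is_real = [True] * 6 + [False] * 4
responses = [True, True, True, False, True, True, False, True, False, False]

real_rate, foil_rate = claim_rates(responses, is_real)
# This respondent claims 5 of 6 reals and 1 of 4 foils.
```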
Both approaches allow someone to volunteer evidence of competence or
incompetence, all within the same test. While both have been around for nearly a century,
these two approaches have not been explicitly examined together.
Overstatement
The rising popularity of IQ tests at the start of the 20th century inspired a plethora
of attempts to quantify human potential (Richardson, 2002). Based on the belief that a
discrepancy between claimed and demonstrated ability was diagnostic of one’s “character”,
a review of research around that time (Symonds, 1924) listed the overstatement test
(Voelker, 1921) as an emerging assessment method.
As an example, Woodrow and Bemmels (1927) compared results of an overstatement
test to a “goodness” of character rating by teachers of pre-school children, reporting a
rank-order correlation of rs = .56 for a group of 17 five-year-olds and rs = .43³ for a group
of 14 four-year-olds. The overstatement test involved a researcher interviewing children
individually, telling them “I’m going to ask you some questions to find out how many
things you can do. I want to find out who in your class can do the most.” (p. 241), then
asking a variety of questions such as “Can you write your name?”, “Can you stand on your
head?” and “Can you count up to ten?”, finally followed by the child demonstrating each
claimed ability, which was then liberally assessed. The younger group claimed 51% (while
performing 30%) and the older group claimed 75% (then performing 50%) of the queried
abilities, with only one child under-estimating their ability.
As an individual difference measure, the test was scored as the number claimed
divided by the number performed, with “The smaller this ratio, the better the score” (p.
242), presumably meaning that reverse ranking this score accounted for the positive
correlations (above) between less overstatement and teachers’ ratings of character goodness.
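The ratio scoring described above can be sketched in a few lines of Python. The children and their counts below are hypothetical, and the handling of a zero denominator is an assumption for illustration; the original paper's treatment of such cases is not described here.

```python
def overstatement_ratio(claimed, performed):
    """Number of abilities claimed divided by number demonstrated.
    Returns None when nothing was performed, since the ratio is undefined."""
    if performed == 0:
        return None
    return claimed / performed

# Hypothetical children: (abilities claimed, abilities performed)
children = {"A": (10, 10), "B": (10, 5), "C": (2, 1), "D": (3, 0)}
ratios = {name: overstatement_ratio(c, p) for name, (c, p) in children.items()}
# A claims exactly what is performed; B and C receive the same ratio even
# though their raw gaps differ (5 items vs. 1 item); D has no defined score.
```

Even this small example hints at why a ratio can be awkward as a score: it is undefined when nothing is performed, and it treats very different amounts of excess claiming as equivalent.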
Considering statistical and practical issues (inherent in using ratios for scores), the
authors conclude that the issue of scoring the test “is not an altogether simple one”
(Woodrow & Bemmels, 1927, p. 243). These issues may have proven insurmountable, as
the overstatement test had faded to obscurity by the 1960s.⁴
³ These correlations were reported as ρ in the original paper.
Overestimation
Part of modern research on overconfidence is an approach called overestimation, a
methodology very similar to overstatement: “If a student who took a 10-item quiz believes
that he answered five of the questions correctly when, in fact, he got only three correct,
then he has overestimated his score. Roughly 64% of empirical studies on overconfidence
examined overestimation.” (D. A. Moore & Healy, 2008, p. 502). Overestimation is
typically calculated as a difference score, the arithmetic excess of estimate over
performance: 5 − 3 = 2 in that example.
Both overestimation and overstatement gather ostensibly identical information: the
number imagined correct and the number objectively correct. The methodology of
overestimation, however, may be less psychologically direct than that of overstatement.
An overestimation test requires answers for every item (possibly by guessing),
regardless of one’s sense of ability. After the test, the participant must reflect on their
aggregate score in order to make an estimate. Retrospectively evaluating one’s performance
on a completed task may elicit “choice-induced preference change” (D. Lee & Daunizeau,
2020, p. 1). For example, it may be easier to rationalize that one answered a question
correctly after committing to an answer, if only because considering alternatives after such
commitment can induce cognitive dissonance (Joule & Azdia, 2003). There is also the
possibility that reflection on past performance may include recency effects (Murre & Dros,
2015), where experiences from the last few questions may carry more weight in an
aggregate estimate.
A further complication arises from the transparency of the research question (e.g.
“How many do you estimate you got correct?”). Once participants are aware that accuracy of
estimation is under scrutiny, motives for social desirability or impression management may
inspire a false modesty: Having already completed the task, I will appear more humble if I
deflate my estimated score. By triggering self-consciousness, overestimation methodology may
be distorted by the same psychological processes it attempts to measure.
⁴ As shown by a Google Ngram search.
Thus, retrospective assessment of aggregated past performance done during
overestimation tests may differ psychologically from the prospective assessment of specific
abilities done in overstatement tests. The (prospective) overstatement approach may be
more psychologically valuable simply because we are often more interested in predicting
somebody’s future behavior (e.g. likelihood of making an error) than predicting their
after-the-fact estimate. Overestimation tells you how people perceive past performance,
while overstatement tells you how they imagine future success.
An established observation about overestimation is that it varies with difficulty, i.e. is
relatively greater in hard tests and lower in easy tests. A multi-cultural examination of
overestimation replicated this hard-easy effect and found it much stronger than effects of
culture, sex or age (D. A. Moore et al., 2018). This effect, where excess rating is inversely
related to difficulty or ability, may be a methodological artifact: Low performers have more
range to overestimate than high performers. Seen another way, if estimates tend toward the
center of the distribution, difference scores will tend to be positive for hard tests and lower
(or negative) for easy tests. Even if estimates correlate perfectly with performance, but are
shifted centrally, difference scores will show the hard-easy effect. Such a scoring will likely
be negatively correlated with ability, e.g. Duttle (2016) found r = −.69 between
overestimation and performance on Raven’s Progressive Matrices. Difference scores yield
an effect similar to that found in the “unskilled and unaware” Dunning–Kruger effect
(Kruger & Dunning, 1999) which has been shown to be largely a statistical illusion
(Krueger & Mueller, 2002). More precisely, Cor(Score, Estimate − Score) approaches
Cor(Score, −Score) to the extent that Var(Estimate) < Var(Score). As mean Score
rises, Estimate range decreases. Evidence of exaggeration is mathematically confounded
with evidence of competence, so it’s impossible to cleanly distinguish effects.
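To make this confound concrete, the following is a minimal simulation sketch, not the analysis used in this research. It assumes a hypothetical 0 to 100 scale, normally distributed scores, and estimates that are a perfect linear function of score but pulled halfway toward the scale midpoint. The function names, the shrinkage factor k, and all numeric values are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shrunken_estimates(scores, center=50.0, k=0.5):
    """Estimates that track score perfectly but are pulled toward the center.

    k < 1 is an assumed shrinkage factor; k = 1 would mean no central shift.
    """
    return [center + k * (s - center) for s in scores]


def simulate(mean_score, n=1000, sd=10.0, seed=1):
    """Simulate one test with the given mean score (low mean = hard test)."""
    rng = random.Random(seed)
    scores = [rng.gauss(mean_score, sd) for _ in range(n)]  # not clipped to 0-100
    estimates = shrunken_estimates(scores)
    diffs = [e - s for e, s in zip(estimates, scores)]  # difference scores
    return {
        "mean_diff": sum(diffs) / n,
        "cor_score_estimate": pearson(scores, estimates),
        "cor_score_diff": pearson(scores, diffs),
    }


hard = simulate(mean_score=30.0)  # hypothetical hard test
easy = simulate(mean_score=70.0)  # hypothetical easy test
print(hard)
print(easy)
```

Under these assumptions, the mean difference score is positive for the hard test and negative for the easy test. Within each test, the difference score correlates −1 with score even though estimates track performance perfectly, because Var(Estimate) is smaller than Var(Score).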
Given the statistical problems noted with ratios used in overstatement tests, and the
artifactual correlations of difference scores used in overestimation (and sometimes
overstatement), neither overstatement nor overestimation, as conventionally implemented,
provides a clean measure of exaggeration separate from the ability being exaggerated.
Overclaiming: Foils Among Reals
As charming as it may have been in the 1920s to have children stand on their heads
for science, that approach can be difficult to scale up. Any objective ability test may take a
long time and induce stress in participants. A far more convenient ability to test is
knowledgeability, and probably the most face-valid or obvious test of exaggerated
knowledge is to query familiarity with something that does not exist.
For example, imagine a simple vocabulary test that only required the respondent to
rate their knowledge of words, without demonstrating that knowledge. If such a list
included the ostensible word covfefe, claiming to know the meaning of that5 would
demonstrate active incompetence.
Questions about fabricated, non-existent items, often labeled as bogus or foil,6 seem
ideal for capturing exaggeration because knowledge of such items is impossible (if the item
is designed appropriately, which may not be the case, as we shall see). Such items are often
combined with similar genuine or real items to also collect some evidence of competence.
In the book New Perspectives on Faking in Personality Assessment, the chapter on
“Overclaiming on Personality Questionnaires” surveys the use of foils claiming in
psychological research, describing “several historical precedents for the notion that claiming
familiarity with foils is a face-valid indicator of knowledge exaggeration”, with exaggeration
5 As many have: www.snopes.com/fact-check/covfefe-arabic-antediluvian/
6 While both these terms apply to impossible items that honest, attentive, rational people should never claim, the term bogus is typically used for items researchers expect to have no desirability, while foil refers to items which someone might have reason to claim falsely. As will be discussed below, such a distinction is not always clear cut.
interpreted there as faking (Paulhus, 2012, p. 151). It reports the earliest use of foils in
psychological research as Raubenheimer (1925) where respondents indicated which books
they had read, with 10 of 25 titles presented being fictitious. For example, respondents
(boys being assessed for potential delinquency) could claim to have read the existing book
“Robinson Crusoe” (to indicate literary knowledge) or the nonexistent book “The
Prize-Fighters Story”, indicating an exaggerated self-report.
That chapter title echoes a study by Phillips and Clancy (1972), which used
overclaiming to describe foils claiming. Merriam-Webster (2020b) reports that the term first
appeared in 1824 and means “to claim too much of something”. Phillips and Clancy queried
participants about “their use of several new products, books, television programs, and
movies — all of which were actually nonexistent” (p. 928; their emphasis). This
overclaiming behavior was found to be related to participants’ rating of the desirability of
being the kind of person who tries new products, etc. The association between foils
claiming and valuing being trendy suggests a motivated, self-enhancing exaggeration.
As noted above, foils claiming has also been interpreted as dishonesty or faking.
Anderson et al. (1984) assessed job applicants by having them rate their skill levels on a
variety of tasks, many of which were genuine (i.e. reals), while several were fictional, i.e.
foils. For example, respondents were asked to rate their experience with the fictitious task
“Typing from audio-fortran reports” (Table 2, p. 577). The extent of foils claiming was
found to negatively predict a later objective test of job skills, especially when controlling
for self-assessment of genuine skills.
Misrepresentation of self is not the only interpretation given to foils claiming: Foils
have appeared in research with other goals. For improving validity in marketing tests of
advertising exposure, Lucas (1942) describes a technique originating in 1937: Participants
report their recognition of various advertisements, some of which are unpublished and
could not have been seen, i.e. foils. Following that methodology, Smith and Mason (1970)
reported that warning participants about the foil ads had no effect on claim rates. This
suggests a possible recognition memory bias; respondents genuinely believed they
recognized something they had not seen before. The aim of such research, however, was to
assess advertising efficacy, not psychological mechanisms driving false claims.
Among other applications, foils have also been used to check validity of traumatic
brain injury reports (e.g. Mackenzie & McMillan, 2005) and digital literacy surveys (e.g.
Hargittai, 2009), to assess pretrial prejudice in court cases (e.g. Moran & Cutler, 1997), or
to generally identify careless survey responses (e.g. Meade & Craig, 2012). For example, if
a North American respondent agrees to the statement “I have never brushed my teeth”
(Meade & Craig, 2012, Table 1, p. 5), they are probably not paying attention to that
question, nor possibly the rest of the survey. Such investigations typically treat foils
claiming as errors to be corrected for, with little consideration of what such aberrant
behavior might indicate.
To summarize, foils have been used in research to ostensibly assess self-enhancement
(ego-motivated misrepresentation, such as claiming to have used a fictitious product),
cognitive bias (falsely recognizing an ad they had not seen) or carelessness (lack of
attention in survey responding). To my knowledge, no previous research has adequately
considered these contrasting explanations simultaneously.
An issue worth noting has to do with the ethics of using foils, i.e. whether it is
deceptive to ask about impossible abilities, to confront people with un-winnable challenges,
or entrap them into failure. Typical ability tests do not ask trick questions; one may
assume that if asked, “Do you know X?”, X actually exists. Warning about foils or failure,
however, raises the practical issue of ensuring that participants acknowledge and
understand such a warning. The more careless, for example, may not take heed.
Alternatively, the more cautious or risk-averse may alter their responding more drastically
than others. Inevitably, there will be new individual differences introduced by the warning
and how it is presented. In everyday life, we encounter impossible problems, areas where
nobody should claim answers, with no guardian warning us of potential failure. Thus, an
ecologically valid test should introduce no more protective measures than ordinary life.
The absence of warning may be more valid in another way. There may be many
individual, contextual, or cultural differences in the degree to which people believe they
could or should be able to confront any and all challenges successfully. Exaggeration may
reflect such a difference, i.e. the tendency to assume, or be entitled to, a certain level of
success, as if one expects or deserves it.
Regardless, warning about potential foils (vs not), while it discouraged claiming in
general, did not remove relationships between overclaiming and narcissism in a study done
by Paulhus et al. (2003). This is consistent with the findings noted earlier that warning
participants about bogus ads did not affect false recognition rates (Smith & Mason, 1970),
and that the relationship between false knowledge claims and self-perceived knowledge was
not altered by warning about foils (Atir et al., 2015). Consequently, for ecological validity,
simplicity and practicality, the current research does not warn about the presence of foils.
Theoretical Causes of Exaggeration
Why might someone exaggerate their competence? Broadly, we can consider two
kinds of possible causes: motivated and unmotivated.
Someone may be motivated to enhance their self-presentation, to overly state or claim
ability, simply because, socially, it can pay off. Threatened animals often present
themselves as larger or more ferocious to intimidate others. Mate selection often requires
putting one’s best foot, feathers, calls, dancing or behavior, forward to impress the other
sex. Human overconfidence yields status benefits (Kennedy et al., 2013) so exaggeration
may help to intimidate others, or to acquire mates, votes, jobs, or other resources. For
example, Trump may not have won the 2016 election if he had acknowledged his several
business failures (Stuart, 2016). Given that relative neocortex size relates to use of tactical
deception (Byrne & Whiten, 1992), humans may be uniquely equipped for exaggeration.
While these examples refer to interpersonal exaggeration, boasting to others, it may be
that such strategies get reinforced and internalized, leading to habitual exaggeration even
in the absence of social context.
Alternatively, implausible evidence of ability may appear for unmotivated reasons.
The false recognition of an advertisement despite warning (noted above) suggests that
participants had no motivation to misrepresent. Similarly, careless inattention to survey
responses would not indicate motivation to misrepresent an ability. Finally, perhaps
related, such misrepresentation may be, like any error, due to lower cognitive ability, or
simply poor self-awareness.
Exaggeration as Self-Enhancement
Motivated misrepresentation of competence may simply indicate self-enhancement, or
“tendencies to dwell on and elaborate positive information about the self relative to
negative information” (Heine & Hamamura, 2007, p. 4). Self-enhancement should thus
predict a bias toward claiming ability while denying inability or ignorance, leading to
exaggeration. This was the assumption of early uses of overstatement and overclaiming
tests, that people will overstate or overclaim their competence in an ego-enhancing way.
Self-enhancement is considered to have both a social, interpersonal dimension of
impression management, “the goal-directed activity of controlling information in order to
influence the impressions formed by an audience” (Schlenker, 2012, p. 542), as well as an
intrapersonal dimension of self-deception (Paulhus, 1984): We may be bluffing to others
and ourselves.
The personality trait most associated with self-enhancement is narcissism, named for
the mythological Greek youth Narcissus tragically obsessed with his own beauty:
“Narcissism is arguably the personality construct (and pathological disorder) most
fundamentally defined by chronic pursuit of self-enhancement.” (Wallace, 2011, p. 309).
For example, John and Robins (1994) compared self-perception of performance with peer
ratings and evaluations by a staff of 11 trained psychologists. Self ratings related less to
staff evaluations than did peer ratings, and showed substantial individual differences:
“people whose self-evaluations are the most unrealistically positive tend to be narcissistic”
(p. 215). Narcissism is the part of the Dark Triad of personalities (overlapping with, yet
distinct from, scheming Machiavellianism and antisocial psychopathy) distinguished by
self-enhancement (Paulhus & Williams, 2002). This personality trait is typically measured
by the Narcissistic Personality Inventory (NPI) and is considered to have multiple facets,
e.g. “Leadership/Authority, Grandiose Exhibitionism, and Entitlement/Exploitativeness”,
with the first being considered generally adaptive, and the last being most maladaptive
(R. A. Ackerman et al., 2011). This suggests that self-enhancement may have both helpful
and harmful aspects.
As discussed above, exaggerating to others may pay off, but why exaggerate with
nobody to impress? Back et al. (2010) note that the entitlement facet of narcissism is most
attractive at first impression while being most maladaptive in the long term. Thus,
short-term social rewards (boasting so strangers like you) may reinforce a dysfunctional
habit: Exaggerating your self to others may lead you to start believing it. More
importantly, exaggeration in the absence of social reward may be maladaptive.
Related to narcissism is overconfidence, an association that “remained significant in a
regression that included self-esteem, self-efficacy, and self-control in the model: for
narcissism, b = 0.33, t(99) = 3.16, p < 0.01” (Campbell, Goodie, et al., 2004, p. 302). In
that study (and most, as noted above), overconfidence is operationalized as overestimation.
Another, more interpersonal, form of overconfidence is thinking you are better than others,
more precisely called overplacement (D. A. Moore & Healy, 2008), also known as the
better-than-average effect. We know this effect is based on an illusion of superiority because
of the observed reality (yet mathematical impossibility) that more than half of people think
they are better than half the population for several abilities; individuals tend to place
themselves higher, relative to others, than objectively warranted.
In examining why humans overplace their abilities, Burks et al. (2013) compared
information-processing biases with social goals and found evidence only for the latter,
concluding: “it is natural to consider the possibility that the roots of overconfidence lie in
the value of over-confidence as a social signal” (p. 979). This effect, however, depends on
social comparison: how people view themselves and how they view others (Guenther &
Alicke, 2010), which introduces several situational influences. Nonetheless, while
overplacement and overestimation are conceptually and methodologically distinct, they
have both been labeled overconfidence, and Macenczak et al. (2016, Tables 1 & 2, pp.
115-116) report a correlation of r = .50 between the two, as well as positive (albeit
weaker) correlations between both those measures and narcissism.
Altogether, in terms of ego-motivated behavior, exaggeration may relate to
self-enhancement as impression management, self-deception, or narcissism, and
overconfidence as overestimation or overplacement. As a kind of solitary self-enhancement,
the tendency to exaggerate in non-social situations should be broadly maladaptive.
Exaggeration as Cognitive Bias
While the label “exaggeration” inherently connotes self-enhancement (as do the terms
overstatement, overestimation, or overclaiming) it is conceivable that incompetence may be
demonstrated with no motivation or goal, having little to do with ego or identity. A survey
respondent claiming to recognize an advertisement they had never seen may be merely
demonstrating a memory malfunction. Given that warning about the presence of foil ads
had no effect on claim rates (Smith & Mason, 1970), and that foils claiming is essentially
unaffected by warning (as noted above), such apparent exaggeration may not be
ego-motivated.
An ability well-known to be influenced by cognitive biases is memory, and probably
the easiest to study is recognition memory. Recognition memory involves distinguishing
stimuli that have been previously experienced from novel stimuli, e.g. given a list of words,
some of which are old (seen before) among others that are new, the ability to distinguish
old from new. For example, after reading a list of words (e.g. “person, woman, man,
camera, TV”; Baker, 2020), can someone identify them (among other distractors) later?7
Recognition memory bias, the tendency to identify new items as old, has been shown
to be a stable individual trait (Kantner & Lindsay, 2012; Kantner & Lindsay, 2014),
suggesting that people vary in false recognition reports. That latter paper noted similar
individual patterns for susceptibility to some bias manipulations, e.g. falsely claiming
having seen the word ‘sleep’ when trying to remember related words like ‘bed’, ‘rest’, and
‘night’.8
Cognitive psychology has demonstrated several techniques for manipulating
recognition memory bias, such as the use of discrepant fluency (Whittlesea & Leboe, 2003).
When testing recognition of items, if a new item is unexpectedly easy to process
(discrepantly fluent), it becomes easier to mistakenly think it has been seen before.
Intuitively, some minimal level of fluency is required for any useful exaggeration item: It is
unlikely anyone would claim to have read a book with an unpronounceable title. Extending
that logic, it may be that more fluent items facilitate more exaggeration.
Beyond an overall main effect of fluency, given individual differences in memory bias
(Kantner & Lindsay, 2014), individuals may also differ in susceptibility to fluency cues. For
any given item set, some people may be more prone to false recognition, increasing their
chances for exaggeration. How such cognitive traits relate to personality traits such as
self-enhancement has yet to be adequately studied. However, apparent individual
differences in levels of exaggeration may reflect, at least in part, individual differences in
recognition bias.
7 The Montreal Cognitive Assessment given to Donald Trump actually tested recall rather than recognition, but you get the point. Those words he “recalled” probably represented only what he could see at the moment, not recollections from the test.
8 The Deese/Roediger–McDermott paradigm.
Exaggeration as Carelessness
If a job application asked “How often have you used the Wentzel Technique to solve a
budgetary problem?” (a fictitious job skill used by Levashina et al. (2009) to measure
faking, p. 274), someone might think “I have no idea what that is, but it sounds like I
should know it, so I’ll pretend I do”; affirming that foil as an expression of self-enhancement.
Or, someone may think, “I remember using some technique for a budgetary problem, and I
think it started with ‘W’, so that must be it”; sincerely but mistakenly affirming because of
recognition memory bias. Alternatively, someone may not think about the question at all
and carelessly affirm it. All three possible thought processes lead to the same behavior, the
active incompetence of claiming a foil, but for different reasons. To better understand the
careless component, we need to examine what we mean by “carelessness”.
While the concept of carelessness may cover several kinds of undesirable or
unintentional behaviors, for this research (because it used surveys to collect data), its most
relevant manifestation is as a source of invalidity in survey responding. This has long been
of interest to researchers, but has been difficult to clearly define, given that there are so
many possible reasons survey responses may not be what we expect. For example, Bond
(1986) argued that what had been labeled carelessness as a cause of inconsistent responding
to the MMPI9 may really be indecision, thus dramatically changing the interpretation.
Nichols et al. (1989) made the distinction between content nonresponsivity (e.g. ignoring
instructions) and content-responsive faking. Huang et al. (2012) more precisely referred to
“insufficient effort responding” (p. 99).
Meade and Craig (2012) found three fairly distinct latent classes (factors), painting a
multi-dimensional picture of carelessness. Part of the distinction is methodological, because
researchers have explored several ways to measure carelessness in survey responses;
DeSimone et al. (2015) provide an overview of several popular techniques (see their Table
1). Many of these operationalizations of carelessness are based on content of responses:
9 Minnesota Multiphasic Personality Inventory: a venerable, widely-used questionnaire.
The use of semantic or psychometric synonyms or antonyms makes assumptions about
which responses should be similar or oppositional in meaning or response patterns. A
related approach uses Mahalanobis distance to detect response patterns distant from the
multivariate normal distribution of all responses. These content-based techniques assume
that all respondents interpret the items similarly, which may lead to a subtle researcher
confirmation bias: unusual response patterns may get pathologized as careless, interpreting
diversity as deviance.
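As a minimal sketch of the Mahalanobis approach, consider only two survey items that are expected to agree (e.g. semantic synonyms) rated on a 1 to 5 scale. The data, the function name, and the restriction to two items are illustrative assumptions; applied work would typically use many items and a linear algebra library.

```python
import math


def mahalanobis_2d(point, data):
    """Mahalanobis distance of `point` from the centroid of two-item `data`.

    Uses the sample covariance matrix (n - 1 denominator) of the data and
    the closed-form inverse of a 2 x 2 matrix.
    """
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] S^-1 [dx dy]^T with S^-1 = (1/det) [[syy, -sxy], [-sxy, sxx]]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


# Hypothetical responses (item A, item B) from ten respondents who mostly agree
sample = [(1, 1), (2, 2), (2, 3), (3, 3), (3, 2),
          (4, 4), (4, 5), (5, 5), (5, 4), (3, 4)]

consistent = mahalanobis_2d((5, 5), sample)    # extreme but coherent pattern
inconsistent = mahalanobis_2d((1, 5), sample)  # contradicts the usual pattern
print(consistent, inconsistent)
```

In this toy sample the contradictory pattern (1, 5) lies much farther from the centroid than the extreme but coherent pattern (5, 5), although both are similarly distant in raw scale units. Note that the distance only flags unusual patterns; whether an unusual pattern reflects carelessness or a legitimately different interpretation of the items is exactly the ambiguity raised above.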
If we remove effects of (apparent) carelessness, are we improving data quality, or
limiting representativeness? Bowling et al. (2016) examined insufficient effort responding
and made the distinction between treating such behavior as a methodological nuisance (e.g.
errors to discount) and seeing it as a substantive variable indicating a trait-like, enduring
individual difference that, in fact, predicted academic performance. Using similar measures,
McKay et al. (2018) examined a wider range of personality traits and found that
malevolent traits showed a stronger relationship with carelessness. In almost every case,
the carelessness measure with the strongest personality correlates was the number of
incorrect responses to instructed items, e.g. not responding “strongly agree” when the
question explicitly said to do so. This response style, disregard for item content, may also
influence foils claiming. Further, M. K. Ward et al. (2017) examined careless responding
and attrition in completing online surveys and found personality correlates for both
measures, suggesting that participants who complete a survey carefully are a biased
sample. Using different measures of carelessness and personality, Furnham et al. (2015) also
found associations between validity of self-reports and personality.
Given that apparent carelessness may signal important individual differences (e.g.
exaggeration), how might we distinguish careless responses from the careless person? One
clear indication that a respondent is not paying attention to a question is when the
response is unreasonably fast. After informing participants that they would be answering
the same questions twice, Wood et al. (2017) found that consistency dropped sharply when
response time fell below an average of 1 second per item. While that study (and others)
used aggregate response times (e.g. time to complete a page of questions, or the whole
survey), that can be a poor measure, because it indicates the mean (average) time.
Cognitive psychologists, who regularly use response time measures, know that a better
indicator of central tendency is the median, not the mean, because distributions can be
highly skewed (Rousselet & Wilcox, 2020), e.g. a few very long response times can easily
pull the average away from the peak of the distribution.
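A short sketch illustrates the difference between the two summaries. The response times below are hypothetical, and the 1-second cutoff is borrowed from the Wood et al. (2017) finding purely for illustration; it is not a threshold prescribed by this research.

```python
from statistics import mean, median

# Hypothetical per-item response times (seconds) for one respondent:
# six hasty answers and one long pause (e.g. a distraction).
response_times = [0.8, 0.9, 0.7, 1.0, 0.8, 0.9, 45.0]

FAST_THRESHOLD = 1.0  # illustrative cutoff, in seconds per item


def looks_hasty(times, summary=median, threshold=FAST_THRESHOLD):
    """Flag a respondent whose typical item time falls below the threshold."""
    return summary(times) < threshold


mean_time = mean(response_times)
median_time = median(response_times)
flag_by_mean = looks_hasty(response_times, summary=mean)
flag_by_median = looks_hasty(response_times, summary=median)
print(mean_time, median_time, flag_by_mean, flag_by_median)
```

Here a single long pause pulls the mean well above the cutoff, so an aggregate (mean-based) time would miss the otherwise hasty responding, whereas the median still reflects the typical item.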
Because carelessness (however interpreted) may reflect individual differences relevant
to exaggeration, the current research will take the approach of analyzing all complete
response sets, i.e. make no exclusions due to aberrant response style. Where carelessness is
measured, it will involve median response time over several items within individuals.
All of the Above
Each of the three speculated mechanisms above may contribute independently to the
behavior of exaggeration: self-enhancing motivation to misrepresent, cognitive bias in
internal representation, and / or careless disregard for accurate representation.
How might these work together to explain exaggeration behavior? First, there must
be some cognitive fluency to facilitate misrepresentation: An opportunity to volunteer
incompetence must be believable, e.g. it’s easier to imagine covfefe is a word than cffveeo
is. Similarly, a lure (incorrect) option in a multiple-choice test should seem correct to some
test takers. The more fluent or believable a claim is, the more self-enhancing motives can
manifest. A similar interdependence could work for carelessness: It takes more inattention
to claim something unpronounceable.
How might these influences be teased apart? One clue might be processing time:
Self-enhancing misrepresentation requires attentive processing to determine the most
positive presentation, whereas a careless claim can be done hastily. Another clue might be
found in differential item responses: Carelessness should affect all items whereas
exaggeration may be more apparent when volunteering incompetence.
None of the Above
Finally, the answer may also be none of the above; there may be other reasons people
exaggerate their abilities. One possibility is the “unskilled and unaware” Dunning–Kruger
effect, which posits that lower ability leads to greater error in self-estimates (Kruger &
Dunning, 1999). This effect has been criticized as an artifact of the better-than-average
effect and statistical regression (Krueger & Mueller, 2002), and evidence on foils claiming
shows the opposite effect. Atir et al. (2015) found positive relationships between knowledge
foils claiming and both genuine and self-perceived knowledge: Knowledge exaggeration
apparently increases when one thinks they know more, genuinely or not. P. L. Ackerman
and Ellingsen (2014) specifically tested this hypothesis, and found that unwarranted claims
of vocabulary knowledge increased with validated knowledge, noting that this was in
opposition to the Dunning–Kruger effect.
In a more general sense, beyond specific skills or knowledge, exaggeration could be a
side-effect of lower general cognitive ability, simply a sign of lower intelligence, so this
should be considered as a potential influence. Along the same lines, poor metacognition
(awareness of one’s thinking processes) may also play a role, given that metacognition
predicts academic performance beyond general intelligence (Ohtani & Hisasaka, 2018). Or,
exaggeration may be a side-effect of complex cultural biases in self-presentation.
Measuring Exaggeration
Probably the most common method for measuring individual differences is to ask
participants a series of questions in a survey, so that technique has been employed for the
current research. The two crucial design components of such a survey are the items (e.g.
questions) and the analytic techniques used for scoring responses to those items. Both item
design and analytic choices can influence the integrity of a psychological instrument, so let
us consider how they may pertain to measuring exaggeration.
Item Design
For an overstatement test, the items are competency tests with a specific, correct
answer. Including a non-claiming option, such as “I can’t” or “I don’t know”, allows for
three outcomes, success, failure, and not claimed, yielding two degrees of freedom, and thus
two potentially independent variables. While the design of such ability tests can involve
many sophisticated decisions (e.g. 31 guidelines for writing multiple-choice tests, Haladyna
et al., 2002), the most critical issue is that there is one objectively correct answer, with
other possible answers being unequivocally incorrect. For example, if asked who won the
2020 U.S. presidential election, claiming “Trump” would be objectively wrong. That wrong
answer, however, would make a good lure on a multiple-choice test because it represents a
common error, making the question more discriminating.
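The two degrees of freedom can be seen in a small tally. The sketch below uses hypothetical responses and hypothetical function names. The pairing of a claim rate with accuracy when claiming is only one possible way to express two variables from three outcomes, and is not presented here as the scoring method of this research.

```python
from collections import Counter

OUTCOMES = ("success", "failure", "not_claimed")


def outcome_rates(responses):
    """Proportion of items falling into each of the three outcomes."""
    counts = Counter(responses)
    n = len(responses)
    return {outcome: counts[outcome] / n for outcome in OUTCOMES}


def two_variable_summary(responses):
    """One illustrative pair of variables derived from the three outcomes."""
    rates = outcome_rates(responses)
    claimed = rates["success"] + rates["failure"]
    accuracy = rates["success"] / claimed if claimed > 0 else None
    return {"claim_rate": claimed, "accuracy_when_claimed": accuracy}


# Hypothetical respondent: 10 items, 5 correct, 3 wrong, 2 answered "I don't know"
responses = ["success"] * 5 + ["failure"] * 3 + ["not_claimed"] * 2

rates = outcome_rates(responses)
summary = two_variable_summary(responses)
print(rates)
print(summary)
```

Because the three proportions must sum to one, knowing any two fixes the third, which is why only two variables can vary independently.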
There is considerable literature on how to best design such objective ability items
(e.g. Haladyna et al., 2002), so we will not consider that here. For overclaiming items,
however, there is a poverty of research on item design.
Consider this overclaiming question on geography: “Which of the following are
countries in Africa? Nigeria, Eswatini, oLpx3w, Nambia, Zanzibar.” Claiming the first
option is a reasonable indication of geography competence, Nigeria being a well-known
name and the most populous nation of the continent. The second option is also correct, but
Eswatini is much smaller and less famous (being formerly known as Swaziland), so claiming
it may indicate a higher level of knowledge, although it may be claimed for other reasons.
The third option (oLpx3w) is a foil, but unlikely to be claimed by anybody paying
attention because it is unpronounceable and disfluent. Claims for such an item might
indicate extreme carelessness, which could be useful for discarding the rare uncooperative
respondent, but unlikely to capture exaggeration. The fourth option, Nambia,10 is also a
foil, but discrepantly fluent or “truthy” (Newman et al., 2012) because it is similar to
10 Which Trump claimed was an African country: globalnews.ca/news/3760873/donald-trump-nambia-namibia/
names of genuine countries (Namibia and Zambia) and easy to read. Such truthiness11
could facilitate recognition ambiguity and thus potentiate exaggeration. Claiming the fifth
option would also be technically incorrect, because Zanzibar is not a country but a
semi-autonomous region of Tanzania in Africa. Having partial knowledge may lead to
mistakenly claiming something a more ignorant person wouldn’t: It is quite possible that
claim rates for foil item Zanzibar would be higher than rates for real item Eswatini. Is it
reasonable to assume that the only meaningful distinction between those five items is that
two are reals and three are foils?
Such differences are not always accounted for in research. After assessing items for
flagging carelessness in survey responses, Meade and Craig (2012)12 decided that “I do not
understand a word of English” and “I sleep less than one hour per night” both identified
10% of respondents as careless. Yet, while the former would be an impossible claim (as
intended), the latter might reasonably be claimed by people exaggerating their insomnia.
While agreeing to “All my friends say I would make a great poodle” was considered a valid
indicator of carelessness (ignoring metaphorical interpretations), claiming “I have never
spoken to anyone who was listening” was discarded as an indicator because of unexpectedly
high levels of agreement. Clearly, even if designed for the same purpose, foils are not
always interpreted as expected.
Foil design for capturing exaggeration presents a paradox; the item should represent
something nonexistent, yet still be alluring to a potential exaggerator. In the above
example, claiming “Nambia” would more likely indicate exaggeration than would “oLpx3w”,
which may only indicate extreme inattention or disregard. However, is “Zanzibar” truly
capturing exaggeration, since it might be claimed by people with partial knowledge?
Researchers wanting to capture exaggeration more than carelessness face a treacherous
temptation to create items of ambiguous validity, but this approach can backfire.
11 As used by Stephen Colbert: en.wikipedia.org/wiki/Truthiness.
12 Table 1 on page 5 of that article lists these items.
As an example of inappropriate foil design, Fell and Konig (2018) attempted to
measure “Academic Faking in 41 Nations” by asking secondary school students around the
world to rate their knowledge of terms from mathematics. Among those were three
fabricated terms (foils), one of which was “proper number”, which is very similar to the
genuine math concept of “proper fraction”, especially if one considers fractions as numbers.
Their data13 show that this foil item empirically behaved more like a real math term, with
more claims of knowledge than ignorance, suggesting that many students appropriately
recognized the concept and graciously allowed for some ambiguity in expression.
Interpreting such partial knowledge as faking seems unjustified.
Another example is the use of “ultra-lipid” as a foil for capturing exaggeration of
science knowledge (Paulhus & Bruce, 1990). Unfortunately, the term can be found via
Google search to be a genuine term used to market cosmetics and in an article in
Comparative Clinical Pathology (Safat et al., 2018). Claiming it may not indicate the
knowledge the researchers had imagined, but it still may indicate knowledge more than
exaggeration.
As those examples illustrate, the real / foil distinction is less categorical than
continuous, an issue not adequately addressed in existing research. Foils with the highest
claim rates may be altered, creative, unofficial (e.g. slang), or rare indicators of genuine
competence, just not what the researchers expected. At the same time, foils must be
seductive enough to avoid floor effects in claiming. For example, on a scale from 0 to 4,
Bynum and Davison (2014) reported both the mean and the standard deviation of foils claiming
as 0.51, suggesting compressed variance that would limit the power of the measurement.
When foils claiming reaches zero, how is exaggeration measured?
The choice of real items also presents challenges. Without an objective test (as with
overstatement), there is no assurance that a real item is claimed based on ability rather
13 Available as “Codebook for student questionnaire data file” at www.oecd.org/pisa/pisaproducts/pisa2012database-downloadabledata.htm. See item ST62Q04.
than exaggeration. Making real items too easy could lead to ceiling effects, leaving the foil
items conspicuous by contrast. Alternatively, if reals are too difficult, some effectively act
as foils, and this distinction will vary by individual ability.
Ideally, meaningful claiming of real and foil items should show some distinction. For
convergent validity, reals claiming should correlate with valid demonstrations of ability,
while foils claiming should relate to errors of commission, e.g. choosing a wrong answer
instead of admitting ignorance. For divergent validity, while reals and foils claiming may
necessarily relate (given the evidence discussed above), the overlap should be small.
The historical use of foils and the above discussion highlight several potential reasons
for claiming foils. The difficulty is that real items may be claimed for the same reasons, in
addition to indicating competence. Even with optimal item design, care must be taken in
analysis to disentangle these shared influences.
Analytic Issues
Both the overstatement and overestimation approaches provide a clean,
well-accepted measure of actual ability: the number of successes, or percent correct.
However, as noted above, difference scores from either approach do not separate
exaggeration of ability from the ability itself.
With overclaiming, one can easily calculate claiming rates for both reals and foils,
knowing there is no methodological constraint linking these two measures. On the surface,
this might seem ideal: Reals rate indicates ability and foils rate indicates exaggeration.
However, without some mechanism to validate claims on reals (as done with overstatement
or overestimation), how do we know which reals claims are not exaggerations? Likewise,
how do we know that foils claiming is not related to competence?
P. L. Ackerman and Ellingsen (2014) addressed these questions, within a larger goal
of testing accuracy of self-estimates of vocabulary ability. Kirkpatrick (1907) had
developed a simple vocabulary test in which respondents marked a ‘+’ or ‘−’ beside each of
100 words to indicate which they knew or did not know, respectively; then, without warning,
their understanding of the words marked as known was tested. P. L. Ackerman and Ellingsen (2014)
built on this method, which is essentially an overstatement test, because foils were not
used. However, the term overclaiming was used for claiming knowledge of a word that
could not be adequately defined in the later test. Such active incompetence claims were
called false alarms while claims later validated on the test were called hits. The researchers
reported that overall knowledge claims correlated similarly with hit rates (r = .79) and
false-alarm rates (r = .79), and that hit and false-alarm rates also correlated significantly,
at r = .24 (all p < .01). This would suggest that self-estimates of ability are fairly accurate
but also influenced by exaggeration, and that exaggeration increases slightly with ability.
The researchers note that this finding is in opposition to the well-known “unskilled and
unaware” Dunning–Kruger effect, which posits that lower ability leads to greater error in
self-estimates (Kruger & Dunning, 1999). Instead, self-image error may increase with
competence.
Consistent with this, Atir et al. (2015) found that higher self-perceived knowledge
predicted more foil claiming, i.e. greater confidence meant more overclaiming of knowledge.
This relationship held even when controlling for level of knowledge, and even when
participants were warned of the presence of foils. By manipulating self-perceived knowledge
via either an easy or a hard pre-test, they also showed a causal relationship: The easy test
increased subject confidence and overclaiming.
The relationship between reals and foils claiming may be even more complicated: In
an examination of faking in a genuine job application (with warning that faking could be
detected and penalized), Levashina et al. (2009) introduced three foil items (e.g. asking
applicants how often they have used a fictitious technique) and found that impossible
ability claiming increased with genuine claiming, but was negatively related to mental
ability. Yet, as number of foils endorsed increased, so did the positive relationship between
genuine claiming and both job knowledge and verbal ability (from about r = .20 to
r = .40). The paper concluded that “job candidates with higher levels of mental ability
might fake in less detectable ways” (p. 279).
Clearly, the behavior of claiming foils is not always independent of the claiming of
real items. Claiming of either reals or foils may reflect any of the factors noted above
(self-enhancement, cognitive bias, carelessness, etc.), appearing as an indiscriminate
response bias.
For overstatement or overestimation approaches, there may be similar issues
confounding genuine and exaggerated claims. Ability claims may also be susceptible to
fluency effects, carelessness, partial knowledge, or poor item design. In a multiple-choice
test, number correct will be affected by chance. Difference scores used for overestimation
(and sometimes for overstatement, e.g. Brogden, 1940) have long been criticized (e.g. Peter
et al., 1993; Edwards, 1994).
A Unified Approach to Assessing Exaggeration
The above discussion summarizes how three methodologies, namely overstatement,
overestimation (a form of overconfidence), and overclaiming, have each simultaneously
gathered evidence for both competence and mistaken self-image, yet we can note an
interesting contradiction. The overstatement and overestimation techniques produce results
suggesting that the discrepancy between imagined and actual ability decreases with
competence, e.g. the r = −.69 found between overestimation and performance on Raven’s
Progressive Matrices (Duttle, 2016). However, overclaiming approaches (e.g.
P. L. Ackerman & Ellingsen, 2014; Atir et al., 2015) tend to find a positive relationship
between the active incompetence of foils claiming and genuine competence (assessed
independently). Does exaggeration, mistaken self-image of ability, decrease or increase with
genuine ability? The contradictory results found in the literature may be a result of not
properly isolating exaggeration from competence.
To address this, and to integrate those methodologies, this paper proposes a unified,
linear regression approach for measuring exaggeration:
1. In the same test, gather repeated evidence of competence (e.g. correct answers, reals
claiming), and active incompetence (e.g. incorrect answers, foils claiming).
2. Statistically remove common variance by finding the residuals of predicting
incompetence from competence.14
Let the resulting measure be called the Residualized Exaggeration Index (RExI). The
idea here is that there may be many common influences driving expressions of either
competence or incompetence, such as exaggeration, cognitive bias, partial knowledge,
carelessness or other response bias. The residuals capture not what is common to the two
measures, but what is unique to the behavior of active incompetence. The RExI is thus
guaranteed to be uncorrelated with evidence of competence. In this way, it represents
exaggeration of ability unrelated to the ability being exaggerated.
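The two steps above can be sketched in R with simulated data. This is only an illustration under stated assumptions: the variable names (bias, competence, incompetence) and the simulated values are hypothetical, and the z helper is a convenience, not part of the original analysis. The sketch shows that the residuals are linearly uncorrelated with the competence measure even when the two raw measures share variance.

```r
# Minimal sketch with simulated (hypothetical) data.
set.seed(1)
n <- 200
bias         <- rnorm(n)          # shared influence, e.g. response bias
competence   <- rnorm(n) + bias   # e.g. reals claiming or number correct
incompetence <- rnorm(n) + bias   # e.g. foils claiming or number incorrect

# Helper (assumption): z-score a vector and drop the matrix attributes.
z <- function(x) as.numeric(scale(x))

# Step 2: residualize active incompetence on competence.
fit  <- lm(z(incompetence) ~ z(competence), na.action = na.exclude)
rexi <- resid(fit)

r_raw  <- cor(incompetence, competence)  # raw measures share variance
r_rexi <- cor(rexi, competence)          # zero up to floating-point error
```

The zero correlation is a property of ordinary least squares residuals, so it holds in any sample, not only in this simulation.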
It is important to note that what the RExI captures, while conceptually related to the
connotations of exaggeration, overconfidence, overstatement, overclaiming, etc., is more
precisely the error variance of self-perceived competence, which may arise from various
causes. Thus, the RExI is more a technology to isolate useful information about
self-perception than a theory-driven operationalization of a hypothetical construct. This
bottom-up approach avoids some potential researcher biases: The goal is not to validate a
theory as much as to understand a behavior by isolating its effects. While the RExI serves
as a standalone measure of individual differences, for regression modeling, simply include
competence evidence as a control variable; the standardized β for active incompetence then
indicates the influence of exaggeration.
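As a hedged illustration of why a control variable and the RExI lead to the same place, the following R sketch uses simulated data with hypothetical names (outcome, competence, incompetence). It relies on a standard property of least squares regression: the raw slope for active incompetence, controlling for competence, equals the slope obtained when the outcome is regressed on the RExI alone. The standardized coefficients differ only by a rescaling of the predictor.

```r
# Minimal sketch with simulated (hypothetical) data.
set.seed(2)
n <- 300
bias         <- rnorm(n)
competence   <- rnorm(n) + bias
incompetence <- rnorm(n) + bias
# Hypothetical outcome: rises with competence, falls with active incompetence.
outcome <- 0.5 * competence - 0.3 * incompetence + rnorm(n)

rexi <- resid(lm(incompetence ~ competence))

# Raw slopes: incompetence with competence as a control, versus RExI alone.
b_control <- unname(coef(lm(outcome ~ competence + incompetence))["incompetence"])
b_rexi    <- unname(coef(lm(outcome ~ rexi))["rexi"])
same_slope <- isTRUE(all.equal(b_control, b_rexi))

# Standardized coefficients differ by sqrt(1 - r^2), the share of
# incompetence variance that survives residualization.
beta_control <- b_control * sd(incompetence) / sd(outcome)
beta_rexi    <- b_rexi * sd(rexi) / sd(outcome)
shrink       <- sqrt(1 - cor(competence, incompetence)^2)
```

When the two measures overlap little, the rescaling factor is close to 1 and the two standardized coefficients are nearly identical.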
The RExI can be extracted from any overstatement test by finding the residuals of
predicting the number of failed attempts from the number of successful attempts.
14 More precisely, here is computer code for calculating the index, using the statistical programming language R (R Core Team, 2020):
RExI <- resid(lm(scale(Incompetence) ~ scale(Competence), na.action = "na.exclude"))
Furthermore, any objective test (e.g. a math quiz or multiple-choice test) can be converted
to an overstatement test by adding a non-claiming option to each question, e.g. the option
to respond “I don’t know”. This requires the test taker to self-assess their specific
competence at the moment they are addressing each question.
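To make the conversion concrete, here is a minimal R sketch of scoring a multiple-choice test that includes a non-claiming option. The answer key, the response matrix, and the "IDK" code are all hypothetical. Successful attempts are counted as evidence of competence, failed attempts as evidence of active incompetence, and "I don't know" responses as neither.

```r
# Hypothetical answer key and responses (rows: test takers, columns: questions).
key <- c("A", "C", "B", "D")
responses <- rbind(
  c("A",   "C",   "B",   "D"),
  c("A",   "IDK", "B",   "IDK"),
  c("B",   "C",   "A",   "D"),
  c("A",   "B",   "IDK", "C"),
  c("IDK", "IDK", "B",   "A")
)

key_matrix <- matrix(key, nrow = nrow(responses), ncol = length(key), byrow = TRUE)
attempted  <- responses != "IDK"

successes <- rowSums(attempted & responses == key_matrix)  # competence
failures  <- rowSums(attempted & responses != key_matrix)  # active incompetence
abstained <- rowSums(!attempted)                           # non-claiming responses

# RExI: failed attempts residualized on successful attempts.
rexi <- resid(lm(failures ~ successes))
```

With only five simulated test takers the residuals are not meaningful in themselves; the point is the scoring logic, which keeps abstentions out of both counts.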
Similarly, for overestimation, find the residuals of predicting self-estimated number
correct from actual number correct.15 For both overstatement and overestimation methods,
the RExI will, by definition, be uncorrelated with demonstrated ability. This allows the
conventional ability measure (e.g. number correct on a test) and the RExI to both be used
as independent assessments.
(One might logically consider using a complementary approach for assessing
competence, e.g. finding the residuals after predicting number correct from estimated
performance. The complication is that this residualized ability measure is no longer
uncorrelated with the RExI, meaning that one no longer has separate measures of
competence and exaggeration. Given that the conventional estimate of competence,
number correct, has shown widespread utility and acceptance for over a century, there is
little reason to change that now.)
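The complication noted in the parenthetical can be checked numerically. In the R sketch below (simulated data, hypothetical variable names), the RExI is uncorrelated with number correct, but its correlation with the complementary residualized ability measure equals the negative of the correlation between the two raw measures.

```r
# Minimal sketch with simulated (hypothetical) data.
set.seed(3)
n <- 250
correct  <- rnorm(n)                   # actual performance
estimate <- 0.6 * correct + rnorm(n)   # self-estimated performance

rexi          <- resid(lm(estimate ~ correct))   # exaggeration index
resid_ability <- resid(lm(correct ~ estimate))   # complementary residual

r_raw        <- cor(correct, estimate)
r_conv       <- cor(rexi, correct)         # zero: RExI vs. conventional ability
r_complement <- cor(rexi, resid_ability)   # equals -r_raw
```

The stronger the link between estimated and actual performance, the more strongly the two residualized measures would be (negatively) entangled, which is why number correct is retained as the ability measure.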
To assess exaggeration with an overclaiming inventory, find the residuals of foils
claiming rate predicted from reals claiming rate. This addresses the several issues with foils
noted above, because common influences on claiming any item (i.e. bias) are removed.
Unlike overstatement, however, an overclaiming approach does not provide a verifiable
measure of competence unrelated to exaggeration.
Residuals have been used in other research to remove confounding influences. An
influential example in the study of self-enhancement is the work of John and Robins (1994),
in which their self-enhancement indices are residuals of self-ranking after removing
variance from ratings by peers or expert observers (Table 6). That research, like much on
15 However, as noted above, overestimation is a far less psychologically direct method of capturing imagined ability, and so not recommended for capturing exaggeration.
self-enhancement, compares self-perceptions (S) against perceptions of others (P), and/or
with others’ perceptions of the self (O).
Krueger and Wright (2011) thoroughly discuss the many challenges arising from
deriving self-enhancement from those three measures, and from various analytic approaches
to combining them, including use of difference scores and residuals. That work considers
two contexts for measuring self-enhancement: an intrapersonal comparison of self to
perceived others known as social comparison theory (Festinger, 1954; Suls & Wheeler,
2013), and an interpersonal comparison using an observer-based paradigm, the social realist
approach (Funder, 1995; Kenny, 2004). The former frames self-enhancement as thinking
myself better than how I perceive others (S − P), while the latter considers how I see
myself compared to how others see me (S − O). In both cases, there is a discrepancy to
measure, but from different reference points. The authors note that the social comparison
theory considers self-enhancement as beneficial (the Taylor and Brown hypothesis), while
social realist theory sees it as detrimental.
Curiously, that work introduces reality measures (R) such as test scores without
considering if the psychology of S − R self-enhancement differs from the S − P or S − O
framings. The S − R discrepancy is what exaggeration captures, avoiding the biases and
errors inherent in P and O measures which are enmeshed in social comparison. The lack of
social context for exaggeration suggests it may not fully fit under the umbrella of
self-enhancement, at least as conventionally studied.
More relevant to the current research is the inflation approach used by Anderson
et al. (1984). That research administered examinations to job applicants that included
self-assessments on a variety of job skills, some of which were nonexistent bogus (foil)
items. This was essentially an overclaiming test of job skills, and in their final analysis they
used linear regression to predict an objectively-measured job skill (typing performance)
from the two types of skill claiming, showing incremental validity from the foils claiming,
greater than the predictive validity of reals claiming alone. That study (like many dealing
with bias in self-report) focused on correcting estimates of some criterion, rather than using
the index to measure a separate psychological process.
Claiming that the RExI approach is fundamentally new would clearly be overstating
the case. However, previous literature tends to not consider exaggeration as a distinct
phenomenon, presumes it represents some pre-determined theoretical construct, or fails to
measure it cleanly. A goal of this paper is to present evidence that exaggeration deserves to
be examined separately from existing constructs of self-enhancement or cognitive function.
Exaggeration may be a functional conglomeration of several constructs, but it is worth
remembering that constructs are just theories, and exaggeration manifests as a reliable
reality. Investment in theory may explain why both overestimation and overstatement
literatures have persisted for so long with little recognition of their inherent contradictions.
Ironically, an exaggerated sense of knowing may have kept researchers from exploring what
exaggeration is.
The RExI thus provides a methodological integration uniting overstatement,
overestimation and overclaiming approaches, providing a comparable measure of self-image
error distinct from competence and other common influences.16 Armed with this technique,
we can explore the impact exaggeration has on more global performance, and what factors
may relate to it.
Current Research
If the RExI addresses contradictions in previous approaches, those approaches cannot
be used to consistently validate the RExI. If previous results were influenced by
competence, then removing that influence may result in weaker or null effects. Ability
tests, from school exams to IQ assessments, have a well-established history of predictive
validity using the number correct as the signal. Because the RExI removes such signal, it is
entirely possible that what is left over is essentially noise. The central research questions,
16 An approach to overclaiming purporting similar distinction, called the Overclaiming Technique (OCT), is described in the Appendix.
then, are whether there is any useful new signal in the RExI, and if so, whether it is easily
explained away, or is something new.
As an initial exploration, the current research relies on convenience samples of
undergraduate students. For this population, a relevant ability to study is knowledgeability,
the ability to answer simple questions of fact. To be ecologically valid, the main outcome or
dependent variable (DV) used here is academic performance, which captures not just
knowledgeability in general, but a broad range of skills and abilities relevant to success in
life. The breadth of this DV means that expected effects should be small, but if still
significant, would indicate a meaningful relationship with broad implications.
Validation Criteria
To show that a measurement captures something useful, we need to show that it a)
has expected similarities (convergent validity), b) has expected differences (divergent
validity), and c) tells us something we didn’t already know (incremental validity).
Convergent Validity. Following the connotations of the terms exaggeration,
overstatement, overestimation, overclaiming or overconfidence, we should expect that the
RExI, representing error in self-image, should indicate impaired performance of the ability
being exaggerated. Additionally, because the discrepancy captured is in excess of
competence, this should relate to self-enhancement, and such discrepancy may be
facilitated by cognitive biases.
Broader Performance. An unrealistic view of one’s ability should predict
impairment of that ability. This is the logic behind preventing drunk driving: Even though
an inebriated driver may not have caused harm (yet), their exaggerated sense of ability to
drive predicts potentially catastrophic performance failure. Similarly, for students,
knowledge exaggeration should predict lower knowledge (academic) performance. For
exaggeration to be meaningful, it should generalize: Someone who can’t walk a straight line
probably can’t drive a car. Likewise, exaggeration of knowledge in a narrow domain should
predict impairment of broader academic performance, ideally, even if the knowledge being
exaggerated does not. Thus, when given even a trivial knowledge test, exaggeration
demonstrated there should predict lower academic performance overall.
Self-Enhancement. Beyond performance impairment, to fit an intuitive notion of
exaggeration, the RExI should align with self-enhancement: an exaggerated, narcissistic
sense of self. While self-enhancement is a fairly broad construct typically assessed via
self-reports, the RExI, being a behavioral measure, may relate in only some narrow, specific
ways. If exaggeration predicts performance impairment, it should relate more to
maladaptive aspects of narcissism, such as entitlement, perhaps because one believes they
deserve success. Exaggeration as an unrealistically positive self-view should also relate to
overconfidence as overplacement, i.e. seeing oneself as better-than-average.
Cognitive Bias. A less motivational and more “innocent” explanation of
exaggeration may be bias in information processing. Of the several heuristics that veer
from rational expectations (e.g. Kahneman et al., 1982), recognition memory bias is a good
starting point to compare with exaggeration, given the memory error findings noted above.
Alternatively, performance on a memory test may exhibit exaggeration as would any other
ability, which should have similar relationships.
Divergent Validity. While relationships between a RExI and performance and
self-enhancement would confirm an intuitive understanding of the measure, and
relationships with cognitive bias help explain some of the mechanism, such convergent
validity only paints part of the picture. The boundaries of the picture, evidence of
divergent validity (i.e. what exaggeration is not), should also be considered. Because the
RExI is based on residuals after removing competence variance, it may be influenced by
other, unexpected factors.
Carelessness. Carelessness, the ever-present threat to any survey validity, may
appear as exaggeration. Simple lack of attention can lead to invalid responses, and such
behavior could contaminate any measure, especially the RExI, because it removes variance
attributable to competence. However, carelessness as a substantive variable, an enduring
individual difference (Bowling et al., 2016), may explain part of exaggeration, and should
replicate that paper’s finding, predicting lower academic performance. Thus, carelessness
may be a meaningful component of exaggeration, but should not be the dominant one.
Other Explanations. If exaggeration affects performance, then it should not be
easily explained by other obvious predictors of performance. For the relationship between
knowledge ability and academic performance, the RExI design rules out influence from the
knowledge being exaggerated. Beyond that, general cognitive ability should also be ruled
out to show that exaggeration is not simply a side-effect of lower intelligence. Following
that logic, metacognition (awareness and management of cognitive processes) should also
be ruled out, as that is also a reliable predictor of academic outcomes (Ohtani & Hisasaka,
2018).
Cultural effects in psychology are often overlooked, leading to poor inferences of
generalizability (Henrich et al., 2010). While it is far beyond the scope of this paper to
consider the many known differences between cultures, given that the convenience samples
used in the current research are all university undergraduates in Canada, perhaps the most
relevant distinction is between Western and non-Western cultural backgrounds. That
difference, and sex, are two control variables considered in all studies.
Incremental Validity. If the impact exaggeration has on performance can be largely
explained by variables considered above, then the behavior of exaggerating one’s ability
will be better understood. If not, then the RExI may represent something distinct worth
further exploration. Because overall cognitive ability and memory performance should
logically affect academic performance, and substantive carelessness has also been shown to
lower academic performance (Bowling et al., 2016), all these should be considered in
examining the relationship between knowledge exaggeration and knowledge performance. If
a distinct relationship holds, even after further control for sex and basic cultural variables,
that would suggest that the RExI captures an important, but overlooked, non-cognitive
variable explaining academic performance.
Study 1: Proof of Concept
The main goal of this study was to establish that the RExI captured information
relevant to performance. Because insufficient effort responding (IER) has related to lower
academic performance (Bowling et al., 2016), this study sought to minimize such influence
by design. By using only students intrinsically motivated to complete the study for no
other reason than feedback about their personality, this initial exploration selected only
participants who, ostensibly, cared about their results and thus their responses. Some basic
personality, cognitive, and metacognitive measures were included as controls.
Study 2: Validating the RExI
Study 2 was designed to replicate and extend Study 1, using better measures and a
larger, broader sample. Overall university Grade Point Average (GPA) was used to measure
academic performance more broadly, accurately, and reliably. Exaggeration was derived
from a large, popular inventory of overclaiming items; self-enhancement was captured through
measures of narcissism, impression management, self-deceptive enhancement, and
self-deceptive denial; and recognition memory was tested with a large battery of items. To
examine how exaggeration, or its relationship with performance, overlaps with general
cognitive ability, a commercial IQ test was included as a control measure.
Study 3: Developing Better Measures
Having validated the RExI approach by re-purposing an existing overclaiming
inventory (in Study 2), Study 3 tested instruments designed specifically to capture
exaggeration. To capture a relevant, broad ability that university students might want to
exaggerate, English vocabulary was chosen as a knowledge domain to assess. An
overstatement test was developed by adding a non-claiming option to a commonly used
multiple-choice test. Addressing the issues raised above about overclaiming item design,
techniques informed by computational psycholinguistics and cognitive psychology were
employed to develop overclaiming items optimized for measuring exaggeration. Both of
these new instruments were empirically examined to confirm their suitability for measuring
exaggeration.
Study 4: Robustness of the RExI
Study 4 was designed to replicate and extend Study 2 by examining multiple different
measures of knowledge exaggeration. Retaining a briefer version of the exaggeration
measure used in Study 2 for comparison, the novel instruments from Study 3 were added to
see if the RExI could capture exaggeration similarly across different abilities, content, and
format. Hypothetically, all exaggeration instruments should show similar relationships with
other relevant measures.
To better understand what exaggeration means, self-enhancement aspects were more
precisely targeted as entitlement, overplacement, and intellectual humility. To examine the
link with carelessness and cognition more closely, special software was developed to capture
individual item response times and to implement a novel technique for detecting motivated
carelessness, understood as persistent, intentional rushing of responses.
Altogether, the following studies examine how the RExI approach of separating the
effects of self-image from competence, the exaggeration of ability from the ability being
exaggerated, may yield a more accurate picture of what is connoted by “overconfidence”
than have previous attempts using overstatement, overestimation or overclaiming.
Reporting Conventions
Throughout this paper the following conventions for statistical reporting are adopted
and explained here for convenience.
Correlations are Pearson product-moment correlations, which are equivalent to point-biserial
correlations when one variable is dichotomous. Statistical significance is always two-tailed
and is marked as follows in text: *p < .05, **p < .01, ***p < .001. Group mean differences
are shown using a conservative t-test (assuming unequal variance, estimated separately for
each group, using the Welch modification for degrees of freedom), followed by effect size
(Cohen’s d). 95%
Confidence Intervals are shown in square brackets. Regression models always show
standardized beta (β) coefficients in order to compare the relative impact of predictors.
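A minimal R sketch of the group-difference convention follows, using simulated groups with hypothetical names. The t.test function applies the Welch modification by default. The text does not state which standardizer was used for Cohen’s d, so the pooled standard deviation below is an assumption.

```r
# Minimal sketch with simulated (hypothetical) groups.
set.seed(5)
group_a <- rnorm(40, mean = 0.3, sd = 1.0)
group_b <- rnorm(60, mean = 0.0, sd = 1.5)

# Welch t-test: var.equal = FALSE is the default in t.test().
welch <- t.test(group_a, group_b)

# Cohen's d using the pooled standard deviation (assumed standardizer).
n_a <- length(group_a)
n_b <- length(group_b)
pooled_sd <- sqrt(((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
                  (n_a + n_b - 2))
cohens_d <- (mean(group_a) - mean(group_b)) / pooled_sd

welch_df <- unname(welch$parameter)  # Welch-adjusted degrees of freedom
ci_95    <- welch$conf.int           # 95% confidence interval for the difference
```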
For thoroughness, this paper presents some large correlation tables, with hundreds of
elements. To facilitate compact representation and visual distinction, results shown in these
tables follow a different convention. Statistical significance is indicated by font intensity:
p ≥ .05, p < .05, p < .01. Where appropriate, Cronbach’s α is shown in italics on the
diagonal for unidimensional measures of more than two items. For RExI measures, a similar
measure of internal consistency was calculated by correlating RExIs derived from half the
items with the same from the other half. These halves were randomly selected 1000 times
and the correlations averaged via Fisher transformation to estimate overall reliability.
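The split-half procedure just described can be sketched in R as follows. The data are simulated and every name (reals, foils, rexi_from, simulate_claims) is hypothetical. For simplicity the sketch assumes equal numbers of real and foil items and uses the same random item positions for both item types in each half. No further correction is applied, because the text describes none.

```r
# Minimal sketch with simulated (hypothetical) binary claiming data.
set.seed(4)
n_people <- 150
n_items  <- 20   # number of real items and of foil items (assumed equal)

ability      <- rnorm(n_people)
exaggeration <- rnorm(n_people)
simulate_claims <- function(logit) {
  # One row per person; each person's claim probability is constant across items.
  matrix(rbinom(n_people * n_items, 1, plogis(logit)), n_people, n_items)
}
reals <- simulate_claims(ability + 0.5 * exaggeration)
foils <- simulate_claims(-1 + exaggeration)

# RExI computed from a subset of item columns.
rexi_from <- function(cols) {
  foil_rate <- rowMeans(foils[, cols])
  real_rate <- rowMeans(reals[, cols])
  resid(lm(foil_rate ~ real_rate))
}

# 1000 random splits: correlate the RExI from one half with that from the other.
split_r <- replicate(1000, {
  half  <- sample(n_items, n_items / 2)
  other <- setdiff(seq_len(n_items), half)
  cor(rexi_from(half), rexi_from(other))
})

# Average in Fisher z space, then transform back to a correlation.
reliability <- tanh(mean(atanh(split_r)))
```

Averaging in z space rather than averaging the raw correlations reduces the bias that comes from the bounded, skewed sampling distribution of r.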
In correlation tables, to concisely describe distributions, the M (SD) column reports
the mean (standard deviation) of data that has been normalized to a range of 0 to 1. This
choice is similar to the percent of maximum possible (POMP) approach advocated by
Cohen et al. (1999). That 0 to 1 range represents the theoretical limits of bounded
measures (e.g. 0 to 100 for grades, 1 to 7 on a Likert scale) and empirical extremes
otherwise. By scaling all data to the same range (for table reports), this convention allows
easier comparison of distributions and better appreciation of skew and dispersion. Thus,
(for example) standardized distributions (e.g. RExI measures) which are centered on zero
will show a positive mean here, which then indicates how far the center of the distribution
is from the extremes (0 and 1), providing information about skew that is commonly
overlooked.
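As an illustration of this table convention, the R sketch below rescales values to the 0 to 1 range. The function name pomp and the example values are hypothetical. Theoretical limits are passed explicitly for bounded measures, and the empirical extremes are used otherwise.

```r
# Rescale to 0-1; the defaults fall back on the empirical extremes.
pomp <- function(x, lower = min(x, na.rm = TRUE), upper = max(x, na.rm = TRUE)) {
  (x - lower) / (upper - lower)
}

# Bounded measure: a 1-to-7 Likert scale uses its theoretical limits.
likert <- c(1, 4, 7)
likert_01 <- pomp(likert, lower = 1, upper = 7)   # 0.0, 0.5, 1.0

# Standardized measure (centered on zero): empirical extremes are used.
z <- as.numeric(scale(c(-3, -1, 0, 0, 1, 1, 2)))
z_mean_01 <- mean(pomp(z))   # where the center sits between the extremes
```

For a standardized variable the rescaled mean equals the distance from the minimum to zero divided by the full range, so a value above 0.5 signals a longer lower tail.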
Throughout, gender / sex measures have been collapsed to dichotomous, with 0
representing identification mostly as female, and 1 representing identifying mostly as male.
Similarly, “Native English” is 1 if English was reported as a first language, 0 otherwise, and
culture variables are 1 if 10 or more years lived in English / Western countries, 0 otherwise.
For all studies using student populations, these basic demographic measures were used as
controls, but age was not recorded because such variance is often small with potentially
misleading outliers, and ethical considerations recommend against collecting unnecessary
personal information.
All data was gathered via the Qualtrics survey platform (www.qualtrics.com), with
analysis done using the R statistical programming language (R Core Team, 2020) in the
RStudio development environment (RStudio Team, 2019), using LaTeX for document
preparation (Lamport, 1986).
Study 1: Proof of Concept
Unlike most research using overstatement, overestimation (overconfidence), or
overclaiming approaches, the RExI removes evidence of ability from the measure of
exaggeration of that ability. The primary goal of this first study was to determine if this
residualized index captured useful information about broader performance of ability.
Within practical limits, this study also included control measures of cognitive and
metacognitive abilities, general personality, and basic demographics.
While carelessness is an important validity concern in any survey-based study, it
remains a somewhat nebulous concept lacking convenient, reliable, valid measurement, as
discussed earlier. The approach taken for this initial study was not to atomistically
measure carelessness, but rather to holistically, ecologically minimize it, using motivation
and accountability.
Typical psychological experiments solicit participants by offering some financial
compensation or, as is common for many convenience samples of undergraduate students,
course credit. A long line of research has suggested that “tangible extrinsic rewards tend to
undermine intrinsic motivation for rewarded activities” (Deci et al., 2001, p. 1) which
could, ironically, mean that common research procedures actually encourage careless
responding: Any intrinsic desire to participate in research might be compromised by
framing it as paid labor. Both McKay et al. (2018) and M. Ward and Meade (2018) argue
that intrinsic motivation is important for attentive, accurate survey responding.
Another influence is accountability: After reviewing several techniques for dealing
with careless responding, the first recommendation of Meade and Craig (2012) was to
systematically reduce situational carelessness by removing anonymity and using identified
surveys. It is fairly easy to increase a sense of accountability simply by asking for
identifying information. However, how might intrinsic motivation be increased, especially
among students already overwhelmed by surveys from unknown researchers?
Of the many innovations flowering in the garden of the World Wide Web, free
“personality” tests appear to be persistent and flourishing weeds. People happily answer
questions in order to learn, for example, “What Type of Dragon Are You?”17 (or myriad
other mythical beings) in the pursuit of dubious self-knowledge. Researchers have taken
advantage of this intrinsic motivation to do extensive legitimate personality research online
at low cost (Joinson et al., 2007). For the present research, if someone completes a survey
solely to find out what their answers mean, they probably care about the responses they
give.
Additionally, only participants who submitted their email address in order to receive
personalized feedback were included. Thus, by relying only on intrinsically motivated,
explicitly accountable respondents, this study was designed to minimize the carelessness
typically found in convenience samples.
This design, however, comes at a cost. The survey had to be reasonably brief, and
focus on measures from which meaningful feedback could be given. (There is also the
burden of delivering on the promise of personalized feedback.) Nonetheless, it served the
purpose of conservatively testing if the RExI captures useful information.
Method
All data was gathered over the course of one term via online survey. Median recorded
completion time was 21.2 minutes, with 90% completing within 45.8 minutes.
Invitations to participate in an online survey were given to three local undergraduate
science classes (roughly 594 from Biology, 1004 from Chemistry, and 151 from Earth and
Ocean Sciences), with clear instructions that participation was completely voluntary, with
no compensation (e.g. cash or course credit) offered. The only incentive was that the study
was designed to measure aspects of “Academic Personality” with the promise of
personalized feedback for any student who completed the survey and provided their email
address. The pitch was that the survey would measure individual differences relevant to
scholastic success and then tell students where they stood compared to others. (This was
not an empty offer: Personalized feedback was delivered at the end of data collection and
response to that, though meager, was entirely positive.)
17 A Google search of that question yields hundreds of millions of results: www.google.ca/search?q=What+Type+of+Dragon+Are+You
Participants
The offer was successful in getting 537 students to show interest in the survey, but a
considerable portion did not persist to reasonable completion or provide an email address
for personalized feedback, leaving 316 usable records. This sampling, of course, represents a
fair degree of self-selection bias, but in the intended direction: These students cared enough
(at least about themselves) to complete the survey solely for the purpose of seeing their
results. Arguably, such a self-selected sample may not be representative, and respondents may
even have misrepresented themselves (in hopes of a flattering self-portrait), but it is very
unlikely that they were careless, at least by choice.
Overall, 49% of the students reported English as a first language, 29% identified as
male, and 74% reported having lived 10 or more years in English-speaking countries (each
of those variables being dichotomous). Across the three courses, 58% of respondents were in
1st year biology, 31% in 2nd year chemistry, and 12% in 3rd year earth and ocean sciences
(rounding makes the sum exceed 100%). Age was not solicited.
Measures
To deliver on the promise of assessing “Academic Personality”, some measures were
included in order to provide meaningful feedback but are not reported; analyses including
those variables did not change the results presented here.
Demographics. Participants were asked which gender they most closely identified
with, the age they became fluent in English, and the number of years lived in
English-speaking countries. These were all collapsed to dichotomous variables as described
in Reporting Conventions.
Course Grades. Instructors of the three science courses provided final course grades
after the term had ended. In procedures approved by the institution’s ethics review board,
survey respondents willingly provided their unique student numbers so that records could
be merged. To compensate for differences in grading between courses, grades were
standardized within each course. This was the central dependent variable, a measure of
academic performance.
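As an illustration of the within-course standardization described above, the following minimal sketch converts raw grades to z-scores separately for each course. The function name, the course labels, and the grade values are hypothetical, and the use of the sample standard deviation is an assumption; the source does not say which variant was used.

```python
from collections import defaultdict
from statistics import mean, stdev

def standardize_within_course(records):
    """Convert (course, grade) pairs to z-scores computed within each course."""
    grades_by_course = defaultdict(list)
    for course, grade in records:
        grades_by_course[course].append(grade)
    # Per-course mean and sample standard deviation (assumed variant)
    stats = {c: (mean(g), stdev(g)) for c, g in grades_by_course.items()}
    return [(grade - stats[course][0]) / stats[course][1]
            for course, grade in records]

# Hypothetical grades, not study data
records = [("BIOL", 70), ("BIOL", 80), ("BIOL", 90),
           ("CHEM", 55), ("CHEM", 65), ("CHEM", 75)]
z_scores = standardize_within_course(records)
```

After this step a score of 0 means "average for that course", so a lenient and a strict course contribute comparable values to the dependent variable.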
Self-Report Secondary School GPA (SS GPA). Students were asked “As a number
out of 100, what was your overall high-school average (your GPA)?”.
Cognitive Reflection Test (CRT). The original 3-item test by Frederick (2005) was
expanded to 7 items by Toplak et al. (2014), all of which were used here. The original “A
bat and a ball cost $1.10 in total” question, having become so well known (and
economically outdated), was reworded to “A graduation cap and gown cost $110 in total”
to avoid cuing suspicion. Altogether, these conceptual math problems provide two scores:
number correct (a brief measure of cognitive ability relevant to science ability) and number
of incorrect intuitive answers given (a measure of cognitive impulsivity or lack of reflection,
possibly relevant to exaggeration). Note that these two scores will necessarily correlate
highly negatively.
Overestimation. After the CRT questions, respondents were asked to indicate how
many of those questions they thought they got correct. Overestimation was calculated as
the excess of that self estimate over number correct.
Personality. Probably the best-known short measure of the Five-Factor model of
personality is the Ten-Item Personality Inventory (TIPI) (Gosling et al., 2003). Agreement
/ disagreement with items was gathered by using a 6-step Likert scale to avoid the
ambiguity of mid-point responses. Here it provided a basis for feedback to students and for
examination of personality correlates of knowledge exaggeration.
Metacognition. A subset of 22 items from the Motivated Strategies for Learning
Questionnaire (MSLQ) developed by Pintrich (1991) was selected based on their reported
correlations (|r| ≥ .22) with school grades and used as a general measure of academic
metacognitive skills. Individual aspects (e.g. Self-Efficacy for Learning, Metacognitive
Self-Regulation, Test Anxiety) also provided information for student feedback.
Eight items on Fixed versus Growth mindset (Dweck, 2006) were also included in
order to provide feedback to students, based on Dweck’s evidence that more Fixed
mindsets may inhibit learning. Hypothetically, one might expect a Growth mindset to be
more willing to admit ignorance and thus exaggerate knowledge less.
Knowledge Exaggeration. An overclaiming inventory of 50 words (25 genuine
English words of moderate difficulty and 25 foils) that had been useful in previous
(unpublished) studies on overclaiming was employed here to measure knowledge
exaggeration. Real word meanings were not particularly related to science or any academic
field, being chosen simply because they represented vocabulary most undergraduates
should be familiar with. Participants were prompted “Please tell us which English words
you know or don’t know” with a binary choice for each word. Reals and foils rates were
calculated as the number of items of each type chosen. As described earlier, the RExI was
calculated as the residuals of predicting foils rate by reals rate.
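The residualization step can be sketched as follows: regress foils rate on reals rate with ordinary least squares and keep each respondent's residual. This is a minimal illustration only. The function name `rexi` and the five data points are hypothetical, and any rescaling applied before reporting (e.g. to a 0 to 1 range) is not shown.

```python
from statistics import mean

def rexi(foils_rate, reals_rate):
    """Residuals from a simple linear regression of foils rate on reals rate."""
    mx, my = mean(reals_rate), mean(foils_rate)
    sxx = sum((x - mx) ** 2 for x in reals_rate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reals_rate, foils_rate))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Positive residual: more foil claiming than reals claiming would predict
    return [y - (intercept + slope * x) for x, y in zip(reals_rate, foils_rate)]

# Hypothetical claiming rates for five respondents (illustrative values only)
reals = [0.40, 0.60, 0.80, 0.50, 0.70]
foils = [0.10, 0.30, 0.20, 0.05, 0.40]
scores = rexi(foils, reals)
```

By construction the residuals average zero and are uncorrelated with reals rate within the sample, which is what makes the index independent of the amount of genuine knowledge claimed.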
Table 1: Correlations between Study 1 measures
Measure M (SD) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 Course Grade .74 (.15) —
2 RExI .18 (.19) -.16 .77
3 Foils Rate .18 (.20) -.16 1.00 .87
4 Reals Rate .60 (.27) .07 .00 -.03 .91
5 Sex (M+) .29 (.45) -.05 -.07 -.07 .22 —
6 Native English .49 (.50) .08 -.06 -.07 .39 .08 —
7 English Culture .74 (.44) -.00 -.19 -.20 .45 .11 .50 —
8 Self-Report SS GPA .91 (.06) .29 -.08 -.08 .08 -.10 .14 .11 —
9 CRT Correct .66 (.30) .23 -.17 -.17 .10 .16 -.09 -.14 .14 .75
10 CRT Intuitive .25 (.25) -.20 .16 .16 -.07 -.13 .10 .16 -.07 -.90 .67
11 CRT Overestimation .47 (.19) -.10 .20 .20 -.04 .00 .12 .08 -.12 -.79 .76 —
12 Overall MSLQ .64 (.12) .28 .04 .04 .09 .04 -.01 -.06 .17 .17 -.14 .03 .64
13 Fixed Mindset .27 (.22) .06 -.03 -.03 -.12 -.00 -.07 -.10 .09 .01 .04 -.02 -.32 .93
14 Growth Mindset .71 (.21) -.13 .05 .05 .10 .02 .03 .07 -.12 -.03 -.01 .04 .30 -.76 .93
15 Openness .54 (.14) .13 -.14 -.14 -.07 -.07 -.00 -.01 .11 .09 -.14 -.10 .14 -.02 .04 —
16 Conscientiousness .54 (.13) .04 -.08 -.08 .03 .06 -.00 .11 .09 .09 -.05 -.03 -.02 .03 .09 .02 —
17 Extraversion .59 (.11) -.08 -.04 -.04 -.13 -.01 -.09 -.00 .05 -.03 .05 .11 -.03 -.10 .11 -.02 .16 —
18 Agreeableness .60 (.16) -.01 .12 .12 -.12 -.01 -.06 -.10 -.07 -.11 .13 .15 -.07 -.00 .03 -.02 .11 .17 —
19 Emotional Stability .56 (.12) .10 -.06 -.06 .05 -.11 .08 .06 .07 -.09 .10 .02 -.02 -.10 .05 -.01 .10 -.00 .02
Note: N = 316. Sex coded as binary, Male high. SS GPA: Secondary School Grade Point Average. CRT: Cognitive Reflection Test. MSLQ: Subset of the Motivated Strategies for Learning Questionnaire. RExI: Residualized Exaggeration Index. Cronbach’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p ≥ .05, p < .05, p < .01 (see Reporting Conventions).
Results
Predictive Validity
Table 1 shows correlations between study measures. Demographic measures (sex,
being a native English speaker, and having an English cultural background) did not relate
significantly to course grades.
Exaggeration. On the vocabulary overclaiming inventory, claiming for reals and foils
was uncorrelated (r(314) = -.03, 95% CI [-.14, .08]), with Cronbach’s α for reals being .91 and
for foils, .87 (as shown on the diagonal). The absence of correlation between reals and foils
claiming indicates no inherent common variance that might have been caused by
carelessness or other response bias, suggesting that the overall design did suppress
inattentive responding. The RExI derived from this inventory significantly predicted lower
course grades, to the same degree as foil claiming alone did. This is because reals claiming,
which showed no relationship with academic performance, was also unrelated to foils
claiming, so the RExI had no competence evidence to remove. This pattern also suggests
that exaggeration alone can be a meaningful predictor of performance, even when
competence is not. The correlations between reals claiming and demographic measures
suggest that some genuine English vocabulary knowledge was being captured, even if not
relevant to science grades.
Cognitive Ability. While self-report SS GPA showed an expected relationship with
course grades, the lack of relationship with exaggeration may be due to it being self-report,
and thus potentially influenced by exaggeration, i.e. exaggerated self-report compensates
for exaggeration-caused lower performance. The CRT scores showed similar relationships
with higher academic performance and lower exaggeration, suggesting a more general
relationship between exaggeration and performance impairment.
Overestimation. Overestimation of performance on the CRT test was related to the
RExI, but not significantly to course grades, suggesting it was not as effective in capturing
error in self-perception. Consistent with the hard-easy effect in overestimation literature,
there was a strong negative relationship between the overestimation index and the ability
being overestimated.
Metacognition. The 22 items selected from the MSLQ for their relationship with
academic performance behaved as expected here, relating to course grades and cognitive
ability measures, but not to exaggeration. The Growth and Fixed Mindset measures
related to the MSLQ metacognition measure in an expected way, in that metacognitive skills
should relate more to a growth orientation. The negative relationship between grades and
growth mindset may be due to restriction of range in the sample (i.e. these students have
already proven to be high achievers), or to science students finding growth items (e.g.
“I have the ability to change my basic intelligence”) contrary to what they’ve been taught.
The lack of relationship between these metacognition measures and exaggeration suggests
some divergent validity.
Personality. Like other measures that related positively to academic performance, TIPI
openness related negatively to exaggeration. It is not clear why
exaggeration would relate to agreeableness, especially given the opposite relationship to
reals claiming, i.e. this is not just an agreeable, acquiescent response bias. The lack of
relationship between conscientiousness and exaggeration does provide some divergent
validity, given that other researchers (e.g. Bowling et al., 2016) have found carelessness
related to both lower agreeableness and lower conscientiousness. Exaggeration here does
not fit that pattern. While the coarseness of the TIPI means that subtle personality
correlates may not appear here, these results at least suggest that exaggeration is not easily
explained by the five-factor model of personality.
Incremental Validity
Table 2 shows standardized β coefficients (with standard error) of a linear regression
predicting course grades. Note that, because both reals and foils rate are in the same
model, the β for foils rate is exactly the RExI (foils rate controlled for reals rate), showing
that the impact of knowledge exaggeration persists after controlling for measures of
cognitive ability (SS GPA, the CRT), metacognition (MSLQ), personality (TIPI), sex and
culture.
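The equivalence between entering both rates in one model and using the residualized index can be checked numerically. The sketch below covers only the simplest case: two predictors and unstandardized slopes. It is an illustration of a general least-squares property, not a reproduction of the reported model. With additional covariates the same identity holds only when foils rate is residualized on all other predictors, and standardized coefficients differ by a scaling factor. All names and data values are hypothetical.

```python
from statistics import mean

def centered(values):
    m = mean(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def foils_slope_multiple(y, foils, reals):
    """Slope for foils in the regression y ~ foils + reals (closed form)."""
    yc, f, r = centered(y), centered(foils), centered(reals)
    num = dot(f, yc) * dot(r, r) - dot(r, yc) * dot(f, r)
    den = dot(f, f) * dot(r, r) - dot(f, r) ** 2
    return num / den

def foils_slope_residualized(y, foils, reals):
    """Slope for y ~ RExI, where RExI is the residual of foils ~ reals."""
    yc, f, r = centered(y), centered(foils), centered(reals)
    b = dot(f, r) / dot(r, r)
    resid = [fi - b * ri for fi, ri in zip(f, r)]
    return dot(resid, yc) / dot(resid, resid)

# Hypothetical data for five respondents (illustrative values only)
grades = [0.90, 0.60, 0.70, 0.80, 0.50]
reals = [0.40, 0.60, 0.80, 0.50, 0.70]
foils = [0.10, 0.30, 0.20, 0.05, 0.40]
slope_a = foils_slope_multiple(grades, foils, reals)
slope_b = foils_slope_residualized(grades, foils, reals)
```

The two slopes agree up to floating-point error, which is why a model containing both reals and foils rate needs no separately computed index.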
Table 2: Regression Model Predicting Course Grades from RExI in Study 1
Predictor β SE p value
Foils Rate (RExI) -.14 .06 .01
Reals Rate .02 .06 .72
Self-Report SS GPA .21 .05 <.001
CRT Correct .15 .05 .009
MSLQ Metacognition .22 .05 <.001
Native English .06 .06 .32
English Culture -.05 .07 .48
Sex (M+) -.06 .05 .25
Extraversion -.09 .05 .08
Agreeableness .07 .05 .16
Conscientiousness .01 .05 .84
Emotional Stability .08 .05 .14
Openness .04 .05 .41
Note: Overall R² = .21, p < .001. N = 316. Sex coded as binary, Male high. SS GPA: Secondary School Grade Point Average. CRT: Cognitive Reflection Test. MSLQ: Subset of the Motivated Strategies for Learning Questionnaire. RExI: Residualized Exaggeration Index.
Discussion
As proof of concept, science undergraduates, motivated only by their own curiosity,
completed an “Academic Personality” survey in order to get personal feedback. By
gathering knowledge claims of vocabulary unrelated to science grades, exaggeration was
measured using the RExI approach of residualizing foils rate from reals rate.
Students who cared enough about their responses that they wanted to see results still
demonstrated detectable exaggeration, enough to predict lower course grades beyond
measures of cognitive ability, metacognition, personality and demographic controls. This
exaggeration appeared to generalize: The knowledge that was exaggerated (ordinary,
non-science vocabulary) was unrelated to (science) academic performance, yet the tendency
to exaggerate that knowledge was.
While exaggeration related to academic performance but the knowledge being
exaggerated did not, overestimation did not relate to academic performance while the
ability estimated did. This suggests the RExI approach is providing more useful information
than the overestimation index.
These results cannot be explained by carelessness, not just because of study design,
but also because reals and foils claiming showed no common influence, e.g. inattentive
carelessness or response bias. We also see that the cognitive and metacognitive measures
predicted course grades in expected ways, indicating the integrity and validity of the survey
overall.
While this exploratory study was coarse, it confirms that this operationalization of
exaggeration, the RExI, captured a phenomenon worth examining more thoroughly, an
avenue we shall pursue in Study 2.
Study 2: Validating the RExI
Will the results of Study 1 replicate with better measures? While a relationship
between exaggeration and lower academic performance was found, the context and
selection were unusual: Only students wanting personality feedback were involved. While
this may have selected for lower inattentive carelessness, it may have also selected students
preoccupied with their self-image. The remaining student studies use a conventional
context for psychological research: undergraduates incentivized to participate in return for
course credit. While certainly not representative of humanity overall (Henrich et al., 2010),
these samples were at least more indicative of North American undergraduates in general.
This context should also now allow for more variance in careless responding as found in
similar samples (e.g. Meade & Craig, 2012), so carelessness is now measured.
To consider self-enhancement as an explanation for exaggeration, Study 2 used a
popular measure, the Narcissistic Personality Inventory (NPI) introduced by Raskin and
Terry (1988). An important quality of the NPI is that it involves forced-choice questions
where one must decide between two alternatives (unlike, say, a Likert question assessing
degree of agreement). This forced-choice format means that the measure is resilient to
response bias or carelessness: Answering uniformly or randomly produces a noisy, middling
score, not misleading variance, because both high and low scores require selective attention
to content. Likert scales, used in many personality measures, when answered inattentively
(e.g. longstrings) or with socially-desirable or other response bias, are more vulnerable to
such signal biases.
To examine self-enhancement in more detail, Study 2 also incorporated the Balanced
Inventory of Desirable Responding (BIDR) developed by Paulhus (1988). This set of three
instruments is designed to capture both interpersonal (impression management) and
intrapersonal (self-deceptive enhancement, self-deceptive denial) aspects of
self-enhancement. The “balanced” in the title refers to half the items being reverse-scored18
in order to compensate for superficial response biases. This 50-50 balance of item scoring
directions also allows for a convenient measure of carelessness via longstrings: It is extremely
unlikely that a sincere, attentive respondent would give the same response to more than
half the items.
18 In psychometric instruments using Likert items to assess some quality (e.g. “From 1 to 10, how agitated do you feel?”) a reverse-scored item would assess that quality from the opposite direction, e.g. “From 1 to 10, how calm do you feel?”.
From the history of foils being used to test false recognition of advertisements, we
noted earlier that exaggeration may be related to recognition memory bias. To test that
relationship, a 100-item battery of words was presented at the start of the survey and then
tested for recognition at the end (with 50 old and 50 new words). This allowed for testing
both individual differences in memory ability and also memory exaggeration, by applying
the RExI to false (relative to correct) claims of recognition. Note that the cognitive trait of
recognition bias mentioned earlier (Kantner & Lindsay, 2014) is about individual
differences in memory claims, whether valid or not. Memory bias thus represents
confidence (warranted or not) in claiming recognition, which will relate to genuine
recognition ability. This is different from memory exaggeration, which is about false
recognition, uncorrelated with rate of plausible memory claiming.
To get a broader measure of exaggeration, a large (150-item), commonly-used
overclaiming inventory was employed, the Overclaiming Questionnaire (OCQ) developed by
Paulhus et al. (2003). The content of that inventory is based on 1980s American cultural
knowledge (Hirsch Jr et al., 1988), so if knowledge exaggeration does not strictly depend on
the domain of knowledge assessed, as Study 1 suggests, then it should not matter that this
content is largely irrelevant to the academic performance of a 21st-century
undergraduate at a Canadian university.
To broaden and generalize the measure of academic performance, for the remaining
studies, the central dependent variable was University of British Columbia (UBC) GPA, a
metric of high ecological validity that reflects not just knowledge, but a broad range of
decisions: cognitive, metacognitive, strategic, social, emotional and more. Like intelligence
or personality, tendency to exaggerate one’s knowledge may represent an individual
difference that affects many life outcomes.
Study 1 showed that exaggeration predicted lower academic performance and also
lower performance on the CRT. This raises the possibility that exaggeration may be simply
an expression of lower cognitive ability in general. To test that, Study 2 employed a broad
measure of cognitive abilities used commercially for evaluating job applicants, the
Wonderlic Personnel Test (E. F. Wonderlic, 1992). Scores on this should predict GPA.
Method
The study was approved by the institutional ethics board and included explicit, active
consent to access student transcript information. As with Study 1, data was gathered via
online survey.
Participants
No longer limited to students from one discipline (like the science students in Study
1), Study 2 considered a wider range of students, in various disciplines and from varied
backgrounds, who happened to be enrolled in an undergraduate psychology course (a popular
elective across disciplines) at UBC, with participants from spring and summer terms.
Students volunteered to complete an online survey for partial course credit. A total of 533
students completed the study, with a median completion time of 44.1 minutes.
Overall, 31% of participants reported their gender as male (69% as female), 50%
reported English as a first language, and 64% as being from Western countries. (Note that
these proportions are also shown in the M (SD) column of Table 3, because these are
dichotomous variables.) 59% of the students were enrolled in an Arts program. While 43%
of the students sampled were in their first year, this distinction had no impact on the
outcomes shown below, nor did their overall number of academic terms.
Measures
Academic Performance. To capture academic performance with breadth, reliability
and ecological validity, the study asked participants to grant access to their university
transcripts. From this, GPA was calculated as overall average grade for all courses
completed at the university, including courses in progress, so there were data even for
students new to the university. These grades were represented on a 0 – 100 scale. Note that
this is an improvement over many studies where self-reported GPA is used; official
transcript information avoids measurement error or self-report bias.
Knowledge Exaggeration. The OCQ-150 (Paulhus et al., 2003) was employed here as
a reference for extracting exaggeration measures, given that it has been widely applied in
overclaiming studies. The broader knowledge domain these items query is 1980s American
culture, taken from Hirsch Jr et al. (1988), in ten categories (20th Century Culture Names,
Authors and Characters, Books and Poems, Fine Arts, Historical Names and Events,
Language, Life Sciences, Philosophy, Physical Sciences, Social Science and Law) with each
category having 12 reals and 3 foils. In this application, the potential irrelevance of the
content suits the purpose of establishing that exaggeration generalizes beyond the
domain(s) it is measured on. Claims for each item were solicited with the prompt of
“Please rate how familiar you are with each item” along a Likert scale of 1 : Not at all
familiar to 7 : Very familiar, a format taken from the instrument’s use in overclaiming
studies. The RExI was calculated as the amount (average rating) of foils claiming
residualized on the amount of reals claiming.
Self-Enhancement.
Narcissism. The 40-item dichotomous forced-choice NPI (Raskin & Terry, 1988)
was used to assess narcissism. It has a theoretical range of 0 – 40.
Balanced Inventory of Desirable Responding (BIDR). The BIDR consists of
three 20-item, balanced (equal numbers of forward- and reverse-scored items) instruments:
Impression Management (IM), Self-Deceptive Enhancement (SDE), and Self-Deceptive
Denial (SDD). For each item, extreme scores (e.g. 6 or 7 for forward-scored items on the
7-step Likert scale used) count as 1, others as 0, for a range of 0 – 20 for each instrument.
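The dichotomous scoring rule just described can be sketched as below. The text gives the rule only for forward-scored items (6 or 7 counts as 1); this sketch assumes that reverse-scored items are first flipped (so that 1 or 2 counts as 1), which is an assumption rather than something the source states. The function name, the short six-item example, and the choice of which items are reverse-keyed are hypothetical.

```python
def score_bidr(responses, reverse_keyed, scale_max=7):
    """Count extreme responses after flipping reverse-keyed items.

    responses: Likert responses from 1 to scale_max, one per item.
    reverse_keyed: set of 0-based indices of reverse-scored items (assumed).
    """
    total = 0
    for index, response in enumerate(responses):
        # Flip reverse-keyed items so that a high value always means "more"
        value = (scale_max + 1 - response) if index in reverse_keyed else response
        if value >= scale_max - 1:  # 6 or 7 on a 7-step scale
            total += 1
    return total

# Hypothetical six-item example; each real instrument has 20 items
example_score = score_bidr([7, 6, 5, 1, 2, 3], reverse_keyed={3, 4, 5})
```

Counting only extreme responses means that moderate agreement contributes nothing to the score, whichever direction an item is keyed.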
Cognitive Ability. The form A 2000 version of the Wonderlic Personnel Test
(E. F. Wonderlic, 1992), adapted here for online survey administration, was used as a
general measure of cognitive ability. It is a widely-used 12-minute timed test that requires
metacognitive, numerical, graphical and verbal problem-solving skills, and attention to
detail. Scored only as number correct, the maximum possible is 50.
Recognition Memory. Early in the survey, participants were exposed to items via a
Lexical Decision Task (LDT). For each of 100 letter strings, they were asked to decide, as
quickly and accurately as possible, whether it was a genuine word from the English language or
not. Fifty genuine words and fifty pronounceable nonwords were shown. At the end of the
survey (with roughly 20 minutes of other questions in between), using a similar timed
binary classification task, respondents categorized words as to whether they had seen them
before in the LDT, with 50 old and 50 new items (both 50% genuine words) of similar
properties. New and old items were matched for word length.
This protocol gives us two summary measures: one for the rate of claiming old items
(hit rate, analogous to reals claiming for overclaiming), and one for the rate of claiming
new items (false alarm rate, analogous to foils claiming). These two simple measures can
be combined in various ways, e.g. as done to create the RExI. Signal detection theory (the
typical model used in memory research) provides a variety of other combinations, notably
accuracy (roughly the excess of old claims beyond new claims, known as d′) and bias,
representing overall claiming, whether old or new (which correspondingly increases with
recognition). Kantner and Lindsay (2014), in examining their cognitive trait, consider
several kinds of bias measures which will not be considered here. For simplicity, the
correlation table reports memory accuracy (d′) as an indicator of general memory ability,
and also memory exaggeration, as measured by the RExI. For regression models, simple
measures of memory hit rate and false alarm rate are included. These two measures
capture all the information (variance) from the memory test without requiring any
understanding of signal detection theory, or arguments about which composite measures to
use. The software used to summarize scores did not easily provide individual item data, so
Cronbach’s α was not calculated.
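For readers unfamiliar with signal detection theory, the two composites mentioned above can be computed from the hit and false alarm rates. The sketch uses the standard equal-variance definitions, d′ = z(H) − z(FA) and criterion c = −(z(H) + z(FA)) / 2. Whether exactly these formulas were used here, and how rates of exactly 0 or 1 were handled, is not stated in the text, so this is an illustration only. The function name and input values are hypothetical.

```python
from statistics import NormalDist

def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Rates of exactly 0 or 1 have no finite z-score and would need an
    adjustment that is not shown here.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa            # accuracy: separation of old from new
    criterion = -(z_hit + z_fa) / 2   # bias: lower means more liberal claiming
    return d_prime, criterion

# Hypothetical respondent: 80% hits, 20% false alarms
d_prime, criterion = sdt_measures(0.80, 0.20)
```

Both composites are functions of the same two rates, so entering hit rate and false alarm rate directly in a regression loses no information relative to using d′ and bias.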
Careless Responding. To identify participants who were not answering sincerely,
two techniques were combined. For the LDT, the fastest decision time with at least average
discrimination was found (454 ms) and then used as a cutoff: Participants with median
correct response time faster than this (1.5%) were considered too fast and labelled as
careless. This distinction correlated with LDT discrimination (Hits - False Alarms)
significantly, r(530) = -.69***, 95% CI [-.73, -.64], showing that such rushed responses were
not accurate.
As well, responses that showed an unreasonable consistency on BIDR measures (i.e.
having identical responses to more than half the items, because half are reverse-scored) were
also labeled as careless. In total that came to 12% of the sample, which is consistent with
Meade and Craig (2012) who reported that “approximately 10% – 12% of undergraduates
completing a lengthy survey for course credit were identified as careless responders” (p. 1).
Carelessness was thus a dichotomous variable indicating if a respondent was unreasonably
fast on the LDT and / or gave identical responses to more than half of the items on any of
the three BIDR instruments. Note that this carelessness index cannot be meaningfully
compared to any of the BIDR scores because they are both derived from the same items.
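The combined carelessness flag can be sketched as follows. The 454 ms cutoff comes from the text. Everything else is an assumption made for illustration: the function and argument names are hypothetical, and "identical responses to more than half the items" is interpreted as the single most frequent response value occurring on more than half of an instrument's items, whether or not those items are consecutive.

```python
from collections import Counter
from statistics import median

def is_careless(ldt_correct_rts_ms, bidr_instruments, rt_cutoff_ms=454):
    """Flag a respondent as careless on either of the two criteria."""
    # Criterion 1: median correct LDT response time faster than the cutoff
    too_fast = median(ldt_correct_rts_ms) < rt_cutoff_ms
    # Criterion 2: same raw response on more than half the items of any instrument
    longstring = any(
        max(Counter(items).values()) > len(items) / 2
        for items in bidr_instruments
    )
    return too_fast or longstring

# Hypothetical respondents (illustrative values only)
varied = [1, 2, 3, 4, 5, 6, 7, 1, 2, 3] * 2          # 20 varied responses
uniform = [4] * 11 + [1, 2, 3, 5, 6, 7, 1, 2, 3]      # same answer on 11 of 20
attentive = is_careless([600, 650, 700], [varied, varied, varied])
rushed = is_careless([400, 420, 430], [varied, varied, varied])
repetitive = is_careless([600, 650, 700], [varied, uniform, varied])
```

Because half of each instrument's items are reverse-scored, a sincere respondent should rarely give one raw answer to a majority of items, which is the rationale the text gives for the second criterion.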
Demographics. While participants were asked which gender (or neither) they most
closely identify with, for the purposes of analysis, this was collapsed to a binary variable,
with the larger value indicating male. Participants also reported if English was a first
language for them, and if they had spent 10 or more years living in Western countries (e.g.
English-speaking countries, Western Europe).
Table 3: Correlations between Study 2 measures
Measure M (SD) 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 University GPA .72 (.09) —
2 OCQ RExI .45 (.16) -.27 .78
3 OCQ Foils Rate .22 (.19) -.17 .83 .92
4 OCQ Reals Rate .48 (.16) .09 -.00 .56 .97
5 Sex (M+) .31 (.46) -.12 -.03 .00 .05 —
6 Native English .50 (.50) .11 -.29 -.07 .32 .09 —
7 Western Culture .64 (.48) .14 -.35 -.14 .26 .02 .52 —
8 Careless Responding .12 (.33) -.07 .21 .13 -.09 .00 -.14 -.14 —
9 Wonderlic Score .56 (.12) .29 -.34 -.20 .16 .13 .14 .19 -.21 .83
10 Narcissism .33 (.17) -.18 .25 .21 .01 .11 -.10 -.20 .10 -.18 .84
11 Impression Management .28 (.18) .11 -.08 -.02 .09 -.11 .01 .07 -.26 .14 -.10 .75
12 Self-Deceptive Enhancement .21 (.15) -.01 .01 .11 .19 .04 .02 -.01 -.21 .07 .26 .47 .70
13 Self-Deceptive Denial .33 (.18) .07 -.09 -.01 .10 -.25 .10 .21 -.24 .05 -.16 .66 .42 .76
14 Memory RExI .59 (.15) -.17 .25 .19 -.02 .01 -.09 -.12 .15 -.17 .19 -.04 -.05 -.07 —
15 Memory Accuracy (d′) .37 (.14) .15 -.21 -.14 .05 -.03 .10 .12 -.14 .15 -.16 .07 .05 .12 -.89
Note: N = 530. OCQ: 150-item Overclaiming Questionnaire. RExI: Residualized Exaggeration Index. Cronbach’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p ≥ .05, p < .05, p < .01 (see Reporting Conventions).
Results
Predictive Validity
This section describes convergent and divergent validity of the RExI based on the
bivariate, zero-order correlations between study variables shown in Table 3. Note again
that, as in all correlation tables in this paper, mean and standard deviation are reported on
ranges standardized to be from 0 to 1, as described in Reporting Conventions.
Exaggeration. The main results of Study 1 are confirmed here: Knowledge
exaggeration is related to impaired (academic) performance, despite the
knowledge being exaggerated having little relevance to performance, as shown by the weak
relationship between reals claiming rate and GPA. Exaggeration also showed only moderate
relationships with carelessness, narcissism, lower memory accuracy, and lower cognitive
ability as measured by the Wonderlic test. Altogether, these suggest substantial
discriminant validity, given the higher reliability of these measures compared to Study 1:
Exaggeration does not appear to be merely a side-effect of these other possible
explanations. This distinction is also shown via the regression model presented later.
Despite being related to narcissism, exaggeration showed no significant relationship
with impression management or self-deceptive enhancement, and only a slight relationship
with lower self-deceptive denial. Surprisingly, having English as a first language or a
Western cultural background related to lower exaggeration, which may be an artifact of the
OCQ content being so culturally biased. Memory exaggeration showed similar patterns to
knowledge exaggeration, albeit weaker, without a very strong relationship between the two
measures.
Careless Responding. The ad hoc measure of careless responding showed sensible
relationships with Wonderlic, narcissism and memory measures, suggesting it captured
something relevant, although possibly just situational carelessness, only for this study,
given the lack of relationship with GPA. Note that, being partially derived from response
style on the BIDR, this careless responding measure can’t be meaningfully compared with
impression management, self-deceptive enhancement, or self-deceptive denial. The two
components of this carelessness measure, responding too quickly to the LDT and longstring
responding on the BIDR, correlated r(530) = .20***, 95% CI [.12, .28], suggesting only
slight convergence of the two techniques. We see that carelessness did relate to
exaggeration, although not in a dramatic way.
Recognition Memory. Recognition memory accuracy related similarly to both GPA
and Wonderlic, validating the measure used in this study. Notably, memory exaggeration
(RExI) related to both those cognitive ability measures at no less magnitude than did
memory accuracy. Because memory accuracy was calculated (as signal detection theory d′)
on the same data as used for memory exaggeration, here the exaggeration measure is not
completely orthogonal to the ability measure. This is because d′ collapses both ability and
exaggeration into one measure, essentially correcting for guessing. The similar magnitude
of predictive validities raises some questions: If d′ captures both memory ability and
(negative) exaggeration, yet shows effects similar to the RExI (which removes variance
related to ability), this suggests genuine ability may be less relevant than exaggeration in
this context. This is supported by the regression model (below) showing that hit rate
(suggesting correct recognition) does not contribute more than false-alarm rate (controlled
for hit rate) in predicting academic performance.
Self-Enhancement. While narcissism related to both forms of exaggeration,
impression management, self-deceptive enhancement and self-deceptive denial did not. This
may be because those constructs do not relate to exaggeration, or because those
instruments don’t capture them in ways that are relevant here. More detailed analysis
revealed that the Emmons 7-item Exploitiveness / Entitlement subscale of the NPI
(Emmons, 1987) showed similar relationships with GPA (r(528) = -.15***, 95% CI [-.23,
-.06]) and the RExI (r(531) = .23***, 95% CI [.15, .31]), suggesting that facet best
characterizes exaggeration. Also, the Ames et al. (2006) 16-item shortened version of the
NPI related to GPA (r(528) = -.17***, 95% CI [-.25, -.09]) and the RExI (r(531) = .26***,
95% CI [.18, .34]) about as well as the 40-item version, allowing for more economy in future
studies.
Incremental Validity
To disentangle the several relationships shown above, Table 4 presents a linear
regression model, with standardized β coefficients (and standard error) predicting
university GPA. Note, again, that the β for foils rate, now being controlled for reals rate, is
exactly the RExI. Note also that the partial correlations (βs) for foils and reals claiming are
larger than their zero-order correlations shown in Table 3. This indicates a mutual
suppressor effect: The “meaning” of one kind of claim depends on the amount of the other
kind of claim. This highlights the value of considering exaggeration in context and the RExI
analytic strategy to isolate it.
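The equivalence noted above, between the coefficient for foils claiming controlled for reals claiming and the effect of the residualized index, can be illustrated with a minimal sketch. It uses only two predictors and unstandardized slopes; the function names and the data values are hypothetical and are not taken from the study.

```python
from statistics import mean


def center(values):
    """Subtract the mean so that intercepts drop out of the algebra."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slope_controlling(y, x1, x2):
    """OLS slope of x1 in the two-predictor model y ~ x1 + x2."""
    y, x1, x2 = center(y), center(x1), center(x2)
    det = dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2
    return (dot(x1, y) * dot(x2, x2) - dot(x2, y) * dot(x1, x2)) / det


def slope_on_residual(y, x1, x2):
    """OLS slope of y on the part of x1 not explained by x2."""
    y, x1, x2 = center(y), center(x1), center(x2)
    b = dot(x1, x2) / dot(x2, x2)
    resid = [a - b * c for a, c in zip(x1, x2)]  # residualized x1
    return dot(resid, y) / dot(resid, resid)


# Made-up illustrative values (not study data).
gpa = [3.1, 2.8, 3.6, 2.5, 3.9, 3.0, 2.7, 3.4]
foils = [0.20, 0.35, 0.10, 0.40, 0.05, 0.30, 0.25, 0.15]
reals = [0.60, 0.70, 0.65, 0.55, 0.80, 0.75, 0.50, 0.70]

print(slope_controlling(gpa, foils, reals))
print(slope_on_residual(gpa, foils, reals))
```

Both calls return the same slope (up to floating-point error), which is why entering foils and reals together in a regression is one way to obtain the effect of the residualized index. Standardizing the variables rescales the coefficient but does not change this equivalence.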
Table 4: Regression Model Predicting GPA from OCQ RExI in Study 2
Predictor                      β     SE    p value
OCQ Foils (RExI)              -.19   .04   .001
OCQ Reals                      .17   .05   .003
Wonderlic                      .22   .04   <.001
Memory False Alarms (RExI)    -.10   .04   .06
Memory Hits                    .07   .05   .20
Narcissism                    -.05   .04   .25
Impression Management          .10   .04   .08
Self-Deceptive Enhancement    -.04   .05   .49
Self-Deceptive Denial         -.06   .06   .32
Sex (M+)                      -.15   .04   <.001
Native English                 .03   .05   .57
Western Culture               -.00   .05   .99
Careless Responding            .03   .04   .46
Note: Overall R² = .16, p < .001. N = 530. OCQ: 150-item Overclaiming Questionnaire. RExI: Residualized Exaggeration Index.
Exaggeration as Distinct Liability. This model helps clarify that, in the relationship
between exaggeration and performance, language and culture are no longer distinct
predictors, once controlling for Wonderlic general cognitive ability, memory performance,
and careless responding. Neither are the self-enhancement measures (NPI or BIDR)
significant predictors in this model, given those controls. This suggests that the behavior of
exaggerating one’s knowledge, even of trivial information, predicts impairment in overall
knowledge (academic) performance that is not simply a side effect of lower cognitive ability,
weaker memory, self-enhancement or carelessness.
Note that the two memory measures, hit rate and false alarm rate, also include some
degree of exaggeration, i.e. unwarranted memory ability claims. Including them in this
model thus reduces the variance attributable to the OCQ RExI; without the memory
measures, the β for the RExI increases slightly. This model, then, is conservative, showing
the effect of knowledge exaggeration after controlling for memory exaggeration. A similar
model using memory exaggeration (without the OCQ but with the other measures), also
shows that memory exaggeration remains a uniquely significant (but weaker) predictor of
GPA. Both kinds of exaggeration predict lower academic performance beyond these
controls.
Discussion
Using more extensive and comprehensive measures, the finding in Study 1 that
knowledge exaggeration (as measured by the RExI) uniquely predicted lower academic
performance was replicated. Memory exaggeration showed a similar, albeit smaller, effect.
Narcissism showed a slight relationship with both kinds of exaggeration.
Recall that my historical review of foil claiming showed that it has been used to
capture self-enhancement (faking, socially-desirable responding), false memory (of unseen
advertisements) and careless responding in surveys. After controlling for those possible
explanations, results here show that exaggeration based on foil claiming still uniquely
predicts academic performance. Even after also controlling for simple demographic
variables and general intelligence, exaggeration shows discriminant validity.
The weakening of relationship between knowledge exaggeration and GPA from
zero-order correlation to partial correlation (β) is almost entirely due to adding Wonderlic
score to the model, indicating an overlap in their relationships to academic performance.
The RExI, however, has no obvious relationship with either fluid (problem solving) or
crystallized (acquired, declarative) intelligence as conventionally conceived, so exaggeration
may capture an aspect of g (general intelligence) previously overlooked. Altogether, the
RExI appears to identify an important, distinct predictor of academic performance, worthy
of further exploration.
Curiously, despite the Wonderlic being a professionally-developed, proprietary
instrument widely employed for over 80 years (Wonderlic, 2019), the RExI used here shows
comparable, and independent, validity in predicting an important life outcome, university
GPA. While taking the Wonderlic involves 12 stressful minutes, the items used here for
capturing exaggeration, appearing like a casual trivia quiz, showed a median completion
time of 6 minutes.
A car’s racing performance depends on both the power of the engine and the skill of
the driver. Similarly, it appears that, while cognitive ability predicts academic performance
as expected, how skillfully one drives that engine, how one understands their own ability,
also has an influence. Just as taking a corner too fast can slow you down, excessive
self-image of competence appears to impair application of that competence, whether in
knowledge expression or recognition memory.
To get a better understanding of the nature of exaggeration, it is important to further
examine how it is measured and what it relates to.
Study 3: Developing Better Measures
Studies 1 and 2 used overclaiming item inventories for assessing exaggeration,
extracting the variance of foils claiming not explained by reals claiming. Study 2 results
showed an improvement (stronger correlation with relevant measures) from a larger
inventory of 150 items. Might it be possible to create a more efficient set of items? This
study explores the construction of an overclaiming type of instrument designed to capture
general knowledge exaggeration. This study also served as a pilot test of an
overstatement instrument for measuring knowledge exaggeration.
Overclaiming Instrument Design
Item Format. Like many personality instruments, the OCQ used in Study 2 relied on
a Likert style “7-point scale ranging from 0 (never heard of it) to 6 (know it very well)”
(Paulhus et al., 2003, p. 891). For a new instrument, I kept the wisdom of enumerating the
lowest step at 0 (to connote complete ignorance) but expanded the upper limit to 10, both
to connote mastery, and to capture higher resolution for confident claims. I replaced the
labels at the extremes with No Knowledge and Complete Knowledge to avoid sensory bias
(you may not have heard of it but have seen it a few times) and ambiguity (how well is
“very” well?). This design is similar to the approach proposed by Hodge and Gillespie
(2003), which demonstrated higher reliability than typical Likert scales.
Item Content. As described earlier, the OCQ based its content on 1980s American
cultural knowledge. Even if exaggeration does not strictly depend on knowledge domain,
respondent reactance to test content should be avoided; the obvious cultural bias was noted
by some participants. Considering the typical convenience sample of North American
undergraduates, a more appropriate domain of general knowledge would simply be English
vocabulary, given that a certain level of English skill is required for university entrance.
This domain also serves as a relevant domain for exaggeration, as “vocabulary knowledge
correlates highly with performance on more general measures of intelligence and is
commonly viewed as a proxy for IQ” (Marchman & Fernald, 2008, p. F14).
Beyond being a useful general knowledge domain to query, vocabulary items have the
benefit of being well-studied in education and psycholinguistics. This provides a wealth of
theoretical support and metrics for choosing items. Having so many items to choose from
provides the luxury of high standards for selection. With such a rich resource, we can begin
with theoretical principles for selecting a few hundred candidates out of several thousands,
then use empirical testing for selecting tens out of those hundreds. The theory for item
selection used here can be extended to instrument development in other languages.
Item Selection Theory. What makes good items for genuine and false claims of
knowledge? The reals should be indicative of an appropriate range of relevant knowledge,
and the foils should not represent content associated with knowledge. However, foils must
be alluring, without being “damaged” reals, e.g. misspellings or slang. Reals must not be
too easy (making foils appear as incongruent outliers) nor too hard, or they act as
undetectable foils. Ideally, the psychological processes driving item claiming should be
different for reals and foils.
The most challenging of the above criteria is making foils alluring, worth claiming,
without tapping actual knowledge. Cognitive psychology provides inspiration in
dual-process theory (e.g. Frankish, 2010). The shallow, surface, easy processing of cognitive
fluency may trigger false familiarity to make foils more attractive despite their lack of
substance. At the same time, reals should be less fluent so that their primary cause for
claiming would be from a deeper process of recollection. This provides a simple heuristic
for selecting items: Foils should have high fluency but not be related to ability, while reals
should be related to ability yet have low fluency.
A practical issue arises: Is there an objective way to assess fluency of a word, even if
not real? Psycholinguistics research has shown that both phonotactic frequency and
neighborhood density influence wordlikeness (e.g. Frisch et al., 2000; Bailey & Hahn, 2001).
For our purposes, that means that pseudowords with more common sequences of letters
and/or sounds, and having structures similar to existing words, are perceived to be more
like actual words. A common measure of neighborhood density is OLD20; this averages the
orthographic Levenshtein edit distance between a letter string and the 20 closest neighbors
within a given corpus, and it can serve as a simple assessment of
un-wordlikeness (Yarkoni et al., 2008). Longer words are also less fluent, and shorter words
too easy, so word lengths were restricted to be from 6 to 8 characters.
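As a rough illustration of the OLD20 measure described above, the following sketch computes the mean Levenshtein distance from a letter string to its n closest neighbors in a word list. The function names and the tiny corpus are hypothetical; a real calculation would use n = 20 and a full lexicon.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def old_n(item: str, corpus: list[str], n: int = 20) -> float:
    """Mean edit distance from item to its n closest neighbors in corpus.

    Assumes the corpus holds at least one word other than the item itself.
    """
    distances = sorted(levenshtein(item, w) for w in corpus if w != item)
    nearest = distances[:n]
    return sum(nearest) / len(nearest)


# Toy corpus for illustration only.
toy_corpus = ["bat", "cot", "dog"]
print(old_n("cat", toy_corpus, n=2))
```

Higher values indicate that even the nearest words are several edits away, i.e. a less wordlike string.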
As a foundational resource, the English Lexicon Project (ELP) provides free access to
behavioral data on over 40 thousand each of English words and pseudowords. The
behavioral data includes speed and accuracy information from LDT measures of each item
(word or pseudoword), in which participants had to quickly and accurately decide if a given
letter string was a genuine word or not. This provides both decision time and accuracy
measures for each word, whether real or not. This database can provide some confirmation
of our fluency heuristic: Controlling for word length, OLD20 (unusualness) predicts LDT
accuracy negatively for real words (β = −.53***) but positively for fake words (β = .41***).
This suggests that fluency makes real words easier and fake words harder to distinguish.
The reals still need to be of appropriate difficulty, so we can use frequency of occurrence
(i.e. how likely an English reader is to have encountered it) as a proxy for difficulty, as more
common words are better known.
Item Selection Process. The selection process for potential real items from the 40
thousand in the ELP database was as follows:
• Items were filtered to remove capitalizations (proper nouns), punctuation
(contractions), common obscenities and extremely uncommon words.
• OLD20 was calculated for each item.
• Items were filtered to be 6 to 8 characters in length, not among the 20% most
common words (too easy), in the top 10% for OLD20 (unusual construction), in the slowest
40% of LDT response time (lower fluency), and not compound words (e.g. “soapbox” or
“bedroom”).
The selection process for potential foil items from the 40 thousand in the ELP
database was as follows:
• Items were filtered to remove capitalizations and punctuation.
• OLD20 was calculated for each item.
• Items were filtered to be 6 to 8 characters in length, having more than 1 letter
difference from a real word (i.e. not typos), and being below median for LDT Accuracy (not
obviously fake).
Candidate foils were not filtered by OLD20 because that measure in ELP pseudowords
was contaminated by having common suffixes, e.g. “ing”, “er”, “ed”. It should be noted that
their pseudowords were machine-generated, mostly by slight alterations of real words, so
that set included some rare real words, some common typos of real words, and some that
are very awkward to pronounce, i.e. low fluency. In general, the suitability of pseudowords
found was so weak that it was hard to find even 60 candidates out of the initial 40
thousand.
Generating Foils. The pseudowords in the ELP database were created for a
different purpose, where being obviously fake was useful. To remedy this gap, a novel
algorithm for generating fluent pseudowords was created. This involved taking common,
fluent short words as input, then “stitching” them together by finding overlapping sequences
of 2 or 3 letters. For example, “bear”, “area”, and “read” can be stitched to create the fluent
pseudoword “bearead”. In contrast, “judgmenp” is an example of a typically less-fluent ELP
pseudoword, as demonstrated by the unusual letter sequences “dgm” and “enp”.
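The stitching idea can be sketched as follows. This is a minimal reconstruction from the description above, not the actual generator used; the function names are hypothetical, and the preference for a 3-letter overlap over a 2-letter one is an assumption.

```python
from typing import Optional, Sequence


def stitch(left: str, right: str, overlaps: Sequence[int] = (3, 2)) -> Optional[str]:
    """Join two words where the end of `left` matches the start of `right`.

    Tries each overlap length in order and returns None if none matches.
    """
    for k in overlaps:
        if len(left) >= k and len(right) > k and left[-k:] == right[:k]:
            return left + right[k:]
    return None


def stitch_chain(words: Sequence[str]) -> Optional[str]:
    """Stitch a sequence of words left to right; None if any join fails."""
    result: Optional[str] = words[0]
    for word in words[1:]:
        result = stitch(result, word)
        if result is None:
            return None
    return result


print(stitch("bear", "area"))                  # overlap "ar"
print(stitch_chain(["bear", "area", "read"]))  # then overlap "rea"
```

With the example words from the text, "bear" and "area" share "ar", and the intermediate string then shares "rea" with "read", reproducing the pseudoword "bearead". Candidates produced this way would still need the length filter and the manual check for real meanings described below.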
Final Candidate Selection. For each of lengths 6, 7, and 8, 20 ELP pseudowords and
10 “stitched” pseudowords were selected, giving a total of 90 candidate foils, each manually
checked by Internet search to be very unlikely to have any plausible meaning. To avoid
possible misinterpretation as a proper noun, word items in this instrument are always
presented in lower-case letters. These 90 candidate foils were combined with 180 candidate
reals, 60 of each length, for a total of 270 items.
Overstatement Instrument Design
To match the content of the overclaiming instrument, a simple overstatement test
based on vocabulary was also tested in this study. Any multiple-choice test can be
converted into an overstatement test simply by adding a non-claiming option to each
question, e.g. “Don’t Know”. This provides a degree of freedom between correct and
incorrect answers because the test-taker is not forced into the dichotomy of being right or
wrong. Items were selected from the multiple-choice Vocabulary Size Test (VST) to capture
college-level knowledge. The full VST involves 140 items to assess vocabulary sizes between
1000 and 14000 words, ten at each thousand level (Beglar, 2010). From that, 60 words were
selected representing vocabularies of 7000 to 12000 words. Those items were presented with
the given four answer options (1 correct, 3 incorrect, in randomized order) plus a “Don’t
Know” non-claiming option, to create an overstatement format. Care was taken to ensure
that none of the VST words were included in the candidate overclaiming real words.
Method
The purpose of this study was to assess the suitability of potential overclaiming items
and to pilot test the overstatement version of the VST. To reach a more diverse subject
pool, participants were recruited via Amazon Mechanical Turk (MTurk). That system
allows for “workers” to be paid by “requesters” once their work is approved; otherwise the
work is rejected. Excessive rejections look bad for both workers and requesters, but pressure to
reject is high because many workers are careless and requesters have limited budgets. A
special design was employed to avoid this tension. The description of the task (prior to
accepting the task and on the first page, once accepted) made it clear that a college level of
English vocabulary was required to complete the task, and if not demonstrated, the task
would halt, without pay or rejection. Dynamic processing was added to the survey to
determine if vocabulary questions (at the outset) were answered correctly above chance
without cheating, and if not, the task terminated with explanation. This prevented workers
from wasting time completing a task without pay, prevented paying for careless work, and
allowed for immediate automatic acceptance and paying of tasks once completed.
Participants
The dynamic exclusion of participants not scoring above chance on vocabulary
questions proved very effective in restricting careless work and minimizing costs, as only
37% of attempts (151 participants) demonstrated vocabulary knowledge above chance.
Those respondents reported ages having mean (standard deviation) of 39.34 (12.40) years,
ranging from 21 to 74. 93% reported English as a first language, 95% said they were more
familiar with American rather than British English, and 82% reported post-secondary or
higher education. Gender and race information was collected indirectly, showing
participants to be roughly equally male and female, and about 80% white (Caucasian).
Measures
Demographics. The opening questions involved the demographic data reported
above.
Vocabulary Overstatement and Overclaiming. After the opening questions,
participants saw three blocks each of 20 multiple-choice (overstatement) VST items and 90
(60 real, 30 foil) overclaiming items. To avoid cheating (e.g. looking up a word), each
overclaiming item had a time limit of 10 seconds and each VST item had a time limit of 20
seconds. If a VST block did not have at least 9 out of 20 correct answers (a threshold with less
than a 5% chance of being reached by random guessing), the task terminated, as described above.
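The 9-out-of-20 cutoff can be checked with a short binomial calculation. This sketch assumes a purely random responder who always picks one of the four answer options with equal probability (p = .25) and never chooses “Don’t Know”; the function name is hypothetical.

```python
from math import comb


def tail_probability(n_items: int, min_correct: int, p_guess: float) -> float:
    """P(X >= min_correct) for X ~ Binomial(n_items, p_guess)."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Chance of a random guesser reaching the cutoff in one 20-item block.
print(tail_probability(20, 9, 0.25))  # roughly .04
print(tail_probability(20, 8, 0.25))  # roughly .10
```

Under this assumption, 9 correct is the lowest cutoff that a random guesser would reach less than 5% of the time in a single block.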
Narcissism and Memory. Following those vocabulary tests was the 40-item NPI and
then a 120-item recognition memory test based on the 60 multiple-choice vocabulary items.
The narcissism and memory measures were included as possible future selection metrics,
and to confirm validity of the survey.
Results
Table 5 shows correlations of study measures. Lack of overall carelessness was
demonstrated in overclaiming rates: Foils rate was unrelated to reals rate (r(149) = -.03,
95% CI [-.19, .13]), showing that item claiming was not influenced by a common factor like
careless responding or response bias. This also means that the overclaiming RExI reported
here is effectively identical to the foils claiming rate.
Table 5: Correlations between Study 3 measures
Measure                       M (SD)      1     2     3     4     5     6     7     8     9
1 Overclaiming RExI           .15 (.18)   .96
2 Overclaiming Reals Rate     .67 (.23)   .00   .98
3 Overstatement RExI          .57 (.21)   .46   .12   .67
4 Overstatement Correct       .64 (.20)  -.26   .64  -.00   .77
5 Narcissism                  .29 (.23)   .29  -.28   .09  -.32   .93
6 Memory RExI                 .22 (.19)   .42  -.17   .25  -.26   .30    —
7 Memory Accuracy             .62 (.21)  -.43   .38  -.23   .48  -.40  -.81    —
8 Age                         .35 (.23)  -.36   .30  -.17   .21  -.34  -.08   .18    —
9 Native English              .93 (.26)  -.12   .10  -.10   .08  -.03   .07   .05   .15    —
10 Post-Secondary Education   .82 (.38)   .06   .06   .11   .09   .17  -.14   .08  -.22  -.06
Note: N = 151. RExI: Residualized Exaggeration Index. Cronbach’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
The two novel exaggeration measures showed a moderate correlation, suggesting some
overlap but also some distinction despite both involving similar content. This is supported
by the overstatement exaggeration not correlating as strongly with narcissism, memory
measures or age. Note that reals claiming in the overclaiming inventory related to number
correct in the overstatement test, confirming content similarity and that reals claiming
captured relevant knowledge.
Age correlated negatively with narcissism and both measures of exaggeration, while
being positively related to vocabulary score and recognition memory accuracy (Hits - False
Alarms; r(149) = .18*, 95% CI [.02, .33]), suggesting some benefits of maturity.
Discussion
These preliminary results suggest that the overclaiming items were performing as
expected and should provide a suitable resource for more efficient measures in future
studies.
From these 270 candidate items, reals were selected if claiming them correlated
r >= .40 with correct vocabulary answers and r <= −.19 with answered but incorrect
vocabulary answers, thus capturing genuine knowledge and low exaggeration. Similarly,
foils were selected if claiming them correlated r >= .40 with answered but incorrect
vocabulary answers and r <= −.20 with responding “Don’t Know” on vocabulary answers,
thus capturing exaggerated knowledge and unwillingness to admit ignorance. As expected
by the fluency heuristic, empirically selected reals had, on average, higher OLD20 than
selected foils (.75*** [.43, 1.08], t(17.43) = 4.85; d = 1.91), confirming that higher fluency
for foils is optimal.
This process yielded 20 reals and 10 foils for a new overclaiming inventory designated
the Vocabulary Knowledge Exaggeration (VoKE) inventory. The proportion of reals to foils
in an overclaiming test is largely a subjective choice, balancing the need for an adequate
number of foils for capturing exaggeration with the need to have an adequate number of
reals to keep the test reasonable: Too many unrecognizable items could raise suspicion or
doubt. While some researchers have used item sets of only foils (e.g. Phillips & Clancy,
1972), a common assumption in the overclaiming literature is to have at least 50% reals for
credibility. Williams et al. (2002) compared similar tests with 20% and 50% foils and found
no differences in overall claiming nor the differential between reals and foils claiming.
Study 4 will compare this new VoKE item set and the VST overstatement test against
OCQ items (as used in Study 2) for capturing knowledge exaggeration and predicting
academic performance.
Study 4: Robustness of the RExI
Study 2 showed knowledge exaggeration to be a distinct phenomenon, independently
predicting academic performance beyond recognition memory performance, cognitive
ability, self-enhancement, carelessness, and basic demographic variables. Using the
instruments developed in Study 3, Study 4 tests their efficacy and also considers other
possible personality correlates.
Method
The same sampling approach as in Study 2 was used, surveying students enrolled in
any program who happened to be taking a psychology undergraduate course at UBC, but this
study used a cohort from the following year. The Wonderlic was not used to test IQ this
time, both because there would have been some overlap in samples (e.g. students persisting in
psychology) and to allow time for including other questions. Again, consent to access
transcript information was explicitly granted at the start of the survey in a process
approved by the institutional ethics review board.
Participants
For this sample, 710 complete survey results were collected, with roughly 25% of
those respondents self-identifying (dichotomously) as male (some missing data, see Data
Merging Loss Effects), 51% claiming English as a first language, and 60% reporting having
lived 10 or more years in Western countries. 71% were in an Arts program, and 60% were
in their first or second year.
Measures
Academic Performance: GPA. As before, overall cumulative average of all courses
taken at UBC was obtained from institutional records after getting explicit consent from
students.
Demographics. As in previous studies, respondents were asked at what age they
became fluent in English (then coded as 1 if English was a first language, 0 otherwise), and
the number of years they have lived in Western countries (i.e. English-speaking or Western
Europe), then coded as 1 if 10 or more years, 0 otherwise.
To avoid stereotype threat for the ability tests in the survey, gender was not queried
in the survey itself, but merged with earlier data from a survey run for all participants in
the department’s subject pool. This process, unfortunately, lost some records due to errors
in merging unique identifiers. This loss was due largely to students not correctly recalling a
unique identifier they had created to link records, a process which left data missing not
completely at random, meaning that such loss cannot be compensated for. See Data
Merging Loss Effects.
RExI From Vocabulary Size Test (VST) Overstatement. As done in Study 3, a
60-item multiple-choice word-meaning test was adapted from Beglar’s (2010) VST, using
levels 7 through 12, i.e. assuming participants had vocabulary sizes of between 7 and 12
thousand words. This became an overstatement test by adding a non-claiming option
labeled Don’t Know. Like Study 3, to discourage looking up answers, each vocabulary
question had a time limit of 20 seconds. Questions lost due to timeout were not counted as
active incompetence.
Because having a non-claiming option (to avoid answering a question) allows a degree
of freedom between answering correctly and answering incorrectly, a RExI is calculated on
an overstatement test by regressing the number answered incorrectly on the number
correct, and taking the residuals. This captures variance in imagined ability unrelated to
actual ability, providing two orthogonal measures, number correct and a RExI, thus creating
several opportunities:
• The number correct on this test is a simple, conventional measure of vocabulary
knowledge following a common, widely accepted technique.
• The knowledge tapped by this test should be at least somewhat relevant to overall
academic performance.
• Adapting it to also derive a RExI allows a direct comparison of the predictive value
of competence vs exaggeration.
• The knowledge domain of this test is similar to the VoKE instrument developed in
Study 3 which uses an overclaiming inventory, so overstatement and overclaiming
approaches to measuring exaggeration can be roughly compared.
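The RExI calculation for an overstatement test, as described above, amounts to a simple regression of number incorrect on number correct followed by taking residuals. The sketch below shows this under stated assumptions: the function name and the scores are hypothetical, and any further scaling applied in the actual analyses is not shown.

```python
from statistics import mean


def rexi(incorrect: list[float], correct: list[float]) -> list[float]:
    """Residuals of `incorrect` after linear regression on `correct`."""
    mx, my = mean(correct), mean(incorrect)
    sxx = sum((x - mx) ** 2 for x in correct)
    sxy = sum((x - mx) * (y - my) for x, y in zip(correct, incorrect))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(correct, incorrect)]


# Made-up scores for six respondents on a 60-item test (not study data).
# Items answered "Don't Know" or timed out count toward neither total.
n_correct = [30, 42, 25, 50, 38, 45]
n_incorrect = [20, 10, 15, 8, 18, 5]

scores = rexi(n_incorrect, n_correct)
print([round(s, 2) for s in scores])
```

By construction the residuals have a mean of zero and are uncorrelated with number correct, which is what makes the two measures orthogonal. A positive score indicates more wrong answers than expected for that level of knowledge. The same residualization applies to foils claiming predicted by reals claiming, and to false alarm rate predicted by hit rate.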
RExI From Overclaiming Inventories. A 60-item version (using knowledge domains
of “Historical Names and Events”, “Books, Stories and Poems”, “20th Century Figures”, and
“Sciences”, selected by the originators, each with 10 reals and 5 foils) of the OCQ used in
Study 2 was supplemented with the domain “English Words”, the 30 items of the VoKE
developed in Study 3, which had the same ratio of 1/3 foils. Item presentation was
randomized, with the knowledge domain presented with each item. Participants were
prompted with “Please rate your knowledge of each item for the given topic” with responses
recorded on a 0 – 10 scale from No Knowledge to Complete Knowledge. Intermingling the
OCQ and VoKE items allowed for a direct comparison of the two item sets, eliminating
method or order effects. Each item here had a time limit of 10 seconds to discourage
looking up answers. Timed out responses were ignored.
Cognitive Reflection Test (CRT). The same set of 7 word problems testing reflective
thinking on math problems (Toplak et al., 2014) used in Study 1 was employed again here
as a cognitive ability control measure. No time limits were imposed for this or any
subsequent questions.
Self-Enhancement. In Study 2 we saw that narcissism related to exaggeration and
predicted GPA. This study focused on more specific aspects of a self-enhancing personality
that may relate to either exaggeration or academic performance. For consistency,
personality measures other than narcissism (both entitlement instruments, intellectual
humility, need for cognition) were scored via Likert rating of “how true the statement is for
you, in general” ranging from Not at all True (0) to Completely True (10).
Overplacement. Overplacement is an interpersonal form of overconfidence often
called the better-than-average effect. After completing all of the VST, OCT items, and CRT
problems, participants were asked “Compared to the average student doing these tests, how
well do you think you performed?” on a scale of 0 (Far Below Average) to 5 (Average) to
10 (Far Above Average). The scores from those three ability tests (with reals claiming
minus foils claiming serving as score for the overclaiming inventories) were each
standardized then averaged to serve as an overall general ability measure. That average
was then subtracted from the standardized self rating to this question to capture
overplacement, i.e. how much greater relative rating of ability was than relative
performance. The standardization of scores before taking the difference removes the main
effect of overplacement (i.e. that people in general see themselves as better than unknown
others) in order to examine individual differences.
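The overplacement score described above can be sketched as follows. The function names and the data are hypothetical, and the use of the sample standard deviation for standardizing is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def standardize(values: list[float]) -> list[float]:
    """Convert raw scores to z-scores (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def overplacement(self_rating: list[float],
                  ability_tests: list[list[float]]) -> list[float]:
    """Standardized self-rating minus the mean of standardized test scores."""
    z_tests = [standardize(scores) for scores in ability_tests]
    ability = [mean(person) for person in zip(*z_tests)]  # per-person average
    z_self = standardize(self_rating)
    return [r - a for r, a in zip(z_self, ability)]


# Made-up values for five respondents (not study data).
rating = [7, 5, 8, 4, 6]            # 0-10 comparative self-rating
vst = [40, 45, 30, 38, 50]          # vocabulary test score
overclaim = [0.4, 0.5, 0.2, 0.3, 0.6]  # reals minus foils claiming
crt = [3, 5, 2, 4, 6]               # reflection problems solved

print([round(v, 2) for v in overplacement(rating, [vst, overclaim, crt])])
```

Because every input is standardized first, the resulting scores average zero across the sample, so they reflect individual differences rather than the sample-wide better-than-average effect.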
Narcissism. Because the 16-item version of the NPI proposed by Ames et al.
(2006) proved as predictive as the 40-item version in Study 2, for economy, that version was
employed here, providing scores in a 0 – 16 range, because each item is binary forced-choice.
Entitlement. A closer examination of the 40-item narcissism measure in Study 2
found predictive correlations of similar size for the Exploitiveness /
Entitlement facet as identified by Emmons (1987). As a general measure of entitlement,
Study 4 employed the 9-item Psychological Entitlement Scale (PES) developed by
Campbell, Bonacci, et al. (2004) which has been found to capture a single factor and show
some discrimination from NPI Entitlement with no link to social desirability. This
instrument was also found to identify a slightly less pathological variant than the NPI
(Pryor et al., 2008). For a more domain-specific measure of entitlement, Study 4 also used
the 8-item Academic Entitlement Questionnaire (AEQ) developed by Kopp et al. (2011),
with the expectation it may be more relevant to this sample and context. This instrument
contained items more applicable to students, such as “Because I pay tuition, I deserve
passing grades.”
Intellectual Humility. Krumrei-Mancuso and Rouse (2016) developed the
22-item Comprehensive Intellectual Humility Scale (CIHS) used here, which “measures 4
distinct but intercorrelated aspects of intellectual humility, including independence of
intellect and ego, openness to revising one’s viewpoint, respect for others’ viewpoints, and
lack of intellectual overconfidence” (p. 209). Although using a simpler measure, Deffler et al.
(2016) compared intellectual humility with both overclaiming and recognition memory,
finding “participants who were higher in intellectual humility more accurately discriminated
between real and bogus items on the OCQ” (p. 258) and “more accurately distinguished
old from new items” (p. 257), suggesting this personality trait may relate to both
exaggeration and memory.
Need For Cognition. Leary et al. (2017) reported a relationship between intellectual
humility and need for cognition, so the same measure was used here, the 18-item Need for
Cognition Scale (NFCS) developed by Cacioppo et al. (1984). Woo et al. (2007) also reported
a relationship between overclaiming and need for cognition.
Recognition Memory. Using the 60 items (both reals and foils) presented at the
start of the study in the OCQ as old items, 60 more new items (fitting the same knowledge
domains but chosen to be at least as recognizable) were added to create a 120-item
recognition memory test. Responses were a simple Yes/No decision of whether the item
had been seen earlier in the study. By residualizing false alarm rate predicted by hit rate, a
RExI was calculated to indicate memory exaggeration for comparison with other measures
of knowledge exaggeration. Using standard signal detection techniques, a d′ measure of
accuracy was also calculated. Note that, while d′ is a common measure of memory
competence, it will not be orthogonal to exaggeration here.
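As an illustration only, the two memory measures described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the residual is left in raw units (the rescaling to a 0 to 1 range reported in the tables is not reproduced), and the log-linear correction that keeps d′ finite at hit or false alarm rates of 0 or 1 is an assumption that the text does not specify.

```python
from statistics import NormalDist, mean


def residualize(y, x):
    """Return residuals of y after a simple least-squares regression on x.

    With y = false alarm rates and x = hit rates (one pair per respondent),
    the residual is the part of false-alarm responding not predicted by hits.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def d_prime(hits, false_alarms, n_old=60, n_new=60):
    """Signal detection accuracy: z(hit rate) - z(false alarm rate).

    The +0.5 / +1 log-linear correction is an assumption made here so that
    perfect or empty response patterns do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical respondents: (hits out of 60 old items, false alarms out of 60 new items)
counts = [(45, 5), (50, 20), (30, 10), (55, 30)]
hit_rates = [h / 60 for h, _ in counts]
fa_rates = [f / 60 for _, f in counts]
memory_exaggeration = residualize(fa_rates, hit_rates)
memory_accuracy = [d_prime(h, f) for h, f in counts]
```

By construction the residuals are uncorrelated with hit rate in the sample, whereas d′ depends on both rates, which is why d′ cannot be orthogonal to the exaggeration index.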
Careless Responding: Rushing. Wood et al. (2017) investigated the relationship
between item response time (measured in seconds per item, or spi) and response
consistency, and found “a sharp
drop-off in response consistency for most inventories at approximately 1 spi” (p. 462), after
aggregating response times and calculating an average spi. While those authors “strongly
recommend . . . the automatic removal of participants who complete survey instruments at
rates faster than 1 spi”, this creates the dilemma of whether to keep participants who
achieve a 1.1 spi average through occasional very slow responses. That approach also discards
data based on an a priori judgement, biasing the sample and losing variance that may be
relevant, as it is here. Instead, a dynamic, interactive approach to measure and control
response times was developed for this study.
The platform used for collecting data was Qualtrics (www.qualtrics.com) which
presents questions online via web browser. The default presentation requires users to select
answers then move their mouse to the bottom right of the screen to click an arrow icon to
advance. For the hundreds of items presented in surveys like the current study, this can
involve hundreds of wrist and finger movements just to advance through the study, which
can become quite tedious. To ease such burden on the participant, custom JavaScript code
was implemented to create automatic progression to the next in a series of questions once a
selection was made. A side-effect of such facilitation is that it becomes easier to respond
carelessly. This may seem to be a liability but is also an opportunity.
To capitalize on this, the JavaScript code also recorded response time and blocked
responses below a certain threshold: 1 second for most items, but 0.5 seconds for the
single-word recognition tasks of overclaiming and memory. Response attempts below
threshold generated the warning “You appear to be responding faster than humanly
possible. Be careful!” which required clicking “OK” then continuing with the question. This
was effectively a real-time penalty for impatient responding that allowed respondents to
continue. Each such rushed response was counted, and the distribution was found to be highly
skewed, i.e. modal zero with a steep drop-off and long tail. The long tail was probably
because the warnings can be turned off by a browser setting (after the first few), so
intentionally hasty users could continue without repeated warnings. In such cases,
responses below threshold were still blocked, slightly thwarting their progress. This
protocol effectively allowed participants to express a range of impatience, but only with
conscious effort. Because of this highly-skewed distribution, rushed responding was
calculated as the square root of number of rushed responses, to minimize effects of outliers.
This can be seen as a measure of motivated carelessness or contempt because respondents
repeatedly ignore warnings and delays to continue rushing.
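The scoring step described above can be sketched as follows. The study itself used custom JavaScript within Qualtrics to block and log attempts in real time; the Python below is only a hypothetical offline illustration of the scoring, with assumed names and an assumed log format of one (item kind, seconds) pair per response attempt. The thresholds are those stated in the text.

```python
import math

# Thresholds in seconds, as stated in the text: 1 s for most items,
# 0.5 s for the single-word recognition tasks (overclaiming and memory).
THRESHOLDS = {"standard": 1.0, "recognition": 0.5}


def count_rushed(attempts):
    """Count response attempts made faster than the threshold for their item kind.

    `attempts` is an assumed log format: an iterable of (item_kind, seconds).
    """
    return sum(1 for kind, seconds in attempts if seconds < THRESHOLDS[kind])


def rushed_responding(attempts):
    """Square root of the rushed-attempt count, to reduce the influence of outliers."""
    return math.sqrt(count_rushed(attempts))


# Hypothetical attempt log for one respondent
log = [
    ("standard", 2.4),
    ("standard", 0.6),     # rushed
    ("recognition", 0.4),  # rushed
    ("recognition", 0.7),
    ("standard", 0.9),     # rushed
    ("standard", 0.2),     # rushed
]
score = rushed_responding(log)  # sqrt(4) = 2.0
```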
Foil Delay. Having individual item response times allowed for examination of how
respondents considered foils relative to reals. The foil delay measure used here is an
individual’s median response time for reals (in an overclaiming inventory) subtracted from
their median response time for foils. This indicates individual differences in processing foil
vs real items, i.e. how much longer one takes responding to foils relative to reals.
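As a minimal sketch of this definition (the function name and input format are hypothetical, and this is not the study's own code), foil delay for one respondent can be computed from per-item response times:

```python
from statistics import median


def foil_delay(responses):
    """Median response time for foils minus median response time for reals.

    `responses` is an assumed format: an iterable of (is_foil, seconds)
    for one respondent's overclaiming items. Positive values mean the
    respondent took longer on foils than on reals.
    """
    foil_times = [seconds for is_foil, seconds in responses if is_foil]
    real_times = [seconds for is_foil, seconds in responses if not is_foil]
    return median(foil_times) - median(real_times)


# Hypothetical respondent: three foils and three reals
example = [
    (True, 2.0), (True, 3.0), (True, 4.0),
    (False, 1.0), (False, 1.5), (False, 2.0),
]
delay = foil_delay(example)  # 3.0 - 1.5 = 1.5 seconds
```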
Data Lost By Error. Some data were lost due to mistakes in implementation. The
18 NFCS items were inadvertently duplicated into the spaces where the AEQ and PES items
were to go for the first 174 respondents, thus limiting N for those two measures to 536.
Because this loss should be unrelated to which respondents were affected, the data can be
considered missing completely at random, allowing for statistical compensation in the regression
models described later.
Results
Some unexpected outcomes created serendipitous opportunities. Problems in merging
data sets revealed unexpected patterns of carelessness. Errors in survey implementation left
out some data collection but also created opportunities to compare rushed responding with
consistency. Finally, having per-item response time data allowed examination of
overclaiming behavior in more detail, as discussed in Foil Delay.
Table 6: Correlations between Study 4 measures
Measure M (SD) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
1 University GPA .72 (.09) —
2 VoKE RExI .29 (.15) -.22 .75
3 VoKE Foils .16 (.18) -.15 .93 .89
4 VoKE Reals .41 (.22) .14 -.00 .37 .91
5 OCQ RExI .39 (.16) -.20 .64 .52 -.20 .68
6 OCQ Foils .12 (.14) -.13 .67 .72 .25 .79 .91
7 OCQ Reals .32 (.15) .05 .27 .50 .67 -.00 .62 .91
8 VST RExI .35 (.13) -.19 .32 .33 .10 .32 .37 .20 .78
9 VST Correct .66 (.15) .09 -.17 .07 .60 -.36 -.06 .35 .00 .89
10 CRT Correct .44 (.30) .23 -.15 -.12 .05 -.08 -.04 .03 -.16 .03 .74
11 CRT Intuitive .36 (.25) -.14 .09 .09 .02 .01 .00 -.00 .12 .06 -.82 .57
12 Memory RExI .16 (.14) -.21 .37 .33 -.05 .47 .41 .06 .35 -.18 -.19 .06 .87
13 Memory Accuracy (d′) .49 (.14) .29 -.45 -.29 .35 -.55 -.30 .22 -.24 .38 .26 -.13 -.69 —
14 Native English .51 (.50) .00 -.17 -.05 .30 -.36 -.17 .18 -.02 .39 -.16 .22 -.21 .26 —
15 Western Culture .60 (.49) .07 -.23 -.10 .29 -.36 -.21 .12 -.03 .37 -.12 .17 -.26 .30 .52 —
16 NPI-16 .24 (.17) -.08 .16 .11 -.09 .20 .17 .03 .17 -.14 -.07 .02 .17 -.22 -.06 -.10 .67
17 NFCS .56 (.13) .10 -.09 -.00 .22 -.14 .05 .26 -.00 .13 .17 -.12 -.10 .22 .08 .05 .07 .85
18 AEQ .29 (.18) -.22 .23 .18 -.09 .29 .24 .01 .17 -.19 -.14 .02 .31 -.31 -.16 -.18 .28 -.36 .86
19 PES .33 (.19) -.10 .17 .09 -.18 .26 .19 -.02 .12 -.19 -.05 -.02 .20 -.26 -.19 -.26 .47 -.17 .65 .84
20 CIHS .67 (.11) .11 -.22 -.14 .19 -.28 -.18 .07 -.16 .21 .14 -.03 -.28 .31 .11 .17 -.28 .34 -.45 -.42 .85
21 Overplacement .40 (.14) -.12 .30 .18 -.24 .36 .24 -.08 .16 -.36 -.08 .06 .16 -.26 -.12 -.18 .22 -.02 .17 .21 -.13 —
22 Rushed Responding .07 (.11) -.18 .25 .17 -.18 .38 .25 -.09 .21 -.30 -.20 .03 .44 -.44 -.17 -.11 .24 -.11 .32 .23 -.35 .22 —
23 Foil Delay .42 (.14) -.10 .36 .37 .09 .27 .35 .23 .15 -.08 -.00 -.03 .14 -.16 -.10 -.12 .08 .06 .06 .07 -.04 .05 .02
Note: N = 710, except AEQ and PES where N = 536. RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. OCQ: 60-item Overclaiming Questionnaire. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. NPI: Narcissistic Personality Inventory. NFCS: Need For Cognition Scale. AEQ: Academic Entitlement Questionnaire. PES: Psychological Entitlement Scale. CIHS: Comprehensive Intellectual Humility Scale. Cronbach’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Data Merging Loss Effects. As mentioned above, sex data were intended to be
added to this survey data by linking with screening data collected by the department for all
researchers using this subject pool. Data were linked by having students generate a
(mostly) unique identification code, with similar instructions given during both data
collections. Inevitably, some codes do not match because students do not follow the
instructions the same way each time, leaving some records unlinkable. For this
sample, 12% of the records from participants who had completed the study could not be linked.
Comparing merged and unmerged records showed several significant relationships:
Students whose records did not merge had, on average, lower GPA (by -2.19* [-4.04, -.33],
t(109.88) = -2.33; d = -0.25, in percentage points), and higher overall exaggeration, by
.08*** [.04, .12], t(100.61) = 4.06; d = 0.51 (a unitless measure, but the effect size is
important). Non-merged records also showed lower memory and vocabulary performance,
more overconfidence and rushed responding, and had personalities with more narcissism
and academic entitlement, and less intellectual humility.
Consequently, none of that screening data is included in the analyses reported here,
including the measure of sex as control. Given that sex showed no relationship with RExI
measures in previous studies, it is unlikely that any relationship would have been found
here. This merging loss, however, suggests that the process of asking participants to
generate their own unique identification code (e.g. in order to preserve anonymity when
linking data sets) carries a cost of losing data and potentially significantly compromising
data quality.
Predictive Validity
Table 6 presents zero-order correlations between study variables. Table 7, as a
redundant convenience, summarizes significant correlates with RExI measures, as well as
with all four RExIs combined with equal weighting to show aggregate effects.
Table 7: Study 4 RExI Correlates, Selected from Table 6
Measure GPA VoKE OCQ VST Memory All
VoKE RExI -.22
OCQ RExI -.20 .64
VST RExI -.19 .32 .32
Memory RExI -.21 .37 .47 .35
All RExI -.27 .78 .81 .66 .73
CRT Correct .23 -.15 -.08 -.16 -.19 -.20
VST Correct .09 -.17 -.36 .00 -.18 -.24
Memory Accuracy (d′) .29 -.45 -.55 -.24 -.69 -.65
Narcissism -.08 .16 .20 .17 .17 .23
Need For Cognition .10 -.09 -.14 -.00 -.10 -.11
Academic Entitlement -.22 .23 .29 .17 .31 .33
Psychological Entitlement -.10 .17 .26 .12 .20 .25
Intellectual Humility .11 -.22 -.28 -.16 -.28 -.31
Overplacement -.12 .30 .36 .16 .16 .33
Rushed Responding -.18 .25 .38 .21 .44 .43
Foil Delay -.10 .36 .27 .15 .14 .31
Note: N = 710, except Entitlement measures where N = 536. RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. OCQ: 60-item Overclaiming Questionnaire. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. Cronbach’s α (bootstrap approximated for RExI measures) shown in italics on diagonal. M (SD) for range of 0 to 1. Probabilities shown as p >= .05, p < .05, p < .01 (see Reporting Conventions).
Exaggeration. As summarized in Table 7, all four exaggeration measures behave
similarly, always in opposition with academic performance, yet also with some distinction
from each other. The one exception is Need For Cognition which does not correlate with
exaggeration on the VST. Note that memory exaggeration and VST overstatement both
correlate with foil delay measured on the overclaiming instruments, suggesting that the
cognitive allocation indicated by foil delay reflects exaggeration in general. While the
overstatement method shows weaker correlations with personality measures, it relates to
academic performance comparably with the other formats and almost not at all to the
language and culture controls (see Table 6), suggesting that an overstatement approach to
exaggeration may be “cleaner” in some ways. Recall that the VST RExI is (by design)
completely unrelated to the number correct on that test so that both measures from the
same test can be used as unique predictors. In this case, exaggeration is a stronger
predictor of academic performance than knowledge.
Personality. Study 2 showed that narcissism had a significantly negative impact on
academic performance, and we see that result replicated here, although diminished,
possibly because only 16 of the 40 items were used. Recall that Study 2 also found the
entitlement facet of the NPI to be most predictive, which is why entitlement measures were
included in this study, and we see that academic entitlement was more predictive of GPA
and exaggeration, while general entitlement was still significant.
All RExI measures also showed small but consistent relationships with personality
measures NPI, CIHS, AEQ, PES and overplacement, as well as with the rushed responding
behavioral measure. Note that all these measures also predict academic performance,
memory performance, and VST Correct (and mostly CRT Correct) in the opposite direction,
suggesting that exaggeration is a costly, maladaptive form of “self-enhancement”.
While the RExI measures vary somewhat in magnitude of association, it appears that
narcissism, entitlement, overconfidence, impatience, and lower intellectual humility confirm
a consistent personality profile of exaggeration.
Memory. Memory accuracy again relates positively to academic performance and
negatively to exaggeration. The stronger overlap for the overclaiming inventory items (OCQ
and VoKE) than for the VST overstatement test suggests more susceptibility to recognition
bias when using an overclaiming inventory.
Like the RExI derived from OCQ, VoKE and VST items, the RExI calculated on
recognition memory performance shows a similar relationship with GPA, VST Correct and
CRT Correct scores, suggesting the generalizability of the RExI: Memory exaggeration
shows similarities to knowledge exaggeration.
Careless Responding. The new measure of carelessness, rushed responding,
correlated significantly with all RExI measures, suggesting impatience is part of
exaggeration. Similar results were found when rushing was counted dichotomously as more
than 2 or 3 warnings; 55% of respondents never rushed a response, and 87% did so no
more than twice over hundreds of items.
Confirmation of Rushed Responding as Careless. Due to an error in
implementing the survey, the 18 items of the NFCS were duplicated in the spots where the
PES and AEQ items were to go for the first 174 respondents. This serendipitously created
the opportunity to use the correlation between answers of these identical item sets as a
check on careless responding, such that the higher the correlation, the more consistent, and
less careless, the response set. Item order was somewhat randomized, so the repetition was
likely not obvious. This measure of consistent responding correlated as expected with
memory accuracy (r(172) = .35***, 95% CI [.21, .47]), and with rushed responding, r(172)
= -.35***, 95% CI [-.48, -.21], consistent with Wood et al. (2017), who “found response
times and consistency to be routinely positively associated across inventories” (p. 458).
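The consistency check described above amounts to a within-person correlation across the two presentations of the same items. The sketch below is a hypothetical illustration, not the study's code: the function names and the dictionary format (item identifier mapped to rating) are assumptions, chosen so that answers are matched by item regardless of the randomized presentation order. Returning None when a respondent gives the same rating to every item is also an assumption, since the correlation is undefined in that case.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # no variance in one set of answers
    return sxy / sqrt(sxx * syy)


def response_consistency(first_pass, second_pass):
    """Correlate one respondent's answers to the same items across two presentations.

    Both arguments map item identifiers to ratings; items are matched by
    identifier, so the order in which they were presented does not matter.
    """
    items = sorted(set(first_pass) & set(second_pass))
    return pearson([first_pass[i] for i in items], [second_pass[i] for i in items])


# Hypothetical respondent answering five duplicated items
first = {"nfc01": 1, "nfc02": 4, "nfc03": 2, "nfc04": 5, "nfc05": 3}
second = {"nfc03": 2, "nfc01": 1, "nfc05": 3, "nfc02": 5, "nfc04": 5}
consistency = response_consistency(first, second)
```

Higher values indicate a more consistent, and so presumably less careless, response set.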
Despite the bias in the merged prescreen data (described above), an attention check
question in that data also validated the rushed responding measure as indicative of careless
responding, with rushing being related to failing the attention check, r(625) = .39***, 95%
CI [.33, .46].
An advantage of this time-based technique for capturing carelessness is that it does
not require adding extra “bogus” (foil) questions which might alienate or be misunderstood
by the participant, and it provides a continuous behavioral measure to use as a control for
all data rather than an arbitrary cutoff for discarding data. Here we see that rushed
responding shows significant relationships with academic performance, cognitive and
personality measures, indicating a generalized detrimental trait.
Another serendipitous error in survey preparation shone some light on the influence of
careless responding specific to the OCQ items (which show the strongest relationship to
rushed responding). Using Microsoft Excel to assemble items, the auto-increment feature
inadvertently changed the domain “20th Century Figures” into “21th”, “22th” through “35th
Century Figures”, so that these items were now effectively absurd. While this only affected
the first 35 responses, no difference in responding (reals rate, foils rate, their sum or
difference) was found.
Exaggeration Methodology. The 30 items of the VoKE did at least as well as the 60
items of the OCQ in predicting GPA, and showed slightly more internal consistency,
suggesting that VoKE items were more efficient and effective at capturing exaggeration,
with less culturally-biased content. As further validation of the item engineering approach,
VoKE reals claiming correlated well with VST correct score and reasonably with GPA (even
more than VST Correct), indicating genuine knowledge was being tapped by those items.
In contrast, reals claiming on the OCQ items did not relate to GPA, showing the academic
irrelevance of that content. We can also note that the mean of VoKE reals claiming was well
over one standard deviation away from both scale boundaries and thus did not suffer from
ceiling or floor effects, i.e. the items were neither too easy nor too hard.
This study also resurrected the overstatement approach by modifying a conventional
multiple-choice vocabulary test, the VST. Except for language and culture variables, where
it shows some independence, RExI from the VST shows similar patterns to RExI from the
overclaiming inventories, suggesting that, despite a very different methodology, the
exaggeration index is capturing a similar phenomenon. This contrasts with the
contradictory findings of previous literature, where overclaiming increases with ability while
overestimation (overconfidence) decreases. The RExI approach yields a consistent index.
Note, however, that the overlap between OCQ and VoKE RExI measures is twice what
it is for either with the VST RExI. This suggests some method variance worth further
exploration. As discussed above (and below), there are clearly different advantages and
disadvantages to either methodology, but it is encouraging to see that the RExI extracts
similar information from both, and all RExI measures show similar predictions of overall
GPA.
For the VST, it should be noted that, out of 60 questions, the mean (standard
deviation) of number correct was 39.86 (8.76) with a maximum of 59. It may be that
reasonable exaggeration measures are best had from overstatement tests that nobody
scores completely correct on; there needs to be some opportunity for every test taker to
either exaggerate or avoid claiming.
An important advantage of an overstatement approach is that it yields two distinct,
usable measures: ability and exaggeration. Because they are, by definition, orthogonal, the
two zero-order correlations become standardized β coefficients when both measures are
combined to predict academic performance. While the contribution of VST knowledge to
GPA is meager, it is still significant, but more importantly, the VST RExI adds twice the
predictive power, increasing R² (for the regression model predicting GPA) from .01* to .04***. If
adding RExI to everyday multiple-choice tests used in education only doubles predictive
power, this simple adjustment to standard testing procedure could have a profound effect
on academic assessment.
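The arithmetic behind this point can be checked with a short sketch. When predictors are exactly uncorrelated, each standardized β equals the predictor's zero-order correlation with the outcome, so the model R² is the sum of the squared correlations. The code below is illustrative only: the function name is hypothetical, and the inputs are the rounded correlations with GPA from Table 6, so the results match the reported values only to rounding.

```python
def r_squared_orthogonal(*correlations):
    """R-squared for a model whose predictors are mutually uncorrelated.

    With orthogonal predictors each standardized beta equals the zero-order
    correlation with the outcome, so R-squared is the sum of squared correlations.
    """
    return sum(r ** 2 for r in correlations)


# Rounded zero-order correlations with GPA, taken from Table 6
r_vst_correct = 0.09
r_vst_rexi = -0.19

r2_knowledge_only = r_squared_orthogonal(r_vst_correct)             # 0.0081
r2_with_rexi = r_squared_orthogonal(r_vst_correct, r_vst_rexi)      # 0.0442

print(round(r2_knowledge_only, 2))  # 0.01
print(round(r2_with_rexi, 2))       # 0.04
```

This shortcut does not extend to the later model that adds the VoKE RExI, because that measure correlates with both VST measures.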
Additionally, adding VoKE RExI to that VST model increases R² even further to
.07***, indicating distinct kinds of validity from the different methods, despite them both
being ostensibly based on the same domain of English vocabulary.
Finally, it is worth noting that the correlations between competence and
incompetence evidence are positive for overclaiming, but negative for overstatement (r(708)
= -.39***, 95% CI [-.45, -.33]), supporting the reasoning behind the RExI formula.
Foil Delay
Having response times for individual items allowed consideration of some cognitive
dynamics of exaggeration. Do people process foils differently than reals, and what might
that imply?
While Study 2 showed that the RExI predicts academic performance beyond cognitive
ability, it remained debatable whether this represents evidence of a process unlike typical
conceptions of cognitive ability. Ordinary problem solving takes time to get right. In this
data set, the time taken to answer CRT questions and the quality of answers given correlate
as expected: positively for correct answers (r(708) = .14***, 95% CI [.06, .21]) and
negatively for “intuitive” but incorrect answers, r(708) = -.11**, 95% CI [-.19, -.04]; i.e.
more time thinking gave better answers. It would follow, then, that claiming impossible
knowledge might be the result of shallow, hasty, inadequate processing of information. We
might expect that people who spend more time deliberating about what they do or don’t
know would exaggerate less. Does response time for overclaiming items predict amount of
exaggeration?
Overall, median response times for foils on the overclaiming inventories (OCQ and
VoKE) were slightly less than response times (in seconds) for reals, -.08* [-.16, -.003],
t(1395.38) = -2.04; d = -0.11, but overall median response times for overclaiming did not
relate to RExI on those instruments (r(708) = .05, 95% CI [-.03, .12]) nor to GPA (r(708) =
.00, 95% CI [-.07, .08]). Time spent thinking about overclaiming items in general did not
relate to exaggeration or academic performance.
Nonetheless, the amount of time spent on foils relative to reals (foil delay) tended to
increase with exaggeration amount, as did time spent on foils alone (r(708) = .15***, 95%
CI [.08, .22]), suggesting that exaggeration required more (or at least different) cognitive
effort rather than less. Foil delay also showed a slight detrimental impact on academic and
memory performance. Table 8 shows prediction of overclaiming exaggeration from the
median response time for overclaiming items in general (as control) and foil delay, showing
that higher exaggeration was characterized by (relatively) faster reals claiming and slower
foils claiming.
Table 8: Study 4 Overclaiming Response Times Predicting RExI.
Predictor β SE p value
Overclaiming Item Response Time -.01 .04 .72
Foil Delay .35 .04 <.001
Note: Overall R² = .12, p < .001. N = 710. RExI: Residualized Exaggeration Index. Foil Delay = Median Foils Time - Median Reals Time.
This is consistent with a transcranial magnetic stimulation study (Amati et al., 2010)
that reported that inhibiting medial prefrontal cortex (MPFC) activity reduced both
response time and foil claiming. Page 269 of that study noted that “regions of the MPFC
are found to be particularly important for comparing the self to others.” Exaggeration
appears to be less about amount of mental processing than about allocation, less about
degree and more about kind of thinking.
Incremental Validity
Study 2 showed how exaggeration uniquely predicted academic performance beyond
cognitive ability, carelessness and demographic variables. Do these other exaggeration
measures do the same, and are they distinct from each other?
The shorter OCQ RExI used in this study no longer significantly relates to academic
performance once controlling for memory performance, which may be due to the
inappropriateness of OCQ content or to the use of OCQ items in the memory test.
However, the other three exaggeration measures, VoKE overclaiming, VST overstatement,
and recognition memory, all uniquely predict GPA after controlling for CRT cognitive
ability, rushing, and demographics. Keeping memory exaggeration (and performance in
general) as a control, Table 9 shows that the shorter overclaiming inventory of the VoKE
captures exaggeration as uniquely predicting academic performance, beyond other study
measures. Similarly, Table 10 shows that the overstatement test based on the
multiple-choice VST also predicts GPA uniquely. Despite the relatively low correlation
between these two measures of exaggeration, based on different methods, they are both
behaving similarly. As before, in every model, the β for incompetence evidence is exactly
the RExI, once controlling for evidence of competence.
Table 9: Regression Model Predicting GPA from VoKE RExI in Study 4
Predictor β SE p value γ Neffective
VoKE RExI -.13 .04 .003 .008 704
VoKE Reals Rate .20 .05 <.001 .01 702
VST Correct -.08 .05 .08 .009 703
CRT Correct .14 .04 <.001 .008 704
Memory RExI -.10 .04 .02 .01 701
Memory Hits .09 .04 .04 .010 702
Native English -.15 .09 .09 .009 703
Western Culture .08 .09 .39 .02 699
Rushed Responding -.03 .04 .43 .010 703
Narcissism -.01 .04 .80 .06 669
Psychological Entitlement .12 .06 .05 .25 532
Need For Cognition -.03 .04 .46 .04 682
Intellectual Humility -.04 .04 .38 .03 685
Academic Entitlement -.22 .06 <.001 .24 538
Overplacement -.01 .04 .72 .01 700
Note: RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. Impact of missingness indicated by γ.
Missing Data Compensation. As noted above, the two entitlement measures had
significant missing data. The cause of this missingness was simply an oversight in survey
implementation, and so should have no relationship with which respondents were affected.
Thus, this data is missing completely at random, and missingness should be unrelated to
any survey measures. This allows for the use of statistical methods to compensate for
missingness.
The technique used here is full information maximum likelihood (FIML), which
estimates model parameters from all available observations, drawing on existing
relationships within the data set rather than discarding incomplete cases, as
recommended by Schafer and Graham (2002). The estimated impact of missingness on the
standard errors (SE) for each predictor is given by γ (which has a maximum value of 1) in
the tables. Note that this impact is minimal for the RExI measures. The software used to
compute this is based on Biesanz (2020).
Table 10: Regression Model Predicting GPA from VST RExI in Study 4
Predictor β SE p value γ Neffective
VST RExI -.12 .04 .004 .007 705
VST Correct -.06 .04 .19 .007 704
CRT Correct .14 .04 <.001 .007 704
Memory RExI -.09 .04 .04 .01 702
Memory Hits .14 .04 <.001 .01 702
Native English -.12 .09 .16 .007 704
Western Culture .13 .09 .15 .01 700
Rushed Responding -.03 .04 .53 .009 703
Narcissism -.00 .04 .99 .05 671
Psychological Entitlement .11 .06 .08 .24 538
Need For Cognition -.01 .04 .79 .04 684
Intellectual Humility -.03 .04 .50 .03 687
Academic Entitlement -.20 .06 .001 .24 542
Overplacement -.03 .04 .44 .01 701
Note: RExI: Residualized Exaggeration Index. VoKE: Vocabulary Knowledge Exaggeration. VST: Vocabulary Size Test. CRT: Cognitive Reflection Test. Impact of missingness indicated by γ.
Discussion
Study 4 replicated results of Study 2, but more broadly, using different, innovative
measures of exaggeration. The engineered VoKE overclaiming inventory was more effective
and efficient than OCQ items, and the VST overstatement test demonstrated a different way
to capture exaggeration, one which also showed unique predictive validity with academic
performance.
All exaggeration instruments painted a similar personality portrait of entitlement,
overconfidence, impatience, lack of humility, and narcissism. These personality traits also
revealed themselves to be detrimental to academic success.
The discovery in Study 2 that exaggeration appeared to be distinct from cognitive
ability was supported and elaborated by the finding that exaggerating required relatively
more (or at least different) thought rather than less. The phenomenon of exaggeration
appears to fall somewhere between conventional conceptions of intelligence and personality,
as an individual difference that reflects both information processing and self-image.
A novel measure of motivated carelessness, rushed responding, was implemented and
validated. Because quick responses to repeated item formats were facilitated, participants had
flexibility to respond carefully or hastily, with conscious intention required to persist in
rushed responding. By allowing and measuring this flexibility, this content-free behavioral
indicator of inattention captured more than situational carelessness, as shown by its correlations
with other trait measures.
Inadvertently, a weakness was found in the process of linking surveys by use of a
respondent-generated code. Those who succeeded in replicating their code were found to
be significantly different on several important study measures. Had this pattern not been
detected, results could have been significantly limited.
Validation for the theoretical principles behind operationalizing exaggeration as the
RExI was also found: Regardless of whether evidence for competence and incompetence
correlated positively (as it does in overclaiming inventories) or negatively (as it does with
overstatement), the extracted RExI shows similar relationships with other variables.
The RExI appears to reliably capture exaggeration regardless of content and format,
and to demonstrate consistent relationships with cognition and personality variables, while
also showing distinction.
General Discussion
The tendency to imagine one’s abilities as greater than they are, to have an
exaggerated sense of self, has seen scientific inquiry for at least a century. The
overstatement test, a name synonymous with the dictionary definition of exaggeration, was
an early approach with great face validity: repeated measurement of someone’s predicted
and actual success with tasks. Unfortunately, that approach faded into obscurity by the
1960s. The similar (but less psychologically direct) approach of overestimation has
considerable recent application as the most popular form of measuring overconfidence
(D. A. Moore & Healy, 2008) but uses difference scores which methodologically correlate
negatively with the ability being overestimated, thus obscuring distinction between actual
and self-perceived competence.
Overclaiming, comparing endorsement of real (genuine) and foil (non-existent) items,
also has a century-long legacy, but has had multiple, inconsistent applications and
interpretations. Foil claiming has been assumed to indicate self-enhancement (faking,
socially-desirable misrepresentation), memory bias, or carelessness, with little or no
research distinguishing these interpretations. For ability overclaiming, both reals and foils
claiming can correlate positively with objectively measured competence, thus also
obscuring distinction between reality and self-perception.
The lack of clarity in these two literatures may be due to a tendency for psychological
research to be theory-driven: Conjectures about hypothesized psychological processes are
empirically tested via operationalizations. This can involve several layers of assumptions
connecting the test outcomes with the theorized constructs. The approach taken in the
present research was somewhat the reverse: A specific concrete behavior (active
incompetence) inspired an analytic procedure, the RExI, which then provided evidence for
understanding the behavior. Thus, the present research does not so much validate a
psychological hypothesis as it does elucidate a common behavior.
This reversed approach (going from evidence to theory instead of theory to evidence)
was motivated by existing research on overconfidence and overclaiming, both of which may
have been influenced by confirmation bias created by theoretical assumptions. If
overconfidence is assumed to be detrimental, then the difference scores from overestimation
measures will necessarily confirm that assumption, because they correlate negatively with
the ability being measured. Likewise, foils claiming, because it follows reals claiming and
actual ability, will often correlate positively with self-image and so appear as
self-enhancement. Considering those two literatures together, however, leads to a
contradiction: Overestimation results suggest that an exaggerated sense of competence
decreases with genuine competence, while overclaiming results suggest the opposite. To
rectify this, the RExI was designed to remove influence of the ability being exaggerated.
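The residualization logic can be sketched as follows. This is a minimal illustration, assuming a simple least-squares regression of foils rate on reals rate; the data are hypothetical, the helper names (ols_residuals, pearson, rexi_sketch) are invented for this sketch, and the full specification of the RExI given in earlier chapters may differ.

```python
def ols_residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical overclaiming data: proportion of real and foil items claimed.
reals_rate = [0.30, 0.45, 0.50, 0.60, 0.70, 0.80, 0.85, 0.90]
foils_rate = [0.05, 0.20, 0.10, 0.15, 0.35, 0.20, 0.40, 0.30]

# Error rate controlled for success rate: the part of foils claiming
# that is not linearly predictable from reals claiming in this sample.
rexi_sketch = ols_residuals(foils_rate, reals_rate)

print(round(pearson(foils_rate, reals_rate), 2))   # raw foils rate follows reals rate
print(round(pearson(rexi_sketch, reals_rate), 6))  # residual is unrelated to reals rate
```

Because the residual is computed relative to the regression line for the whole sample, each score expresses a participant's foils claiming relative to others with the same reals rate.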
In retrospect, the RExI use of regression may seem an obvious choice, so why had it
not become the preferred approach before? For much of the last century, linear regression
tools were not as easily accessible, either materially or cognitively. As noted above, simple
techniques (e.g. difference score, raw foils claiming) confirmed hypotheses, undermining
motivation to question further. Methodological habits also form: “Over time, cottage
industries have emerged that take one measure off the shelf, accepting its validity without
further inquiry.” (Krueger & Wright, 2011, p. 20).
Even if regression techniques were used, error in self-image of competence
(exaggeration) was neglected as a distinct psychological phenomenon. For example,
researchers had not “considered foil claiming as a meaningful variable in its own right.”
(Paulhus, 2012, p. 152). Even the correction-for-guessing literature held a theoretical bias
built into the name: “Correction” (of ability scores) assumes that “guessing” is not a
distinct dimension worthy of separate study, but rather an influence to be subsumed as
part of the competence being measured. Much of the value of the RExI may be that it
highlights something that had been hiding in plain sight.
Research Outcomes
The goal of the present research was to investigate if excessive self-image of
competence impacted performance, and if so, the nature of that impact. To explore that
question fairly, the RExI was developed to remove influence of competence on self-image.
Validation
The primary research question was whether, after removing the “baby” of measured
ability, the “bathwater” of the RExI would have any meaningful relationship with broader
performance of the ability. This statistical dumpster diving (looking for useful variance
in the residuals after the ostensibly valuable variance had been partialed out) paid off.
Performance Outcomes. In every application of the RExI in this research, whether
the content was 1980s American culture trivia, English vocabulary, or recognition memory,
or whether the format was overclaiming or overstatement, exaggeration of ability predicted
impaired use of that ability, i.e. knowledge (and memory) exaggeration predicted lower
academic performance. While the effect size was fairly small, it was consistent, and (in
Study 2) comparable with, yet distinct from, a commercial general cognitive ability test in
predicting a broad life outcome (GPA).
The raw behavioral measures used in overclaiming (reals rate and foils rate) or
recognition memory (hit rate, false-alarm rate) tend to correlate positively, whereas the raw
measures used in overstatement (number correct, number incorrect) correlate negatively.
Nonetheless, the RExI showed similar relationships with other measures. This shows that the RExI could
be an effective tool in unifying previous findings about overconfidence and overclaiming.
Because the RExI puts individual perception-performance discrepancy in the context of
group performance (by using regression to interpret discrepancy relative to others in the
sample), this approach could also (somewhat) conceptually unify overestimation and
overplacement, creating a more wholistic measure of overconfidence.
Self-Enhancement. Consistent with an intuitive notion of overconfidence or
exaggeration, that one maintains an unrealistically high self-image, RExI measures reliably
showed positive relationships with narcissism, and particularly with a maladaptive facet of
narcissism, entitlement. This was supported by similar relationships with overplacement,
known as the better-than-average effect, and with lower intellectual humility, both
expressions of overconfidence. However, Study 2 showed negligible relationships between
exaggeration and the BIDR measures of Impression Management, Self-Deceptive
Enhancement, and Self-Deceptive Denial. This may either indicate a distinction in the kind
of self-enhancement represented by exaggeration, or inadequacy of the BIDR measures in
this particular application.
Cognitive Bias. Besides academic outcomes, exaggeration measures also reliably
predicted lower performance on cognitive tests, including general intelligence (the
Wonderlic Personnel Test), reasoning ability (the Cognitive Reflection Test), vocabulary
knowledge (VST Correct), and recognition memory accuracy. Memory exaggeration was
related to knowledge exaggeration, but did not fully explain the relationship between
knowledge exaggeration and academic performance. Memory exaggeration was also related
to self-enhancement measures, suggesting that, while false recognition may reflect some
degree of “innocent”, unmotivated errors in information processing, it may also be
influenced by egoic motivational processes. Further research on this interaction between
cognitive function and identity maintenance is warranted.
Carelessness. Given the use of foils claiming for capturing inattentive survey
responses, it was important to ensure that foils claiming in an overclaiming inventory was
not simply an expression of carelessness. While carelessness (measured by rushing and by
longstrings) showed some relationship with exaggeration, it clearly was not a predominant
explanation.
The rushed responding measure used in Study 4 required an intentional form of
“carelessness”, because hasty participants were repeatedly warned about (and thwarted in)
responding too quickly, yet some persisted regardless. This persistent impatience showed
significant relationships with cognitive and personality measures, suggesting it captured
something more trait-like than situational inattentiveness, which is consistent with
carelessness as an indicator of personality (Bowling et al., 2016). Rather than exaggeration
being an artifact of carelessly invalid responding, the relationship between this impatience
and RExI measures broadens the personality profile of exaggeration.
Divergent Validity. Exaggeration showed similar patterns of association with
academic and cognitive performance, narcissism, entitlement, overplacement, intellectual
humility, and carelessness. Yet, these associations were generally fairly weak, and not
enough to explain exaggeration. Several other measures considered here (Big Five
personality, metacognition, growth mindset, Impression Management, Self-Deceptive
Enhancement, Self-Deceptive Denial, Need for Cognition) showed little if any significant
relationship with exaggeration. Thus, while showing that exaggeration is multi-faceted, the
correlates explored in this research do not fully characterize this behavior.
While Study 2 showed that exaggeration predicts academic performance distinctly
from general intelligence, the consistent relationships between the RExI and cognitive
measures suggested that exaggeration may still be due to weak cognition. Response time
analyses (with Foil Delay in Study 4) confirmed prior evidence that exaggeration
represented not hasty or superficial thinking, but rather a different kind of thought,
perhaps one that handicaps potential. Together, this suggests that exaggeration may
represent a novel construct worth further investigation.
Incremental Validity. The above explanations (self-enhancement, cognitive bias,
carelessness) of exaggeration were considered because of previous research linking foils
claiming to these psychological processes. Despite consistent relationships with these
alternative explanations (and others, such as overplacement, intellectual humility,
entitlement, rushing), exaggeration was shown to maintain a significant, distinct
relationship with academic performance. This distinct relationship held after controlling for
participant variables of sex, having English as a native language, and having a Western
cultural background. Regression models in Study 4 showed that, even though reasoning
ability (CRT Correct), memory functioning, and academic entitlement remained significant
predictors of GPA (validating those measures), RExI from both overclaiming and
overstatement inventories still uniquely predicted academic performance.
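The idea of incremental validity can be made concrete with a small worked example. This is a sketch only: the three variables below are hypothetical values, not data from any of the studies, and the two-predictor R-squared is obtained from the standard closed-form expression in terms of pairwise correlations rather than from the regression models actually fitted in Study 4.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for y regressed on two predictors, from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical scores for eight participants.
ability = [10, 12, 14, 15, 17, 18, 20, 22]               # e.g. number correct
rexi = [0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.4]      # exaggeration residual
gpa = [2.4, 2.9, 2.9, 3.3, 3.1, 3.5, 3.5, 3.9]           # outcome

r_y1 = pearson(gpa, ability)
r_y2 = pearson(gpa, rexi)
r_12 = pearson(ability, rexi)

r2_ability = r_y1 ** 2                            # ability alone
r2_full = r2_two_predictors(r_y1, r_y2, r_12)     # ability plus exaggeration
delta_r2 = r2_full - r2_ability                   # incremental validity

print(round(r2_ability, 3), round(r2_full, 3), round(delta_r2, 3))
```

The quantity delta_r2 is the share of outcome variance explained by the exaggeration score over and above the ability score, which is what "uniquely predicted" refers to here.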
Altogether, by extracting the exaggeration of an ability in a form unrelated to the ability
itself (that is, the error of self-imagined competence), the RExI appears to provide a potent new technology for
understanding overconfidence, a trait which has been called “the mother of all psychological
biases” (D. A. Moore, 2018, p. 1).
Limitations
Despite the consistent, convergent patterns of associations, the different RExI
measures showed only small to moderate inter-correlations, and even some unique
predictive validity of GPA. Future research should elucidate convergent and divergent
validity of the RExI when used with various formats and content.
All of the studies presented here are correlational, lacking controlled experimental
interventions, so no inferences of causality can be made. Nonetheless, the decrease in
exaggeration with age noted in Study 3 suggests either some malleability or generational
effects. Future research should explore situational, cultural, and motivational precursors to
exaggeration.
These studies relied on convenience samples of undergraduate students for obvious
reasons, but also because it was possible to collect a powerful, wholistic outcome dependent
variable (DV), university GPA. There may be no other comparable situation where
(mostly) adult humans, from a fairly diverse background (UBC has a large international
student cohort), invest the majority of their efforts toward a (mostly) consistent goal over a
relatively long time, with performance repeatedly measured by coordinated professionals.
University GPA represents a composite of many competencies beyond cognitive ability, such
as emotional resilience, social skills, and time, health and mood management. While the
exaggeration measures used here showed relatively small effect sizes, results were robustly
consistent, considering the breadth of the outcome DV.
Exaggeration in the present research identified some costs of excessive self-construal
of ability. It is entirely possible that there are also costs of miserly self-image of
competence, of under-confidence, or not adequately recognizing one’s own capabilities.
This presents a methodological conundrum, given the difficulty of measuring a
competence when it is not expressed. It may also be the case that the psychological
processes leading to inadequate ability self-construal are not simply uni-dimensional, linear,
polar opposites of exaggeration. Regardless, it should be noted that the RExI taps a
unipolar, linear construct that may not be relevant to issues of low self-confidence.
Future Research
The findings presented here set the stage for considerable future research. Previous
findings using overestimation or overclaiming could be clarified by replicating with a RExI
analysis. Different techniques for gathering evidence of competence and volunteered
incompetence could be considered, such as weighting item difficulty for more refined
estimates.
While the current results with academic performance suggest some generalization
across cognitive abilities (e.g. memory, problem-solving), other, non-cognitive domains of
ability should be explored. For example, if the RExI were applied to claims of physical
ability, would this predict impairment in athletic performance, perhaps more fouls, errors,
injuries, or other misjudgments? Further research should explore both other domains of
exaggerated competence (such as in sports or health care), and other outcome variables,
such as job performance indicators, or history of professional errors. Also, the educational
potential noted below should be supported by qualitative research to inform appropriate
and ecologically valid applications.
Finally, the findings that recognition memory exaggeration shows similarities with
knowledge exaggeration (including several non-cognitive correlates) may open new avenues
of research in cognitive psychology. The ubiquitous use of signal detection theory in
memory research assumes an information-processing model to distinguish signal (accurate
memory) from noise (response bias). Without contradicting that model, a RExI approach
may provide additional insight into how cognition is situated within the social, egoic
mental processes that influence behavior. When responding to stimuli, even in a laboratory
setting, human response may be conditioned by numerous, broad, external factors. The
above discussion about foil delay suggests that, when participants decide whether or not a
word had been seen 20 minutes earlier in a lab study, social comparison may influence their
answer, even though nobody else will ever know. That RExI measures show distinction from
conventional cognitive and personality measures invites more research beyond those
academic silos, integrating both mechanistic information-processing and humanistic social
psychology perspectives.
What is this thing called Exaggeration?
At a minimum, the current research established that more information can be
extracted from a test of knowledge to better predict subsequent knowledge performance, by
residualizing test errors using the RExI technique. But what does it all mean?
Conceptually, exaggeration can be seen as a modeling error. Our self-image is our
self-model, and exaggeration is error in internal representation of competence, just as one’s
body-image can involve error in internal representation of physical self. Someone could
conceivably use a RExI approach to measure body-image error: Gather self-estimates of
weight, strength, or other physical characteristics, then residualize from objective measures.
But what does error in self-image of competence mean? Where does that come from?
The current research explored some of the consequents (e.g. academic performance), but
what are the antecedents? On that point, more evidence is required. In response to
possible theory-motivated reasoning that may have led to oversights in overconfidence and
overclaiming literature, I deliberately took a more agnostic approach to exploring
exaggeration: Forget any inferences drawn from measures that may be confounded with
competence, and start from scratch with likely suspects.
Results presented here suggest aspects of self-enhancement, overconfidence and
entitlement, and cognitive liabilities, and I have other (unpublished) research adding
hindsight bias, antagonism, gullibility and cheating to the mix. All of those, however, are
fairly weak correlates, not explanations. Exaggeration may be described as whatever
process leads to such a pattern. Altogether, this would suggest a tendency to favor a
flattering self-image over reality, to avoid the potential humiliation of acknowledging
ignorance or incompetence, and prefer personal over objective truth. From that
perspective, further research should examine whether exaggeration relates to
epistemological tendency to assess truth by evidence, authority, or social pragmatism:
“Social beliefs can be regarded as true if they are true in their consequences, that is, if the
consequences are desirable” (Krueger & Wright, 2011). One may evaluate the “truth” of
one’s competence based on one’s own authority or on what one imagines others believe.
Ultimately, paraphrasing Forrest Gump, exaggeration is as exaggeration does. Just as
a knowledge test can be considered a behavioral construct, an operationalization of the
latent construct of competence, so the RExI can be considered a behavioral construct, an
operationalization of exaggeration. Competence measures (tests) can be influenced by other
factors, such as motivation, fatigue, or anxiety; likewise, exaggeration measures can be
influenced by memory bias or carelessness. Thus, test score represents not just competence
but also motivation, etc., and the RExI represents exaggeration and other influences. While
we might think test score indicates competence, it really only counts correct answers,
however obtained. Similarly, a RExI score suggests exaggeration, but just indicates error
rate controlled for success rate. Thus, competence is defined by its imperfect manifestation
(performance), and exaggeration is defined by its imperfect manifestation as RExI scores.
Just as many improvements have been made in measuring competence (e.g. Item Response
Theory), improvements on the RExI will likely emerge to better capture exaggeration.
Potential Applications
That the RExI robustly predicts a comprehensive life outcome like university GPA
suggests that it may be a useful measure in many situations where knowledge performance
matters.
The two formats explored in Study 4 show different advantages and disadvantages.
An overclaiming format provides a time-efficient, low-stress measure of exaggeration but no
objective measure of ability. That format also seems to have stronger personality and
memory correlates, and so may be more appropriate when used in conjunction with other
competence tests. While an overclaiming test can be quick and easy, the use of foils may
create some unexpected reactance effects, such as suspicion or confusion, or ethical
complaints about potential deception. Care should be taken in developing overclaiming
inventories (as done for the VoKE) to establish that reals claiming relates to appropriate
ability and is at least somewhat distinct from foils claiming. Extra care should be given to
ensuring foils do not indicate any kind of ability.
The overstatement format carries the time and stress burdens of any objective ability
test, but has the advantage of supplying a conventional ability score (number correct) in
addition to the RExI, so both can be used synergistically. In situations where a traditional
ability score is required, adaptation to provide a RExI could a) diminish some measurement
error due to guessing, b) provide incremental validity (if including RExI in predictions), and
c) provide important additional feedback about individual items, i.e. not just the rate correct or
incorrect, but also the rate not claimed. Even if the RExI is not used as a quantitative predictor, the
extra information gained by allowing non-claiming may support useful qualitative insights.
An additional benefit of the RExI is that it is a behavioral measure that can predict a
behavioral outcome (e.g. GPA), avoiding the complications and assumptions of self-report
measures. As a conceptual analogy, consider a traffic cop pulling over a suspected
inebriated driver. If the driver (implicitly claiming driving competence) fails simple
behavioral tests (e.g. walking a straight line), the officer can make a reasonable inference of
future performance (e.g. an accident), without needing to know the underlying causes or
processes. Similarly, the RExI provides the practical convenience of using a simple
behavioral test to predict impairment on future performance — understanding
psychological mechanisms is optional. A RExI may prove to be convenient and useful in a
variety of settings.
Standardized Testing
In every study, exaggeration of an ability showed incremental predictive validity
beyond that of the ability itself, sometimes even to a larger degree. In the United States, a
sizeable industry has grown around predicting academic performance:
“Standardized-testing regimens cost states some $1.7 billion a year overall, or a quarter of 1
percent of total K-12 spending” (Ujifusa, 2012). Despite a long history of criticism (Hutt &
Schneider, 2018), the greatest value of such testing may be in predicting future outcomes,
such as whether an education system produces economically productive workers.
Integrating a RExI into such testing could improve such predictions both quantitatively and
qualitatively, given the range of non-cognitive associations presented here, providing more
accurate and more wholistic evaluation.
Human Resources
Job applicants have shown negative attitudes to pre-employment personality tests yet
not to ability tests (Rosse et al., 1994). Given that identifying narcissistic, entitled,
impatient, overconfident exaggerators with unreliable memories may be valuable in
employee selection, yet be problematic to assess directly without reactance, a simple
vocabulary test (for example) that indirectly captures these traits may be very useful.
Further research may show context-dependence effects, e.g. that exaggeration is helpful for
marketing but not for engineering.
Education
Beyond simply increasing predictive value, incorporating ignorance awareness in
education could be broadly beneficial. In most objective tests (e.g. multiple-choice),
students are encouraged to guess if they don’t know the answer. A common rationale for
guessing is that it can improve one’s test score, albeit illegitimately. This framing implicitly
encourages a performance motivation, e.g. maximizing marks regardless of ability, instead
of mastery motivation which prioritizes understanding and yields better academic outcomes
(Kaplan et al., 2002). Rarely, if ever, are students encouraged to critically assess their
ignorance or incompetence, depriving them of the chance to develop self-awareness and
emotion management skills. Automatically translating ignorance into failure (forcing an
incorrect guess) also prevents instructors from getting important feedback, both about the
student and about the questions being asked. The act of guessing itself may be harmful, as
students “can also learn false facts from multiple-choice tests; testing leads to persistence of
some multiple-choice lures on later general knowledge tests” (Marsh et al., 2007, p. 194). If
someone does not know the answer, why do we teach them to pretend they do?
By encouraging guessing, we not only lose important qualitative information about
the student and the test, we also deteriorate our quantitative measures. There has been a
decades-long debate about how to deal with guessing on multiple-choice tests, with strong
evidence that it should not be ignored (e.g. Espinosa & Gardeazabal, 2010) even though it
typically is. An oversight of this literature is that it assumes a unidimensional measure of
ability, e.g. adjusting number correct with some correction-for-guessing algorithm. No such
algorithm has been universally adopted, probably due to complexity (compared to simply
counting number correct) or that no single, simple algorithm gives consistent, comparable
results. Allowing guessing confounds the ostensible knowledge measure with personality
factors like risk-taking (Alnabhan, 2002). The research presented here shows that allowing
admission of ignorance not only allows for more predictive validity (i.e. with the RExI), but
also identifies other personality confounds, like entitlement, overconfidence or narcissism.
Even without calculating a RExI, allowing non-claiming of ability can give educators an
immediate sense of student (over)confidence or misconception and allow them to respond accordingly.
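The unidimensional assumption behind correction-for-guessing can be seen in a small worked example. The sketch below uses the classical formula-scoring rule (number right minus number wrong divided by the number of options minus one) purely as a familiar instance of such an algorithm; it is not the approach advocated here, and the two students and their scores are hypothetical.

```python
def formula_score(n_correct, n_incorrect, n_options):
    """Classical correction for guessing: R - W / (k - 1)."""
    return n_correct - n_incorrect / (n_options - 1)


# Hypothetical 40-item test with 4 options per item.
# Student A knows 24 answers and leaves the other 16 items unanswered.
# Student B knows 24 answers and guesses on the other 16, getting 4 right by chance.
student_a = {"correct": 24, "incorrect": 0, "unclaimed": 16}
student_b = {"correct": 28, "incorrect": 12, "unclaimed": 0}

score_a = formula_score(student_a["correct"], student_a["incorrect"], 4)
score_b = formula_score(student_b["correct"], student_b["incorrect"], 4)

# Number correct favors the guesser; formula scoring makes the two identical.
print(student_a["correct"], student_b["correct"])
print(score_a, score_b)
```

Both scoring rules yield a single number per student. Neither retains the fact that Student B volunteered twelve incorrect answers while Student A volunteered none, which is the information a separate exaggeration dimension is meant to preserve.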
Across disciplines, use of the RExI may inform pedagogical research, by distinctly
quantifying failure in self-image. The known benefit of repeated testing (e.g. Larsen et al.,
2009) may owe some of its effectiveness to failure feedback, which may reduce the tendency to
exaggerate. A related pedagogical approach is a technique called productive failure which
allows students to confront their ignorance of a subject (failure) which subsequently
increases productivity of learning (e.g. Chowrira et al., 2019). Perhaps confronting
ignorance or incompetence is a “desirable difficulty”, a challenge that improves learning
(e.g. Bjork & Kroll, 2015).
With the RExI providing an empirical foundation, the widely lauded but weakly
defined goal of teaching “critical thinking” (T. Moore, 2013) could become more tangible:
Skill in evaluating new knowledge may be grounded in ability to evaluate one’s existing
knowledge. The emerging scientific study of wisdom (Grossmann et al., 2020) suggests that
an important component may be intellectual humility, the “ability to accurately
acknowledge one’s limitations and abilities” (Van Tongeren et al., 2019, p. 463). The
significant relationship between intellectual humility and exaggeration noted in the present
research hints that the RExI may be a convenient tool in supporting future wisdom research.
Being unrelated to the ability being measured, the RExI allows for research that need
not interfere with ordinary educational practices, supporting scientific advancement
without ethical compromises. While more ecologically-valid contextual research (e.g. in
classrooms, examining student and teacher perspectives) is required before RExI scores
could be used in summative evaluations (e.g. determining grades for academic progression),
the additional, quantitative data provided by exaggeration scores would, at least, be
valuable formative feedback that helps students better understand both what they are
learning and themselves.
Conclusion
The present research aims to show that exaggeration (as captured by the RExI) offers
a more accurate measurement of what is intuitively connoted by the terms overconfidence,
overestimation, overclaiming, or overstatement. This contribution to psychology is
important because it a) identifies and rectifies oversights in previous research, b) provides a
unifying conceptual and analytic framework, and c) establishes that the underlying
phenomenon is ubiquitous and maladaptive.
In Errol Morris’s 2013 documentary about former U.S. Secretary of Defense Donald
Rumsfeld, The Unknown Known, besides famously distinguishing between “known knowns”
and “unknown unknowns”, Morris has Rumsfeld describe “unknown knowns”: “things that
you think you know, that it turns out you did not” (Morris, 2013). That documentary
serves as a case study in rationalized bad decisions.
In addition to the well-known arrogance of politicians, ordinary people also appear to
be susceptible to thinking they know things they do not. This has become low-hanging
fruit for comedians, asking people to give opinions on non-existent subjects, such as
imaginary musical groups, fashion designers or movie scenes, as demonstrated by Jimmy
Kimmel’s “Lie Witness News” (Kimmel, 2020).
Such delusions may be widespread because of an innate human heuristic conflating
confidence with competence (Birch et al., 2010). From an evolutionary perspective,
expressed confidence probably adequately predicted genuine competence enough to be an
adaptive cue of whom to trust and follow. In our modern world, where many of us interact
more with technology than other humans, this heuristic might cause problems, such as
electing the confident more than the competent (Ronay et al., 2019). This is supported by
evidence that foils claiming predicts anti-establishment voting (van Prooijen & Krouwel,
2019). In short, modernity may create a positive-feedback loop for this abuse of an evolved
heuristic (exaggeration) which may have increasing societal impact.
The pitfalls of unwarranted ability claiming have been appreciated for millennia, as in
the teaching of Lao Tzu 2600 years ago: “He who tries to shine dims his own light.”
(Mitchell & Tzu, 1992, chapter 24). With the hazards of humanity’s hubris becoming
increasingly apparent, raising awareness of our habitual miscalibration could be restorative.
Both individually and collectively, ignorance and incompetence are inevitable, and should
not elicit shame, denial or deceit, but rather curiosity. In our information-driven society,
the unmeasured often goes unnoticed, so if the RExI became a common metric of
miscalibration, the increased salience may guide more judicious management of our
limitations, and help avoid some of the dangers of “unknown knowns”.
References
Ackerman, P. L., & Ellingsen, V. J. (2014). Vocabulary overclaiming — A complete
approach: Ability, personality, self-concept correlates, and gender differences.
Intelligence, 100 (46), 216–227. https://doi.org/10.1016/j.intell.2014.07.003
Ackerman, R. A., Witt, E. A., Donnellan, M. B., Trzesniewski, K. H., Robins, R. W., &
Kashy, D. A. (2011). What Does the Narcissistic Personality Inventory Really
Measure? Assessment, 18 (1), 67–87. https://doi.org/10.1177/1073191110382845
Alnabhan, M. (2002). An empirical investigation of the effects of three methods of handling
guessing and risk taking on the psychometric indices of a test. Social Behavior and
Personality: An International Journal, 30 (7), 645–652.
https://doi.org/10.2224/sbp.2002.30.7.645
Amati, F., Oh, H., Kwan, V. S. Y., Jordan, K., & Keenan, J. P. (2010). Overclaiming and
the medial prefrontal cortex: A transcranial magnetic stimulation study. Cognitive
Neuroscience, 1 (4), 268–276. https://doi.org/10.1080/17588928.2010.493971
Ames, D. R., Rose, P., & Anderson, C. P. (2006). The NPI-16 as a short measure of
narcissism. Journal of Research in Personality, 40 (4), 440–450.
https://doi.org/10.1016/j.jrp.2005.03.002
Anderson, C. D., Warner, J. L., & Spencer, C. C. (1984). Inflation bias in self-assessment
examinations: Implications for valid employee selection. Journal of Applied
Psychology, 69 (4), 574–580. https://doi.org/10.1037/0021-9010.69.4.574
Atir, S., Rosenzweig, E., & Dunning, D. (2015). When Knowledge Knows No Bounds:
Self-Perceived Expertise Predicts Claims of Impossible Knowledge. Psychological
Science, 26 (8), 1295–1303. https://doi.org/10.1177/0956797615588195
Back, M. D., Schmukle, S. C., & Egloff, B. (2010). Why are narcissists so charming at first
sight? Decoding the narcissism–popularity link at zero acquaintance. Journal of
Personality and Social Psychology, 98 (1), 132–145.
https://doi.org/10.1037/a0016338
Bailey, T. M., & Hahn, U. (2001). Determinants of Wordlikeness: Phonotactics or Lexical
Neighborhoods? Journal of Memory and Language, 44 (4), 568–591.
https://doi.org/10.1006/jmla.2000.2756
Baker, P. (2020). ‘Person. Woman. Man. Camera. TV.’ Didn’t Mean What Trump Hoped
It Did. The New York Times.
Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B.,
Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English
Lexicon Project. Behavior Research Methods, 39 (3), 445–459.
https://doi.org/10.3758/BF03193014
Beglar, D. (2010). A Rasch-based validation of the Vocabulary Size Test. Language Testing,
27 (1), 101–118. https://doi.org/10.1177/0265532209340194
Bensch, D., Paulhus, D. L., Stankov, L., & Ziegler, M. (2017). Teasing Apart Overclaiming,
Overconfidence, and Socially Desirable Responding. Assessment, 26 (3), 351–363.
https://doi.org/10.1177/1073191117700268
Biesanz, J. (2020). Functions for Applied Behavioural Sciences.
Birch, S. A. J., Akmal, N., & Frampton, K. L. (2010). Two-year-olds are vigilant of others’
non-verbal cues to credibility. Developmental Science, 13 (2), 363–369.
https://doi.org/10.1111/j.1467-7687.2009.00906.x
Bjork, R. A., & Kroll, J. F. (2015). Desirable Difficulties in Vocabulary Learning. The
American Journal of Psychology, 128 (2), 241–252.
Bond, J. A. (1986). Inconsistent Responding to Repeated MMPI Items: Is Its Major Cause
Really Carelessness? Journal of Personality Assessment, 50 (1), 50–64.
https://doi.org/10.1207/s15327752jpa5001_7
Bowling, N. A., Huang, J. L., Bragg, C. B., Khazon, S., Liu, M., & Blackmore, C. E.
(2016). Who cares and who is careless? Insufficient effort responding as a reflection
of respondent personality. Journal of Personality and Social Psychology, 111 (2),
218–229. https://doi.org/10.1037/pspp0000085
Brogden, H. E. (1940). A factor analysis of forty character tests. Psychological Monographs,
52 (3), 39–55. https://doi.org/10.1037/h0093562
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A New
Source of Inexpensive, Yet High-Quality, Data? Perspectives on Psychological
Science, 6 (1), 3–5. https://doi.org/10.1177/1745691610393980
Burks, S. V., Carpenter, J. P., Goette, L., & Rustichini, A. (2013). Overconfidence and
Social Signalling. The Review of Economic Studies, 80 (3), 949–983.
https://doi.org/10.1093/restud/rds046
Bynum, L. A., & Davison, H. K. (2014). A Comparison of Faking on Equity Sensitivity
Measures Using the Overclaiming Instrument. Journal of Managerial Issues, 26 (4),
345–364.
Byrne, R. W., & Whiten, A. (1992). Cognitive evolution in primates: Evidence from
tactical deception. Man, 609–627.
Cacioppo, J. T., Petty, R. E., & Kao, C. F. (1984). The Efficient Assessment of Need for
Cognition. Journal of Personality Assessment, 48 (3), 306–307.
https://doi.org/10.1207/s15327752jpa4803_13
Campbell, W. K., Bonacci, A. M., Shelton, J., Exline, J. J., & Bushman, B. J. (2004).
Psychological Entitlement: Interpersonal Consequences and Validation of a
Self-Report Measure. Journal of Personality Assessment, 83 (1), 29–45.
https://doi.org/10.1207/s15327752jpa8301_04
Campbell, W. K., Goodie, A. S., & Foster, J. D. (2004). Narcissism, confidence, and risk
attitude. Journal of Behavioral Decision Making, 17 (4), 297–311.
https://doi.org/10.1002/bdm.475
Chowrira, S., Smith, K., Dubois, P. J., & Roll, I. (2019). DIY Productive Failure: Boosting
Performance in a Large Undergraduate Biology Course. npj Science of Learning,
4 (1). https://doi.org/10.1038/s41539-019-0040-6
Cohen, P., Cohen, J., Aiken, L. S., & West, S. G. (1999). The Problem of Units and the
Circumstance for POMP. Multivariate Behavioral Research, 34 (3), 315–346.
https://doi.org/10.1207/S15327906MBR3403_2
Deci, E. L., Koestner, R., & Ryan, R. M. (2001). Extrinsic Rewards and Intrinsic
Motivation in Education: Reconsidered Once Again. Review of Educational
Research, 71 (1), 1–27. https://doi.org/10.3102/00346543071001001
Deffler, S. A., Leary, M. R., & Hoyle, R. H. (2016). Knowing what you know: Intellectual
humility and judgments of recognition memory. Personality and Individual
Differences, 96, 255–259. https://doi.org/10.1016/j.paid.2016.03.016
DeSimone, J. A., Harms, P. D., & DeSimone, A. J. (2015). Best practice recommendations
for data screening. Journal of Organizational Behavior, 36 (2), 171–181.
https://doi.org/10.1002/job.1962
Dunlop, P. D., Bourdage, J. S., de Vries, R. E., Hilbig, B. E., Zettler, I., & Ludeke, S. G.
(2016). Openness to (Reporting) Experiences That One Never Had: Overclaiming as
an Outcome of the Knowledge Accumulated Through a Proclivity for Cognitive and
Aesthetic Exploration. Journal of Personality and Social Psychology, 113 (5),
810–834. https://doi.org/10.1037/pspp0000110
Duttle, K. (2016). Cognitive Skills and Confidence: Interrelations with Overestimation,
Overplacement and Overprecision. Bulletin of Economic Research, 68 (S1), 42–55.
https://doi.org/10.1111/boer.12069
Dweck, C. S. (2006). Mindset: The New Psychology of Success. Random House.
Edwards, J. R. (1994). Regression Analysis as an Alternative to Difference Scores.
Journal of Management, 20 (3), 7.
Emmons, R. A. (1987). Narcissism: Theory and measurement. Journal of personality and
social psychology, 52 (1), 11–17.
Espinosa, M. P., & Gardeazabal, J. (2010). Optimal correction for guessing in
multiple-choice tests. Journal of Mathematical Psychology, 54 (5), 415–425.
https://doi.org/10.1016/j.jmp.2010.06.001
Fell, C. B., & König, C. J. (2018). Examining Cross-Cultural Differences in Academic
Faking in 41 Nations. Applied Psychology, 69 (2), 444–478.
https://doi.org/10.1111/apps.12178
Festinger, L. (1954). A theory of social comparison processes. Human relations, 7 (2),
117–140.
Frankish, K. (2010). Dual-Process and Dual-System Theories of Reasoning. Philosophy
Compass, 5 (10), 914–926. https://doi.org/10.1111/j.1747-9991.2010.00330.x
Frederick, S. (2005). Cognitive Reflection and Decision Making. The Journal of Economic
Perspectives, 19 (4), 25–42.
Frisch, S. A., Large, N. R., & Pisoni, D. B. (2000). Perception of Wordlikeness: Effects of
Segment Probability and Length on the Processing of Nonwords. Journal of
Memory and Language, 42 (4), 481–496. https://doi.org/10.1006/jmla.1999.2692
Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach.
Psychological review, 102 (4), 652.
Furnham, A., Hyde, G., & Trickey, G. (2015). Personality and value correlates of careless
and erratic questionnaire responses. Personality and Individual Differences, 80,
64–67. https://doi.org/10.1016/j.paid.2015.02.005
Goldberg, L. R. (1981). Language and individual differences: The search for universals in
personality lexicons. Review of personality and social psychology, 2 (1), 141–165.
Gosling, S. D., Rentfrow, P. J., & Swann Jr, W. B. (2003). A very brief measure of the
Big-Five personality domains. Journal of Research in personality, 37 (6), 504–528.
Grossmann, I., Weststrate, N. M., Ardelt, M., Brienza, J. P., Dong, M., Ferrari, M.,
Fournier, M. A., Hu, C. S., Nusbaum, H. C., & Vervaeke, J. (2020). The Science of
Wisdom in a Polarized World: Knowns and Unknowns. Psychological Inquiry, 31 (2),
103–133. https://doi.org/10.1080/1047840X.2020.1750917
Guenther, C. L., & Alicke, M. D. (2010). Deconstructing the better-than-average effect.
Journal of Personality and Social Psychology, 99 (5), 755–770.
https://doi.org/10.1037/a0020959
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A Review of Multiple-Choice
Item-Writing Guidelines for Classroom Assessment. Applied Measurement in
Education, 15 (3), 309–333. https://doi.org/10.1207/S15324818AME1503_5
Hargittai, E. (2009). An Update on Survey Measures of Web-Oriented Digital Literacy.
Social Science Computer Review, 27 (1), 130–137.
https://doi.org/10.1177/0894439308318213
Heine, S. J., & Hamamura, T. (2007). In Search of East Asian Self-Enhancement.
Personality and Social Psychology Review, 11 (1), 4–27.
https://doi.org/10.1177/1088868306294587
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world?
Behavioral and brain sciences, 33 (2-3), 61–83.
https://doi.org/10.1017/S0140525X0999152X
Hirsch Jr, E. D., Kett, J. F., & Trefil, J. S. (1988). Cultural literacy: What every American
needs to know. Vintage.
Hodge, D. R., & Gillespie, D. (2003). Phrase completions: An alternative to Likert scales.
Social Work Research, 27 (1), 45–55.
Huang, J. L., Curran, P. G., Keeney, J., Poposki, E. M., & DeShon, R. P. (2012). Detecting
and Deterring Insufficient Effort Responding to Surveys. Journal of Business and
Psychology, 27 (1), 99–114. https://doi.org/10.1007/s10869-011-9231-8
Hutt, E., & Schneider, J. (2018). A History of Achievement Testing in the United States
Or: Explaining the Persistence of Inadequacy. Teachers College Record, 34.
John, O. P., & Robins, R. W. (1994). Accuracy and bias in self-perception: Individual
differences in self-enhancement and the role of narcissism. Journal of personality and
social psychology, 66 (1), 206.
Joinson, A., McKenna, K., Postmes, T., & Reips, U.-D. (2007). Oxford Handbook of
Internet Psychology. Oxford University Press.
Joule, R.-V., & Azdia, T. (2003). Cognitive dissonance, double forced compliance, and
commitment. European Journal of Social Psychology, 33 (4), 565–571.
https://doi.org/10.1002/ejsp.165
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty:
Heuristics and biases. Cambridge University Press.
Kantner, J., & Lindsay, D. S. (2012). Response bias in recognition memory as a cognitive
trait. Memory & Cognition, 40 (8), 1163–1177.
https://doi.org/10.3758/s13421-012-0226-0
Kantner, J., & Lindsay, D. S. (2014). Cross-situational consistency in recognition memory
response bias. Psychonomic Bulletin & Review, 21 (5), 1272–1280.
https://doi.org/10.3758/s13423-014-0608-3
Kaplan, A., Middleton, M. J., Urdan, T., & Midgley, C. (2002). Achievement goals and
goal structures. Goals, goal structures, and patterns of adaptive learning, 21–53.
Kelley, T. L. (1927). Interpretation of educational measurements. World Book Co.
Kennedy, J. A., Anderson, C., & Moore, D. A. (2013). When overconfidence is revealed to
others: Testing the status-enhancement theory of overconfidence. Organizational
Behavior and Human Decision Processes, 122 (2), 266–279.
https://doi.org/10.1016/j.obhdp.2013.08.005
Kenny, D. A. (2004). PERSON: A general model of interpersonal perception. Personality
and social psychology review, 8 (3), 265–280.
Kimmel, J. (2020). Lie Witness News – Oscars 2020 Edition.
Kirkpatrick, E. A. (1907). A vocabulary test. Popular Science Monthly, 70, 157–164.
Kopp, J. P., Zinn, T. E., Finney, S. J., & Jurich, D. P. (2011). The Development and
Evaluation of the Academic Entitlement Questionnaire. Measurement and
Evaluation in Counseling and Development, 44 (2), 105–129.
https://doi.org/10.1177/0748175611400292
Korossy, K. (1999). Modeling knowledge as competence and performance. Knowledge
spaces: Theories, empirical research, and applications, 103–132.
Krueger, J. I., & Mueller, R. A. (2002). Unskilled, unaware, or both? The
better-than-average heuristic and statistical regression predict errors in estimates of
own performance. Journal of personality and social psychology, 82 (2), 180–188.
Krueger, J. I., & Wright, J. C. (2011). Measurement of self-enhancement (and
self-protection). Handbook of self-enhancement and self-protection, 472–494.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in
recognizing one’s own incompetence lead to inflated self-assessments. Journal of
personality and social psychology, 77 (6), 1121–1134.
Krumrei-Mancuso, E. J., & Rouse, S. V. (2016). The development and validation of the
comprehensive intellectual humility scale. Journal of personality assessment, 98 (2),
209–221.
Lamport, L. (1986). LaTeX: A document preparation system. Addison-Wesley.
Larsen, D. P., Butler, A. C., & Roediger III, H. L. (2009). Repeated testing improves
long-term retention relative to repeated study: A randomised controlled trial.
Medical Education, 43 (12), 1174–1181.
https://doi.org/10.1111/j.1365-2923.2009.03518.x
Leary, M. R., Diebels, K. J., Davisson, E. K., Jongman-Sereno, K. P., Isherwood, J. C.,
Raimi, K. T., Deffler, S. A., & Hoyle, R. H. (2017). Cognitive and Interpersonal
Features of Intellectual Humility. Personality and Social Psychology Bulletin, 43 (6),
793–813. https://doi.org/10.1177/0146167217697695
Lee, D., & Daunizeau, J. (2020). Choosing what we like vs liking what we choose: How
choice-induced preference change might actually be instrumental to decision-making.
PLOS ONE, 15 (5), e0231081. https://doi.org/10.1371/journal.pone.0231081
Lee, K., & Ashton, M. C. (2004). Psychometric Properties of the HEXACO Personality
Inventory. Multivariate Behavioral Research, 39 (2), 329–358.
https://doi.org/10.1207/s15327906mbr3902_8
Levashina, J., Morgeson, F. P., & Campion, M. A. (2009). They Don’t Do It Often, But
They Do It Well: Exploring the relationship between applicant mental abilities and
faking. International Journal of Selection and Assessment, 17 (3), 271–281.
https://doi.org/10.1111/j.1468-2389.2009.00469.x
Lucas, D. B. (1942). A Controlled Recognition Technique for Measuring Magazine
Advertising Audiences. Journal of Marketing, 6 (4 part 2), 133–136.
https://doi.org/10.1177/002224294200600431.1
Ludeke, S. G., & Makransky, G. (2016). Does the Over-Claiming Questionnaire measure
overclaiming? Absent convergent validity in a large community sample.
Psychological Assessment, 28 (6), 765–774.
Macenczak, L. A., Campbell, S., Henley, A. B., & Campbell, W. K. (2016). Direct and
interactive effects of narcissism and power on overconfidence. Personality and
Individual Differences, 91, 113–122. https://doi.org/10.1016/j.paid.2015.11.053
Mackenzie, J. A., & McMillan, T. M. (2005). Knowledge of post-concussional syndrome in
naïve lay-people, general practitioners and people with minor traumatic brain
injury. British Journal of Clinical Psychology, 44 (3), 417–424.
https://doi.org/10.1348/014466505X35696
Macmillan, N. A. (2002). Signal detection theory. Stevens’ handbook of experimental
psychology.
Marchman, V. A., & Fernald, A. (2008). Speed of word recognition and vocabulary
knowledge in infancy predict cognitive and language outcomes in later childhood.
Developmental Science, 11 (3), F9–F16.
https://doi.org/10.1111/j.1467-7687.2008.00671.x
Marsh, E. J., Roediger, H. L., Bjork, R. A., & Bjork, E. L. (2007). The memorial
consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14 (2),
194–199. https://doi.org/10.3758/BF03194051
McKay, A. S., Garcia, D. M., Clapper, J. P., & Shultz, K. S. (2018). The attentive and the
careless: Examining the relationship between benevolent and malevolent personality
traits with careless responding in online surveys. Computers in Human Behavior,
84, 295–303. https://doi.org/10.1016/j.chb.2018.03.007
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data.
Psychological methods, 17 (3), 437–455.
Merriam-Webster. (2020a). Exaggerate.
Merriam-Webster. (2020b). Overclaim.
Mitchell, S., & Tzu, L. (1992). Tao Te Ching written by Lao-tzu. HarperPerennial.
Moore, D. A. (2018). Overconfidence | Psychology Today.
Moore, D. A., Dev, A. S., & Goncharova, E. Y. (2018). Overconfidence Across Cultures.
Collabra: Psychology, 4 (1), 36. https://doi.org/10.1525/collabra.153
Moore, D. A., & Healy, P. J. (2008). The trouble with overconfidence. Psychological
Review, 115 (2), 502–517. https://doi.org/10.1037/0033-295X.115.2.502
Moore, T. (2013). Critical thinking: Seven definitions in search of a concept. Studies in
Higher Education, 38 (4), 506–522. https://doi.org/10.1080/03075079.2011.586995
Moran, G., & Cutler, B. L. (1997). Bogus Publicity Items and the Contingency Between
Awareness and Media-Induced Pretrial Prejudice. Law and Human Behavior, 21 (3),
339–344. https://doi.org/10.1023/A:1024846917038
Morris, E. (2013). The Unknown Known.
Murre, J. M. J., & Dros, J. (2015). Replication and Analysis of Ebbinghaus’ Forgetting
Curve. PLOS ONE, 10 (7), e0120644. https://doi.org/10.1371/journal.pone.0120644
Newman, E. J., Garry, M., Bernstein, D. M., Kantner, J., & Lindsay, D. S. (2012).
Nonprobative photographs (or words) inflate truthiness. Psychonomic Bulletin &
Review, 19 (5), 969–974. https://doi.org/10.3758/s13423-012-0292-0
Nichols, D. S., Greene, R. L., & Schmolck, P. (1989). Criteria for assessing inconsistent
patterns of item endorsement on the MMPI: Rationale, development, and empirical
trials. Journal of Clinical Psychology, 45 (2), 239–250.
https://doi.org/10.1002/1097-4679(198903)45:2<239::AID-JCLP2270450210>3.0.CO;2-1
Ohtani, K., & Hisasaka, T. (2018). Beyond intelligence: A meta-analytic review of the
relationship among metacognition, intelligence, and academic performance.
Metacognition and Learning, 13 (2), 179–212.
https://doi.org/10.1007/s11409-018-9183-8
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of
personality and social psychology, 46 (3), 598.
Paulhus, D. L. (1988). Balanced inventory of desirable responding (BIDR). Acceptance and
Commitment Therapy. Measures Package, 41.
Paulhus, D. L. (1998). The Paulhus deception scales: BIDR version 7. Toronto/Buffalo:
Multi-Health Systems.
Paulhus, D. L. (2012). History of the Technique. In M. Ziegler, C. MacCann, &
R. D. Roberts (Eds.), New Perspectives on Faking in Personality Assessment
(pp. 309–329). Oxford University Press.
https://doi.org/10.1093/acprof:oso/9780195387476.003.0087
Paulhus, D. L., & Bruce, M. N. (1990). Validation of the OCQ: A preliminary study.
Annual Convention of the Canadian Psychological Association, Ottawa, Ontario,
Canada.
Paulhus, D. L., & Dubois, P. J. (2014). Application of the Overclaiming Technique to
Scholastic Assessment. Educational and Psychological Measurement, 74 (6), 975–990.
https://doi.org/10.1177/0013164414536184
Paulhus, D. L., Harms, P. D., Bruce, M. N., & Lysy, D. C. (2003). The over-claiming
technique: Measuring self-enhancement independent of ability. Journal of
Personality and Social Psychology, 84 (4), 890–904.
Paulhus, D. L., & Williams, K. M. (2002). The Dark Triad of personality: Narcissism,
Machiavellianism, and psychopathy. Journal of Research in Personality, 36 (6),
556–563. https://doi.org/10.1016/S0092-6566(02)00505-6
Peter, J. P., Churchill, G. A., Jr., & Brown, T. J. (1993). Caution in the Use of Difference
Scores in Consumer Research. Journal of Consumer Research, 19 (4), 655–662.
https://doi.org/10.1086/209329
Phillips, D. L., & Clancy, K. J. (1972). Some Effects of “Social Desirability” in Survey
Studies. American Journal of Sociology, 77 (5), 921–940.
Pintrich, P. R. (1991). A manual for the use of the Motivated Strategies for Learning
Questionnaire (MSLQ).
Pryor, L. R., Miller, J. D., & Gaughan, E. T. (2008). A Comparison of the Psychological
Entitlement Scale and the Narcissistic Personality Inventory’s Entitlement Scale:
Relations With General Personality Traits and Personality Disorders. Journal of
Personality Assessment, 90 (5), 517–520.
https://doi.org/10.1080/00223890802248893
R Core Team. (2020). R: A language and environment for statistical computing.
Raskin, R., & Terry, H. (1988). A principal-components analysis of the Narcissistic
Personality Inventory and further evidence of its construct validity. Journal of
personality and social psychology, 54 (5), 890–902.
Raubenheimer, A. S. (1925). An experimental study of some behavior traits of the
potentially delinquent boy. Psychology Monograph, 34 (6), 107.
Richardson, K. (2002). What IQ Tests Test. Theory & Psychology, 12 (3), 283–314.
https://doi.org/10.1177/0959354302012003012
Ronay, R., Oostrom, J. K., Lehmann-Willenbrock, N., Mayoral, S., & Rusch, H. (2019).
Playing the trump card: Why we select overconfident leaders and why it matters.
The Leadership Quarterly, 30 (6), 1–19.
https://doi.org/10.1016/j.leaqua.2019.101316
Rosse, J. G., Miller, J. L., & Stecher, M. D. (1994). A field study of job applicants’
reactions to personality and cognitive ability testing. Journal of Applied Psychology,
79 (6), 987–992. https://doi.org/10.1037/0021-9010.79.6.987
Rousselet, G. A., & Wilcox, R. R. (2020). Reaction Times and other Skewed Distributions:
Problems with the Mean and the Median. Meta-Psychology, 4.
https://doi.org/10.15626/MP.2019.1630
RStudio Team. (2019). RStudio: Integrated development environment for R. Manual.
RStudio, Inc. Boston, MA.
Safat, A. A., Sheibani, H., Mohammadi, P., Hasanabadi, N., & Sakhaee, E. (2018).
Evaluation of lipid-lowering effect of Cynara scolymus extract-loaded mesoporous
silica nanoparticles on ultra-lipid-fed mice. Comparative Clinical Pathology, 27 (2),
513–518. https://doi.org/10.1007/s00580-017-2621-1
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art.
Psychological methods, 7 (2), 147.
Schlenker, B. R. (2012). Self-presentation. Handbook of self and identity, 2nd ed
(pp. 542–570). The Guilford Press.
Schumacher, E., & Eskenazi, M. (2016). A Readability Analysis of Campaign Speeches
from the 2016 US Presidential Campaign. arXiv:1603.05739 [cs].
Smith, E. M., & Mason, J. B. (1970). The Influence of Instructions on Respondent Error.
Journal of Marketing Research, 7 (2), 254–255.
https://doi.org/10.1177/002224377000700216
Stuart, T. (2016). Donald Trump’s 13 Biggest Business Failures.
Suls, J., & Wheeler, L. (2013). Handbook of social comparison: Theory and research.
Springer Science & Business Media.
Symonds, P. M. (1924). The present status of character measurement. Journal of
Educational Psychology, 15 (8), 484–498. https://doi.org/10.1037/h0069154
Thomson, K. S., & Oppenheimer, D. M. (2016). Investigating an alternate form of the
cognitive reflection test. Judgment and Decision making, 11 (1), 99–113.
Toplak, M. E., West, R. F., & Stanovich, K. E. (2014). Assessing miserly information
processing: An expansion of the Cognitive Reflection Test. Thinking & Reasoning,
20 (2), 147–168. https://doi.org/10.1080/13546783.2013.844729
Trump, D. J. (2015). Campaign Speech, Hilton Head Island, South Carolina, United States.
Ujifusa, A. (2012). Standardized Testing Costs States $1.7 Billion a Year, Study Says.
Education Week.
van Prooijen, J.-W., & Krouwel, A. P. M. (2019). Overclaiming Knowledge Predicts
Anti-establishment Voting. Social Psychological and Personality Science, 11 (3),
356–363. https://doi.org/10.1177/1948550619862260
Van Tongeren, D. R., Davis, D. E., Hook, J. N., & vanOyen Witvliet, C. (2019). Humility.
Current Directions in Psychological Science, 28 (5), 463–468.
https://doi.org/10.1177/0963721419850153
Voelker, P. F. (1921). An Account of Certain Methods of Testing for Moral Reactions in
Conduct. Religious Education, 16 (2), 81–83.
https://doi.org/10.1080/0034408210160204
Wallace, H. M. (2011). Narcissistic self-enhancement. The handbook of narcissism and
narcissistic personality disorder: Theoretical approaches, empirical findings, and
treatments, 309–318.
Ward, M. K., Meade, A. W., Allred, C. M., Pappalardo, G., & Stoughton, J. W. (2017).
Careless response and attrition as sources of bias in online survey assessments of
personality traits and performance. Computers in Human Behavior, 76, 417–430.
https://doi.org/10.1016/j.chb.2017.06.032
Ward, M., & Meade, A. W. (2018). Applying Social Psychology to Prevent Careless
Responding during Online Surveys. Applied Psychology, 67 (2), 231–263.
https://doi.org/10.1111/apps.12118
Whittlesea, B. W. A., & Leboe, J. P. (2003). Two fluency heuristics (and how to tell them
apart). Journal of Memory and Language, 49 (1), 62–79.
https://doi.org/10.1016/S0749-596X(03)00009-3
Williams, K. M., Paulhus, D. L., & Nathanson, C. (2002). The nature of over-claiming:
Personality and cognitive factors. A Poster Presented to the Annual Meeting of the
American Psychological Association, Chicago, IL.
Wonderlic. (2019). The story of Wonderlic, being awesome since 1937.
Wonderlic, E. F. (1992). Wonderlic Personnel Test and scholastic level exam user’s manual.
Wonderlic and Associates: Northfield, IL, USA.
Woo, S. E., Harms, P. D., & Kuncel, N. R. (2007). Integrating personality and intelligence:
Typical intellectual engagement and need for cognition. Personality and Individual
Differences, 43 (6), 1635–1639. https://doi.org/10.1016/j.paid.2007.04.022
Wood, D., Harms, P. D., Lowman, G. H., & DeSimone, J. A. (2017). Response Speed and
Response Consistency as Mutually Validating Indicators of Data Quality in Online
Samples. Social Psychological and Personality Science, 8 (4), 454–464.
https://doi.org/10.1177/1948550617703168
Woodrow, H., & Bemmels, V. (1927). Overstatement as a test of general character in
pre-school children. Journal of Educational Psychology, 18 (4), 239–246.
https://doi.org/10.1037/h0071514
Yarkoni, T., Balota, D., & Yap, M. (2008). Moving beyond Coltheart’s N: A new measure
of orthographic similarity. Psychonomic Bulletin & Review, 15 (5), 971–979.
https://doi.org/10.3758/PBR.15.5.971
Zemeckis, R. (1994). Forrest Gump.
Appendix: The Overclaiming Technique (OCT)
While the RExI can be applied to an overstatement test to assess both ability and
exaggeration as uncorrelated measures, there exists another approach that purports to do
the same thing. The Overclaiming Technique (OCT) is presented as “Measuring
Self-Enhancement Independent of Ability” (Paulhus et al., 2003, p. 890), framing
self-enhancement as synonymous with exaggeration: “The OCT was designed to measure
knowledge exaggeration and knowledge accuracy simultaneously and independently”
(Paulhus, 2012, p. 151). How is this different from the RExI?
The OCT begins with the same data collection used by Raubenheimer (1925), labeled
“overclaiming” by Phillips and Clancy (1972), i.e. soliciting claims of knowledge or
familiarity with a variety of items, some of which are reals, some foils. The unique
contribution of the OCT is in using Signal Detection Theory (SDT) for analysis (e.g.
Macmillan, 2002). The proportion of reals claimed (reals rate, or hit rate in SDT terms)
and the proportion of foils claimed (foils rate, or false-alarm rate) are combined to create
two new indices: accuracy (the excess of reals rate over foils rate, also called sensitivity
in SDT) and bias (the average of the two)19.
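The two indices just described, together with their z-transformed counterparts, can be sketched for a single respondent as follows. This is a minimal illustration, not the OCT authors' own scoring code: the function name, the item counts, and the clamping of rates away from 0 and 1 (needed because the z-transform is undefined there) are all assumptions made for the example.

```python
from statistics import NormalDist


def oct_indices(reals_claimed, n_reals, foils_claimed, n_foils):
    """Compute difference/average indices and z-transformed SDT indices.

    Hypothetical helper for illustration only.
    """
    reals_rate = reals_claimed / n_reals   # hit rate in SDT terms
    foils_rate = foils_claimed / n_foils   # false-alarm rate in SDT terms

    accuracy = reals_rate - foils_rate     # excess of reals rate over foils rate
    bias = (reals_rate + foils_rate) / 2   # average of the two rates

    # Assumption: keep rates strictly inside (0, 1) so the inverse normal
    # CDF is defined; this is one common correction, not the only one.
    def clamp(rate, n_items):
        return min(max(rate, 0.5 / n_items), 1 - 0.5 / n_items)

    z = NormalDist().inv_cdf
    z_hit = z(clamp(reals_rate, n_reals))
    z_fa = z(clamp(foils_rate, n_foils))

    d_prime = z_hit - z_fa                 # sensitivity
    neg_c = (z_hit + z_fa) / 2             # -c, the (sign-reversed) criterion

    return {
        "reals_rate": reals_rate,
        "foils_rate": foils_rate,
        "accuracy": accuracy,
        "bias": bias,
        "d_prime": d_prime,
        "neg_c": neg_c,
    }


if __name__ == "__main__":
    # Hypothetical respondent: claims 45 of 60 reals and 6 of 20 foils.
    for name, value in oct_indices(45, 60, 6, 20).items():
        print(f"{name}: {value:.3f}")
```

Both pairs of indices are built from the same two rates, so a respondent who claims more of everything raises bias and −c alike, whether the claims are reals or foils.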
The paper that introduced the OCT (Paulhus et al., 2003, which used a collection of
general knowledge reals and foils called the OCQ) first references the definition of
overclaiming used by Phillips and Clancy (1972): “Over-claiming is the tendency to claim
knowledge about nonexistent items” (p. 891)20, in other words, foils rate. The next page,
however, equates the term overclaiming with the bias definition described above:
“over-claiming was operationalized with the OCQ bias index” (p. 892), i.e. averaged reals
rate and foils rate, giving the term a very different meaning: not just unwarranted claims,
but any claims. For reporting results, however, “predictions with the OCQ bias measure
are always assessed after controlling for the OCQ accuracy score. Thus, discriminant
validity with respect to accurate knowledge is built into the calculation of the over-claiming
index” (p. 899). That index (bias controlled for accuracy, or residualized bias) is meant to
capture exaggeration (self-enhancement) independent of knowledge (ability). Residualized
bias will necessarily be uncorrelated with the accuracy index, but does it capture
exaggeration independent of ability?
19 This simple approach, using difference and average, is called the “common sense” approach (Paulhus, 2012, p. 154), which closely approximates the traditional SDT measures of d′ and −c, where reals rate and foils rate are z-transformed before combining. Both approaches produce very similar results.
20 The terms “over-claiming” and “overclaiming” are used interchangeably by that author and others, but the non-hyphenated version is a distinct keyword and search term, and so used here.
Let us examine the mathematics involved. Let R be reals rate, F foils rate, A accuracy,
and B bias, with A = R − F and B = R + F (using the sum rather than the average changes B
only by a constant factor of 2, which does not affect any correlation). Plotting F against R,
then B against A, will show that creating difference and sum composites simply rotates the
variable space by 45 degrees (and rescales both axes equally). What does this mean
conceptually? As discussed earlier, R represents plausible,
self-reported ability (possibly with some exaggeration) that approximates actual ability, as
shown by P. L. Ackerman and Ellingsen (2014). F represents implausible ability claims,
but is likely related to genuine ability, as shown by P. L. Ackerman and Ellingsen (2014)
and Atir et al. (2015). B represents the indiscriminate claiming of real or foil items, which
is simply response bias, making no distinction between plausible and implausible claims.
Interpreting this as exaggeration would be comparable to asking fishermen the size of their
catch, and assuming everything they say is exaggeration regardless of what was caught.
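The rotation mentioned above can be written out explicitly. The following is a short worked identity, taking B as the sum R + F; the matrix notation is introduced here for illustration and is not part of the OCT literature.

```latex
\[
\begin{pmatrix} A \\ B \end{pmatrix}
=
\begin{pmatrix} 1 & -1 \\ 1 & \phantom{-}1 \end{pmatrix}
\begin{pmatrix} R \\ F \end{pmatrix}
=
\sqrt{2}
\begin{pmatrix}
\cos 45^{\circ} & -\sin 45^{\circ} \\
\sin 45^{\circ} & \phantom{-}\cos 45^{\circ}
\end{pmatrix}
\begin{pmatrix} R \\ F \end{pmatrix}
\]
```

Because the transformation is a pure rotation followed by a uniform rescaling, A and B contain exactly the same information as R and F; no new distinction between ability and exaggeration is introduced by forming the composites.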
In contrast, A represents plausible claims compensated for implausible claims, which
could be interpreted as a corrected self-estimate of ability. This idea has some validation:
Paulhus and Dubois (2014) demonstrated that this measure on an overclaiming inventory
was comparable to multiple-choice and short-answer quiz formats in predicting
undergraduate course grades. This suggests that foil claiming negatively predicts ability,
but, like bias, accuracy obfuscates any distinction between ability and exaggeration, since it
combines both reals and (reversed) foils claiming with equal weight. In essence, this is a
simple form of correction for guessing, collapsing two dimensions into one.
What about bias controlled for accuracy (residualized bias), which the OCT
recommends as the index of exaggeration or self-enhancement? Let BRes be residualized
bias. The SDT model that is the basis for the OCT assumes equal variance for R and F. This
would make the difference and sum composites A and B uncorrelated21, i.e. Cor(A,B) = 0.
If that assumption is met, controlling B for A has no effect and BRes = B (apart from
centering at zero), meaning the OCT exaggeration index is no different from bias, i.e.
indiscriminate claiming.
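The equal-variance argument can be checked numerically. The sketch below uses small made-up rate vectors (hypothetical data, not from any study reported here) and hand-written helper functions to residualize bias on accuracy by ordinary least squares. It is meant only to illustrate the algebra, under the assumption that residualization is done by simple linear regression.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def residualize(y, x):
    """Residuals of y after an ordinary least-squares regression on x."""
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def oct_composites(reals, foils):
    """Accuracy, bias, and residualized bias from per-person rates."""
    accuracy = [r - f for r, f in zip(reals, foils)]
    bias = [(r + f) / 2 for r, f in zip(reals, foils)]
    return accuracy, bias, residualize(bias, accuracy)


def max_gap(reals, foils):
    """Largest gap between residualized bias and mean-centred bias."""
    _, bias, bias_res = oct_composites(reals, foils)
    centre = mean(bias)
    return max(abs(br - (b - centre)) for b, br in zip(bias, bias_res))


# Hypothetical reals and foils rates for four respondents.
reals = [0.2, 0.4, 0.6, 0.8]
foils_equal = [0.5, 0.1, 0.7, 0.3]        # same variance as reals
foils_unequal = [0.25, 0.05, 0.35, 0.15]  # one quarter of that variance

if __name__ == "__main__":
    print("equal variances:  ", max_gap(reals, foils_equal))
    print("unequal variances:", max_gap(reals, foils_unequal))
```

With equal variances the gap is zero up to floating-point error, so residualized bias is just bias with its mean removed. With unequal variances the regression slope is non-zero and the two indices diverge, but only by as much as the variances happen to differ.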
In typical overclaiming research, however, the SDT assumption of equal variance may
not hold. Nonetheless, no part of the OCT ensures accuracy and bias will be substantially
correlated, i.e. that BRes will meaningfully differ from B. Thus there is no assurance that
the OCT measures exaggeration (self-enhancement) distinct from knowledge (ability),
contrary to the declared design goals.
This lack of distinction may explain the empirical failures of the OCT: a number of
researchers have reported that the OCT does not measure what it claims to measure.
Bensch et al. (2017) examined self-enhancement broadly as “positivity bias”, including
measures of narcissism, self-deceptive enhancement, impression management,
overconfidence, crystallized intelligence, a variety of measures of socially-desirable
responding, and the five-factor model of personality. A factor analysis of all these found
that neither OCT bias nor residualized bias appeared on any of the six factors found, with
the authors concluding that whatever the OCT measured was “fully independent of personality
and crystallized intelligence” (p. 12).
Using the HEXACO measure of Openness (K. Lee & Ashton, 2004), Dunlop et al.
(2016) found that OCT accuracy, bias and residualized bias all significantly related to
Openness (around r = .30), concluding that “overclaiming can be understood as a result of
knowledge accumulated through a general proclivity for cognitive and aesthetic exploration
(i.e., Openness)” (p. 1).
Less flatteringly, Ludeke and Makransky (2016), using the OCQ as done in the paper
introducing the OCT (and so both the same items and the same analytic technique), noted:
“Using a sample of 704 adult community members, we found minimal support for the OCQ
as an assessment of misrepresentation. . . . OCQ bias measures were instead consistently
and sometimes even highly related to measures of careless responding” (p. 1).
21 $\mathrm{Cov}(X+Y,\,X-Y) = E\big[\big((X-\mu_X)+(Y-\mu_Y)\big)\big((X-\mu_X)-(Y-\mu_Y)\big)\big] = \mathrm{Var}(X)-\mathrm{Var}(Y) = 0$ when $\mathrm{Var}(X) = \mathrm{Var}(Y)$.
The OCT “exaggeration index” will mostly represent indiscriminate claiming, or
general response bias, a mix of ability and exaggeration, which could explain the above
findings. The OCT does not capture exaggeration as defined in this paper.