7/30/2019 Validity 07
1/54
1
Validity Outline
1. Definition
2. Validity: Two Different Views
3. Types of Validity
  A. Face
  B. Content
  C. Criterion
    i. Predictive vs. Concurrent
    ii. Validity Coefficients
  D. Construct
    i. Convergent
    ii. Discriminant
Validity Definition
Validity measures agreement between a test score and the characteristic it is believed to measure
The basic question is: are you measuring what you think you're measuring?
Validity: two very different views
Traditional:
Validity is a property of tests
Does the test measure what you think it measures?
Validity: two very different views
Traditional
Recent (e.g., Messick, 1989; Committee on Standards for Educational and Psychological Testing (CSEPT)):
Validity is a property of test score interpretations
Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences
Note the difference:
Does the test measure what you think it measures?
Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences
A problem with the CSEPT view
Who is to say the social consequences of test use are good or bad?
According to CSEPT, validity is a subjective judgment
In my view, this makes the concept useless: if you like the result the test gives you, you will consider it valid. If you don't, you won't.
That's not how scientists think.
Borsboom et al. (2004)
Borsboom et al. reject CSEPT's view
Validity is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure. (p. 1061)
Borsboom et al. (2004)
a test is valid for measuring an attribute if and only if (a) the attribute exists and (b) variations in the attribute causally produce variations in the outcomes of the measurement procedure.
Variations in what you are measuring cause variations in your measurements.
E.g., variations across people in intelligence cause variations in their IQ scores
This is not a correlational model of validity
Borsboom et al. (2004)
You don't create a test and then do the analysis necessary to establish its validity
Rather, you begin by doing the theoretical work necessary to create a valid test in the first place.
On this view, validity is not a big issue.
Borsboom et al. vs. CSEPT
Who is right?
Each scientist has to make up his or her own mind on that question
I find Borsboom et al.'s arguments compelling.
Other psychologists may disagree
The CSEPT view
CSEPT recognizes 3 types of evidence for test validity:
  Content-related
  Criterion-related
  Construct-related
Boundaries not clearly defined
Cronbach (1980): Construct is basic, while Content & Criterion are subtypes.
Parenthetical Point: Face Validity
Face validity refers to the appearance that a test measures what it is intended to measure.
Face validity has P.R. value: test-takers may have better motivation if the test appears to be a sensible way to measure what it measures.
CSEPT: Content validity
Content-related evidence considers coverage of the conceptual domain tested.
Important in educational settings
Like face validity, it is determined by logic rather than statistics
Typically assessed by expert judges
CSEPT: Content validity
Content-related evidence considers coverage of the conceptual domain tested.
Construct-irrelevant variance: is each item relevant to the domain?
Construct under-representation: is the domain adequately covered, or are parts of it left out?
But if you are going to ask these questions, why not do it when creating the test?
Borsboom et al.: Content validity
Borsboom et al. would say that content validity is not something to be established after the test has been created.
Rather, you build it into your test by having a good theory of what you are testing
E.g., for a test in this course to have content validity, it should test your understanding of content validity!
CSEPT: Criterion validity
Criterion-related evidence tells us how well a test score corresponds to a particular criterion measure.
A criterion is a standard against which a test is compared.
The test score should tell us something about the criterion score.
CSEPT: Criterion validity
A criterion is a standard against which a test is compared.
E.g., we could compare GPAs to SAT scores to produce evidence of validity of conclusions drawn on the basis of SAT scores
Two basic types:
  Predictive
  Concurrent
CSEPT: Criterion validity
Predictive validity
Test scores used to predict future performance: how good is the prediction?
E.g., SAT is used to predict final undergraduate GPA
SAT and GPA are moderately correlated
CSEPT: Criterion validity
Predictive validity
Concurrent validity
Correlation between test scores and criterion when the two are measured at the same time.
Test illuminates current performance rather than predicting future performance
(e.g., why does
patient have a
temperature? Why
Borsboom et al.: Criterion validity
Criterion validity involves a correlation of test scores with some criterion such as GPA
That does not establish the test's validity, only its utility.
E.g., height and weight are correlated, but a test of height is not a test of what bathroom scales measure.
Borsboom et al.: Criterion validity
SAT is valid because it was developed on the sensible theory that past academic achievement is a good guide to future academic achievement
Validity is built into the test, not established after the test has been created
Borsboom et al.: Criterion validity
Validation research aims at showing how variation in the attribute causes variation in the test score
This requires a theory of the task: how does the test-taker do the mental operations needed to respond to test items?
CSEPT: Criterion validity
Note: no point in developing a test if you already have a criterion unless impracticality or expense makes use of the criterion difficult.
Criterion measure only available in the future?
Criterion too expensive to use?
CSEPT: Criterion validity
Validity Coefficient
Compute correlation (r) between test score and criterion.
r = .30 or .40 would be considered normal.
r > .60 is rare
Note: r varies between -1.0 and +1.0
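As a minimal sketch of how such a coefficient is computed, the Python below correlates test scores with a criterion. The scores and GPAs are invented for illustration, and `pearson_r` is a hypothetical helper name, not something taken from the lecture.

```python
import math


def pearson_r(x, y):
    """Pearson correlation for two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: admission test scores and the later criterion (GPA).
test_scores = [1050, 1200, 980, 1340, 1110, 1270, 1190, 1010]
criterion_gpa = [3.1, 2.9, 3.0, 3.6, 2.7, 3.2, 3.5, 2.8]

validity_coefficient = pearson_r(test_scores, criterion_gpa)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

The helper assumes both variables actually vary; a constant column would make the denominator zero.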
CSEPT: Criterion validity
Validity Coefficient
$r^2$ gives proportion of variance in criterion explained by test score.
E.g., if $r_{xy} = .30$, $r_{xy}^2 = .09$, so 9% of variability in Y can be explained by variation in X
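The same arithmetic, applied to the other coefficient sizes mentioned for validity studies (.40 as normal, .60 as rare), is worked out here as an illustration:

```latex
\begin{align*}
r_{xy} = .40 &\;\Rightarrow\; r_{xy}^{2} = (.40)^{2} = .16 \quad \text{(16\% of criterion variance explained)} \\
r_{xy} = .60 &\;\Rightarrow\; r_{xy}^{2} = (.60)^{2} = .36 \quad \text{(36\% explained, 64\% unexplained)}
\end{align*}
```

Even a rare coefficient of .60 leaves most of the variability in the criterion unexplained by the test score.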
CSEPT: Criterion validity
Interpreting Validity Coefficients. Watch out for:
1. Changes in causal relationships
2. What does criterion mean? Is it valid, reliable?
3. Is subject population for validity study appropriate?
4. Sample size
CSEPT: Criterion validity
Interpreting Validity Coefficients. Watch out for:
5. Criterion/predictor confusion
6. Range restrictions
7. Do validity study results generalize?
8. Differential predictions
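Range restriction (item 6) can be illustrated with a small simulation. This is a sketch under invented assumptions: the simulated applicant pool, the built-in relationship of about r = .5, and the top-20% cut are all hypothetical, chosen only to show the direction of the effect.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation for two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical applicant pool: the criterion depends partly on the test score.
n = 5000
test = [random.gauss(0, 1) for _ in range(n)]
criterion = [0.5 * t + random.gauss(0, math.sqrt(0.75)) for t in test]
r_full = pearson_r(test, criterion)

# Range restriction: only applicants above a cut score ever get a criterion
# score (e.g., only admitted students earn a GPA).
cut = sorted(test)[int(0.8 * n)]  # roughly the top 20% of test scores
selected = [(t, c) for t, c in zip(test, criterion) if t >= cut]
r_restricted = pearson_r([t for t, _ in selected], [c for _, c in selected])

print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

In this setup the coefficient computed on the selected group comes out smaller than the one for the whole pool, even though the test-criterion relationship is the same for everyone.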
CSEPT: Construct validity
Problem: for many psychological characteristics of interest there is no agreed-upon universe of content and no clear criterion
We cannot assess content or criterion validity for such characteristics
These characteristics involve constructs: something built by mental synthesis.
CSEPT: Construct validity
Examples of constructs:
  Intelligence
  Love
  Curiosity
  Mental health
CSEPT: We obtain evidence of validity by simultaneously defining the construct and developing instruments to measure it.
This is bootstrapping.
Bootstrapping construct validity
assemble evidence about what a test means; in other words, about the characteristic it is testing.
CSEPT: this process is never finished
Borsboom: this is part of the process of creating a test in the first place, not something done after the fact
Bootstrapping construct validity
assemble evidence
show relationships between a test and other tests
none of the other tests is a criterion
Borsboom: these relationships do not tell us what a test score means (e.g., age is correlated with annual income but a measure of age is not a measure of annual income).
Bootstrapping construct validity
assemble evidence
show relationships
each new relationship adds meaning to the test
test's meaning is gradually clarified over time
Borsboom would say, why all the mystery?
The meaning of many tests (e.g., WAIS, academic exams, Piaget's tests) is clear right from the start
CSEPT: Construct validity
Example from text: Rubin's work on Love.
Rubin collected a set of items for a Love scale
He read poetry, novels; asked people for definitions
created a scale of Love and one of Liking
CSEPT: Construct validity
Rubin gave scale to many subjects & factor-analyzed results
Love integrates Attachment, Caring, & Intimacy
Liking integrates Adjustment, Maturity, Good Judgment, and Intelligence
The two are independent: you can love someone you don't like (as songwriters know)
Campbell & Fiske (1959)
Two types of Construct-related Evidence
Convergent evidence
When a test correlates well with other tests believed to measure the same construct
Campbell & Fiske (1959)
Two types of Construct-related Evidence
Convergent evidence
Discriminant evidence
When a test does not correlate with other tests believed to measure some other construct.
Convergent validity
Example: Health Index
Scores correlated with age, number of symptoms, chronic medical conditions, physiological measures
Treatments designed to improve health should increase Health Index scores. They do.
Discriminant validity
low correlations between new test and tests believed to tap unrelated constructs.
evidence that the new test measures something unique
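The convergent and discriminant patterns can be sketched with simulated data. Everything here is hypothetical: the latent trait, the three simulated tests, and the noise levels are assumptions chosen only to show what the two correlation patterns look like side by side.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation for two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)  # fixed seed so repeated runs give the same numbers
n = 2000

# Hypothetical latent construct shared by two of the three tests.
trait = [random.gauss(0, 1) for _ in range(n)]
new_test = [t + random.gauss(0, 1) for t in trait]
same_construct_test = [t + random.gauss(0, 1) for t in trait]
unrelated_test = [random.gauss(0, 1) for _ in range(n)]  # independent of the trait

convergent_r = pearson_r(new_test, same_construct_test)
discriminant_r = pearson_r(new_test, unrelated_test)

print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

The first coefficient is substantial and the second is near zero, which is the pattern Campbell & Fiske's two kinds of evidence describe.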
CSEPT: Validity & Reliability
CSEPT: No point in trying to establish validity of an unreliable test.
It's possible to have a reliable test that has no meaning (is not valid).
Logically impossible to produce evidence of validity for an unreliable test.
Borsboom: Validity & Reliability
Borsboom et al.: what does it mean to say that a test is reliable but not valid?
What is it a test of?
It isn't a test at all, just a collection of items
Borsboom: Validity & Reliability
Borsboom et al.: validity is a necessary condition for reliability
Reliability of a test of X estimates precision of measurement of X
but how could you estimate the precision of measurement of X for a test that does not measure X?
Thus, validity is presumed when you assess reliability
Blanton & Jaccard: arbitrary metrics
We observe a behavior in order to learn about the underlying psychological characteristic
A person's test score represents their standing on that underlying dimension
Such scores form an arbitrary metric
That is, we do not know how the observed scores are related to the true scores on the underlying dimension
[Figure: Person A and Person B located on an underlying dimension (Neutral point marked) and on Test 1 and Test 2, with scale values 0 to 6. Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]
Arbitrary metrics: the IAT
Implicit Association Test (IAT): claimed to diagnose implicit attitudinal preferences or racist attitudes
IAT authors say you may have prejudices you don't know you have.
Are these claims true?
Arbitrary metrics: the IAT
Task: categorize stimuli using two pairs of categories
Two buttons to press, two assignments of categories to buttons, used in sequence
Arbitrary metrics: the IAT
Assignment pattern A
  Button 1: press if stimulus refers to the category White or the category Pleasant
  Button 2: press if stimulus refers to the category Black or the category Unpleasant
Assignment pattern B
  Button 1: press if stimulus refers to the category White or the category Unpleasant
  Button 2: press if stimulus refers to the category Black or the category Pleasant
Arbitrary metrics: the IAT
IAT authors claim that if responses are faster to Pattern A than to Pattern B, that indicates a preference for Whites over Blacks; in other words, a racist attitude
IAT authors also give test-takers feedback about how strong their preferences are, based on how much faster their responses are to Pattern A than to Pattern B
This is inappropriate
Arbitrary metrics: the IAT
Blanton & Jaccard:
The IAT does not tell us about racist attitudes
IAT authors take a dimension which is non-arbitrary when used by physicists (time) and use it in an arbitrary way in psychology
Arbitrary metrics: the IAT
The function relating the response dimension (time) to the underlying dimension (attitudes) is unknown
Zero on the (Pattern A - Pattern B) difference may not be zero on the underlying attitude preference dimension
There are alternative models of how that (Pattern A - Pattern B) difference could arise
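A toy illustration of the point about zero: the sketch below assumes two hypothetical linear models linking an underlying attitude (0 to 6 scale, neutral at 3) to the Pattern A - Pattern B response-time difference. The slopes, the 100 ms offset, and the function names are invented; neither model is claimed to describe the real IAT.

```python
NEUTRAL = 3.0  # hypothetical neutral point on a 0-6 underlying attitude scale


def predicted_difference(attitude, slope_ms, offset_ms):
    """Hypothetical linear model: attitude -> Pattern A - Pattern B difference (ms)."""
    return slope_ms * (attitude - NEUTRAL) + offset_ms


def implied_attitude(observed_ms, slope_ms, offset_ms):
    """Invert the same linear model: observed difference (ms) -> attitude."""
    return NEUTRAL + (observed_ms - offset_ms) / slope_ms


observed_ms = 0.0  # a test-taker whose two response times are identical

# Model 1 (assumed): the difference is zero exactly at the neutral point.
attitude_model_1 = implied_attitude(observed_ms, slope_ms=50.0, offset_ms=0.0)

# Model 2 (assumed): same slope, plus a 100 ms offset unrelated to attitude
# (for instance, one button assignment simply being harder to perform).
attitude_model_2 = implied_attitude(observed_ms, slope_ms=50.0, offset_ms=100.0)

print(attitude_model_1, attitude_model_2)  # prints: 3.0 1.0
```

The same observed score of zero corresponds to a neutral attitude under one model and a non-neutral one under the other, so without knowing the function the score cannot be read as a position on the attitude dimension.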
Review
CSEPT:
1. Validity is a characteristic of evidence, not of tests.
2. Valid evidence supports conclusions drawn using test results
3. Validity is determined by social consequences of test
Borsboom et al.:
1. Validity is not a methodological issue, but a substantive (theoretical) issue
2. A test of an attribute is valid if (a) the attribute exists, and (b) variation in the attribute causes variation in test scores
Review
CSEPT:
4. Validity can be established in three ways, though boundaries between them are fuzzy:
  A. Content-related evidence
  B. Criterion-related evidence
  C. Construct-related evidence
Borsboom et al.:
3. It's all the same validity: a test is valid if it measures what you think it measures
4. Validity is not mysterious
Review
CSEPT:
5. Content-related evidence: do test items represent the whole domain of interest?
6. Criterion-related evidence: do test scores relate to a criterion either now (concurrent) or in the future (predictive)?
Borsboom et al.:
5. These questions are properly part of the process of creating a test
Review
CSEPT:
6. Construct-related evidence is obtained when we develop a psychological construct and the way to measure it at the same time.
7. A test can be reliable but not valid. A test cannot be valid if not reliable.
Borsboom et al.:
6. A test must be valid for a reliability estimate to have any meaning
Review
Blanton & Jaccard (2006) warn against over-interpretation of scores which are based on an arbitrary metric
For an arbitrary metric, we have no idea how the test scores are actually related to the underlying dimension