Validity 07

7/30/2019 Validity 07

Validity Outline

1. Definition
2. Validity: Two Different Views
3. Types of Validity
    A. Face
    B. Content
    C. Criterion
        i. Predictive vs. Concurrent
        ii. Validity Coefficients
    D. Construct
        i. Convergent
        ii. Discriminant

Validity Definition

Validity measures agreement between a test score and the characteristic it is believed to measure.
The basic question is: are you measuring what you think you're measuring?

Validity: two very different views

Traditional:

Validity is a property of tests.
Does the test measure what you think it measures?

Validity: two very different views

Traditional

Recent (e.g., Messick, 1989; Committee on Standards for Educational and Psychological Testing (CSEPT)):
Validity is a property of test score interpretations.
Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences.

Note the difference:

Traditional: Does the test measure what you think it measures?
Recent: Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences.

A problem with the CSEPT view

Who is to say the social consequences of test use are good or bad?
According to CSEPT, validity is a subjective judgment.
In my view, this makes the concept useless: if you like the result the test gives you, you will consider it valid. If you don't, you won't.
That's not how scientists think.

Borsboom et al. (2004)

Borsboom et al. reject CSEPT's view.
"Validity is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure." (p. 1061)

A test is valid for measuring an attribute if and only if (a) the attribute exists and (b) variations in the attribute causally produce variations in the outcomes of the measurement procedure.
Variations in what you are measuring cause variations in your measurements.
E.g., variations across people in intelligence cause variations in their IQ scores.
This is not a correlational model of validity.

You don't create a test and then do the analysis necessary to establish its validity.
Rather, you begin by doing the theoretical work necessary to create a valid test in the first place.
On this view, validity is not a big issue.

Borsboom et al. vs. CSEPT

Who is right?

Each scientist has to make up his or her own mind on that question.
I find Borsboom et al.'s arguments compelling.
Other psychologists may disagree.

The CSEPT view

CSEPT recognizes 3 types of evidence for test validity:
    Content-related
    Criterion-related
    Construct-related
Boundaries not clearly defined.
Cronbach (1980): Construct is basic, while Content & Criterion are subtypes.

Parenthetical Point: Face Validity

Face validity refers to the appearance that a test measures what it is intended to measure.
Face validity has P.R. value: test-takers may have better motivation if the test appears to be a sensible way to measure what it measures.

CSEPT: Content validity

Content-related evidence considers coverage of the conceptual domain tested.
Important in educational settings.
Like face validity, it is determined by logic rather than statistics.
Typically assessed by expert judges.

CSEPT: Content validity

Content-related evidence considers coverage of the conceptual domain tested.
Construct-irrelevant variance: Is each item relevant to domain?
Construct under-representation: Is domain adequately covered or are parts of it left out?
But if you are going to ask these questions, why not do it when creating the test?

Borsboom et al.: Content validity

Borsboom et al. would say that content validity is not something to be established after the test has been created.
Rather, you build it into your test by having a good theory of what you are testing.
E.g., for a test in this course to have content validity, it should test your understanding of content validity!

CSEPT: Criterion validity

Criterion-related evidence tells us how well a test score corresponds to a particular criterion measure.
A criterion is a standard against which a test is compared.
The test score should tell us something about the criterion score.

A criterion is a standard against which a test is compared.
E.g., we could compare GPAs to SAT scores to produce evidence of validity of conclusions drawn on the basis of SAT scores.
Two basic types:
    Predictive
    Concurrent

Predictive validity
    Test scores used to predict future performance: how good is the prediction?
    E.g., SAT is used to predict final undergraduate GPA.
    SAT and GPA are moderately correlated.

Predictive validity
Concurrent validity
    Correlation between test scores and criterion when the two are measured at the same time.
    Test illuminates current performance rather than predicting future performance (e.g., why does patient have a temperature? Why

Borsboom et al.: Criterion validity

Criterion validity involves a correlation of test scores with some criterion such as GPA.
That does not establish the test's validity, only its utility.
E.g., height and weight are correlated, but a test of height is not a test of what bathroom scales measure.

SAT is valid because it was developed on the sensible theory that past academic achievement is a good guide to future academic achievement.
Validity is built into the test, not established after the test has been created.

Validation research aims at showing how variation in the attribute causes variation in the test score.
This requires a theory of the task: how does the test-taker do the mental operations needed to respond to test items?

Note: no point in developing a test if you already have a criterion unless impracticality or expense makes use of the criterion difficult.
    Criterion measure only available in the future?
    Criterion too expensive to use?

Validity Coefficient
    Compute correlation (r) between test score and criterion.
    r = .30 or .40 would be considered normal.
    r > .60 is rare.
    Note: r varies between -1.0 and +1.0.
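
As a minimal sketch of the computation described above, the validity coefficient is the ordinary Pearson correlation between test scores and criterion scores. The data below are invented purely for illustration, and the function name `pearson_r` is an assumption, not something from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: admission test scores and a later criterion (GPA)
test_scores = [480, 520, 550, 600, 640, 700]
criterion = [2.9, 2.4, 3.3, 2.8, 3.6, 3.1]

validity_coefficient = pearson_r(test_scores, criterion)
print(round(validity_coefficient, 2))
```

Whatever the data, the result always falls between -1.0 and +1.0, as the slide notes.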

Validity Coefficient
    $r^2$ gives proportion of variance in criterion explained by test score.
    E.g., if $r_{xy} = .30$, $r^2 = .09$, so 9% of variability in Y can be explained by variation in X.
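
The claim that the squared correlation is the proportion of criterion variance explained can be checked numerically. The sketch below assumes a simple least-squares line predicting Y from X and uses invented data; the function name `r_and_variance_explained` is hypothetical.

```python
from math import sqrt

def r_and_variance_explained(x, y):
    """Return (r, proportion of variance in y explained by a linear fit on x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)  # total variation in y
    # Least-squares line: y_hat = intercept + slope * x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation in y left unexplained by the line
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    return r, 1.0 - sse / syy

# Hypothetical test scores (X) and criterion scores (Y)
x = [10, 12, 15, 18, 20, 23, 25]
y = [3.1, 2.2, 3.4, 2.6, 3.0, 3.9, 2.8]

r, explained = r_and_variance_explained(x, y)
print(round(r ** 2, 4), round(explained, 4))  # the two values agree
```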

Interpreting Validity Coefficients: watch out for:
1. Changes in causal relationships
2. What does criterion mean? Is it valid, reliable?
3. Is subject population for validity study appropriate?
4. Sample size

Interpreting Validity Coefficients: watch out for:
5. Criterion/predictor confusion
6. Range restrictions
7. Do validity study results generalize?
8. Differential predictions
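
Range restriction (item 6) is easy to see in a small simulation. This is only an illustrative sketch: the population correlation of .50, the sample size, and the "top 20% admitted" selection rule are all assumptions chosen for the demonstration, not values from the slides.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

random.seed(0)   # fixed seed so the run is repeatable
rho = 0.5        # assumed correlation in the full applicant population
n = 5000

# Simulate standardized test scores (x) and criterion scores (y)
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]
full_r = pearson_r(x, y)

# Keep only the top 20% on the test, as if only they were admitted
cutoff = sorted(x)[int(0.8 * n)]
selected = [(xi, yi) for xi, yi in zip(x, y) if xi >= cutoff]
restricted_r = pearson_r([p[0] for p in selected], [p[1] for p in selected])

print(round(full_r, 2), round(restricted_r, 2))
```

The correlation in the selected group comes out noticeably smaller than in the full group, even though the relationship between test and criterion is the same. A validity study run only on admitted students would therefore understate the coefficient.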

CSEPT: Construct validity

Problem: for many psychological characteristics of interest there is no agreed-upon universe of content and no clear criterion.
We cannot assess content or criterion validity for such characteristics.
These characteristics involve constructs: something built by mental synthesis.

Examples of constructs:
    Intelligence
    Love
    Curiosity
    Mental health
CSEPT: We obtain evidence of validity by simultaneously defining the construct and developing instruments to measure it.
This is bootstrapping.

Bootstrapping construct validity

Assemble evidence about what a test means; in other words, about the characteristic it is testing.
CSEPT: this process is never finished.
Borsboom: this is part of the process of creating a test in the first place, not something done after the fact.

Assemble evidence
Show relationships between a test and other tests
None of the other tests is a criterion
Borsboom: these relationships do not tell us what a test score means (e.g., age is correlated with annual income but a measure of age is not a measure of annual income).

Assemble evidence
Show relationships
Each new relationship adds meaning to the test
Test's meaning is gradually clarified over time
Borsboom would say, why all the mystery? The meaning of many tests (e.g., WAIS, academic exams, Piaget's tests) is clear right from the start.

Example from text: Rubin's work on Love.
Rubin collected a set of items for a Love scale.
He read poetry, novels; asked people for definitions.
He created a scale of Love and one of Liking.

Rubin gave scale to many subjects & factor-analyzed results.
Love integrates Attachment, Caring, & Intimacy.
Liking integrates Adjustment, Maturity, Good Judgment, and Intelligence.
The two are independent: you can love someone you don't like (as songwriters know).

Campbell & Fiske (1959)

Two types of Construct-related Evidence
Convergent evidence
    When a test correlates well with other tests believed to measure the same construct.

Campbell & Fiske (1959)

Two types of Construct-related Evidence
Convergent evidence
Discriminant evidence
    When a test does not correlate with other tests believed to measure some other construct.

Convergent validity

Example: Health Index
Scores correlated with age, number of symptoms, chronic medical conditions, physiological measures.
Treatments designed to improve health should increase Health Index scores. They do.

Discriminant validity

Low correlations between new test and tests believed to tap unrelated constructs.
Evidence that the new test measures something unique.
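
The convergent and discriminant patterns can be illustrated with simulated scores. This is a sketch under stated assumptions: the three "tests" below are hypothetical, a shared latent trait is assumed to drive the new test and the established test, and the unrelated test is generated independently.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable
n = 2000

# Assumed latent trait shared by the new test and an established test
trait = [random.gauss(0, 1) for _ in range(n)]
new_test = [t + random.gauss(0, 1) for t in trait]
established_test = [t + random.gauss(0, 1) for t in trait]
# A test of some other, unrelated construct
unrelated_test = [random.gauss(0, 1) for _ in range(n)]

convergent_r = pearson_r(new_test, established_test)
discriminant_r = pearson_r(new_test, unrelated_test)

print(round(convergent_r, 2), round(discriminant_r, 2))
```

A sizeable first correlation alongside a near-zero second one is the pattern Campbell & Fiske describe. As Borsboom's objection on the earlier slides points out, such a pattern of correlations does not by itself show what the test score means.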

CSEPT: Validity & Reliability

CSEPT: No point in trying to establish validity of an unreliable test.
It's possible to have a reliable test that has no meaning (is not valid).
Logically impossible to produce evidence of validity for an unreliable test.
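
One standard classical-test-theory way to see this point, which the slide itself does not spell out, is that the correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below assumes that result; the reliability values are invented and the function name is hypothetical.

```python
from math import sqrt

def max_validity_coefficient(test_reliability, criterion_reliability):
    """Upper bound on r(test, criterion) under classical test theory."""
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return sqrt(test_reliability * criterion_reliability)

# Hypothetical reliabilities: a poor test and a fairly reliable criterion
ceiling_poor_test = max_validity_coefficient(0.25, 0.81)
# Same criterion, but a much more reliable test
ceiling_good_test = max_validity_coefficient(0.90, 0.81)

print(round(ceiling_poor_test, 2), round(ceiling_good_test, 2))
```

The less reliable the test, the lower the ceiling on any validity coefficient it could show. A test with reliability near zero has essentially no room to correlate with anything.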

Borsboom: Validity & Reliability

Borsboom et al.: what does it mean to say that a test is reliable but not valid?
What is it a test of?
It isn't a test at all, just a collection of items.

Borsboom: Validity & Reliability

Borsboom et al.: validity is a necessary condition for reliability.
Reliability of a test of X estimates precision of measurement of X,
but how could you estimate the precision of measurement of X for a test that does not measure X?
Thus, validity is presumed when you assess reliability.

Blanton & Jaccard: arbitrary metrics

We observe a behavior in order to learn about the underlying psychological characteristic.
A person's test score represents their standing on that underlying dimension.
Such scores form an arbitrary metric.
That is, we do not know how the observed scores are related to the true scores on the


[Figure: positions of Person A and Person B on Test 1 and Test 2 (score scales running from 0 to 6) shown against the underlying dimension, with a Neutral point marked. Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]

Arbitrary metrics: the IAT

Implicit Association Test (IAT): claimed to diagnose implicit attitudinal preferences or racist attitudes.
IAT authors say you may have prejudices you don't know you have.
Are these claims true?

Task: categorize stimuli using two pairs of categories.
Two buttons to press, two assignments of categories to buttons, used in sequence.

Assignment pattern A
    Button 1: press if stimulus refers to the category White or the category Pleasant
    Button 2: press if stimulus refers to the category Black or the category Unpleasant
Assignment pattern B
    Button 1: press if stimulus refers to the category White or the category Unpleasant
    Button 2: press if stimulus refers to the category Black or the category Pleasant

IAT authors claim that if responses are faster to Pattern A than to Pattern B, that indicates a preference for Whites over Blacks; in other words, a racist attitude.
IAT authors also give test-takers feedback about how strong their preferences are, based on how much faster their responses are to Pattern A than to Pattern B.
This is inappropriate.

Blanton & Jaccard: The IAT does not tell us about racist attitudes.
IAT authors take a dimension which is non-arbitrary when used by physicists (time) and use it in an arbitrary way in psychology.

The function relating the response dimension (time) to the underlying dimension (attitudes) is unknown.
Zero on the (Pattern A - Pattern B) difference may not be zero on the underlying attitude preference dimension.
There are alternative models of how that (Pattern A - Pattern B) difference could arise.
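
A toy model makes the zero-point problem concrete. Everything here is hypothetical: the linear form, the 80 ms offset (standing in for any non-attitudinal influence on the latency difference), and the 100 ms-per-unit slope are assumptions for illustration only, since the real function is, as the slide says, unknown.

```python
def observed_difference_ms(attitude, offset_ms=80.0, ms_per_unit=100.0):
    """Hypothetical link from underlying attitude to the latency difference.

    attitude: position on the underlying dimension, with 0.0 meaning neutral.
    offset_ms: assumed contribution of factors unrelated to attitude.
    ms_per_unit: assumed change in the difference per unit of attitude.
    """
    return offset_ms + ms_per_unit * attitude

def attitude_at_zero_difference(offset_ms=80.0, ms_per_unit=100.0):
    """Attitude value that would produce an observed difference of zero."""
    return -offset_ms / ms_per_unit

# A truly neutral person still shows a non-zero latency difference
neutral_person_difference = observed_difference_ms(0.0)

# A person scoring exactly zero on the difference is not neutral
zero_score_attitude = attitude_at_zero_difference()

print(neutral_person_difference, zero_score_attitude)
```

Under this made-up model, reading an observed difference of zero as "no preference" would be wrong. Without knowing the actual function, there is no way to tell which observed score corresponds to the neutral point.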

Review

CSEPT:
1. Validity is a characteristic of evidence, not of tests.
2. Valid evidence supports conclusions drawn using test results.
3. Validity is determined by social consequences of test

Borsboom et al.:
1. Validity is not a methodological issue, but a substantive (theoretical) issue.
2. A test of an attribute is valid if (a) the attribute exists, and (b) variation in the attribute causes variation in test scores.

Review

CSEPT:
4. Validity can be established in three ways, though boundaries between them are fuzzy:
    A. Content-related evidence
    B. Criterion-related evidence
    C. Construct-related evidence

Borsboom et al.:
3. It's all the same validity: a test is valid if it measures what you think it measures.
4. Validity is not mysterious.

Review

CSEPT:
5. Content-related evidence: do test items represent whole domain of interest?
6. Criterion-related evidence: do test scores relate to a criterion either now (concurrent) or in the future (predictive)?

Borsboom et al.:
5. These questions are properly part of the process of creating a test.

Review

CSEPT:
7. Construct-related evidence is obtained when we develop a psychological construct and the way to measure it at the same time.
8. A test can be reliable but not valid. A test cannot be valid if not reliable.

Borsboom et al.:
6. A test must be valid for a reliability estimate to have any meaning.

Review

Blanton & Jaccard (2006) warn against over-interpretation of scores which are based on an arbitrary metric.
For an arbitrary metric, we have no idea how the test scores are actually related to the


Date post:	14-Apr-2018
Upload:	saad209