
Validity 07

Date post: 14-Apr-2018
Upload: saad209

Transcript
  • 7/30/2019 Validity 07

    1/54

    Validity Outline

    1. Definition

    2. Validity: Two Different Views

    3. Types of Validity

        A. Face

        B. Content

        C. Criterion

            i. Predictive vs. Concurrent

            ii. Validity Coefficients

        D. Construct

            i. Convergent

            ii. Discriminant

  • 2/54

    Validity: Definition

    Validity measures agreement between a test score and the characteristic it is believed to measure.

    The basic question is: are you measuring what you think you're measuring?

  • 3/54

    Validity: two very different views

    Traditional:

        Validity is a property of tests.

        Does the test measure what you think it measures?

  • 4/54

    Validity: two very different views

    Traditional

    Recent (e.g., Messick, 1989; Committee on Standards for Educational and Psychological Testing (CSEPT)):

        Validity is a property of test score interpretations.

        Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences.

  • 5/54

    Note the difference:

    Traditional: Does the test measure what you think it measures?

    Recent: Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences.

  • 6/54

    A problem with the CSEPT view

    Who is to say the social consequences of test use are good or bad?

    According to CSEPT, validity is a subjective judgment.

    In my view, this makes the concept useless: if you like the result the test gives you, you will consider it valid. If you don't, you won't.

    That's not how scientists think.

  • 7/54

    Borsboom et al. (2004)

    Borsboom et al. reject CSEPT's view.

    "Validity is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure." (p. 1061)

  • 8/54

    Borsboom et al. (2004)

    A test is valid for measuring an attribute if and only if (a) the attribute exists and (b) variations in the attribute causally produce variations in the outcomes of the measurement procedure.

    Variations in what you are measuring cause variations in your measurements.

    E.g., variations across people in intelligence cause variations in their IQ scores.

    This is not a correlational model of validity.

  • 9/54

    Borsboom et al. (2004)

    You don't create a test and then do the analysis necessary to establish its validity.

    Rather, you begin by doing the theoretical work necessary to create a valid test in the first place.

    On this view, validity is not a big issue.

  • 10/54

    Borsboom et al. vs. CSEPT

    Who is right?

    Each scientist has to make up his or her own mind on that question.

    I find Borsboom et al.'s arguments compelling.

    Other psychologists may disagree.

  • 11/54

    The CSEPT view

    CSEPT recognizes 3 types of evidence for test validity:

        Content-related

        Criterion-related

        Construct-related

    Boundaries not clearly defined

    Cronbach (1980): Construct is basic, while Content & Criterion are subtypes.

  • 12/54

    Parenthetical Point: Face Validity

    Face validity refers to the appearance that a test measures what it is intended to measure.

    Face validity has P.R. value: test-takers may have better motivation if the test appears to be a sensible way to measure what it measures.

  • 13/54

    CSEPT: Content validity

    Content-related evidence considers coverage of the conceptual domain tested.

    Important in educational settings

    Like face validity, it is determined by logic rather than statistics.

    Typically assessed by expert judges

  • 14/54

    CSEPT: Content validity

    Content-related evidence considers coverage of the conceptual domain tested.

        Construct-irrelevant variance: Is each item relevant to domain?

        Construct under-representation: Is domain adequately covered or are parts of it left out?

    But if you are going to ask these questions, why not do it when creating the test?

  • 15/54

    Borsboom et al.: Content validity

    Borsboom et al. would say that content validity is not something to be established after the test has been created.

    Rather, you build it into your test by having a good theory of what you are testing.

    E.g., for a test in this course to have content validity, it should test your understanding of content validity!

  • 16/54

    CSEPT: Criterion validity

    Criterion-related evidence tells us how well a test score corresponds to a particular criterion measure.

    A criterion is a standard against which a test is compared.

    The test score should tell us something about the criterion score.

  • 17/54

    CSEPT: Criterion validity

    A criterion is a standard against which a test is compared.

    E.g., we could compare GPAs to SAT scores to produce evidence of validity of conclusions drawn on the basis of SAT scores.

    Two basic types:

        Predictive

        Concurrent

  • 18/54

    CSEPT: Criterion validity

    Predictive validity

        Test scores used to predict future performance: how good is the prediction?

        E.g., SAT is used to predict final undergraduate GPA.

        SAT & GPA are moderately correlated.

  • 19/54

    CSEPT: Criterion validity

    Predictive validity

    Concurrent validity

        Correlation between test scores and criterion when the two are measured at the same time.

        Test illuminates current performance rather than predicting future performance (e.g., why does patient have a temperature? Why

  • 20/54

    Borsboom et al.: Criterion validity

    Criterion validity involves a correlation of test scores with some criterion such as GPA.

    That does not establish the test's validity, only its utility.

    E.g., height and weight are correlated, but a test of height is not a test of what bathroom scales measure.

  • 21/54

    Borsboom et al.: Criterion validity

    SAT is valid because it was developed on the sensible theory that past academic achievement is a good guide to future academic achievement.

    Validity is built into the test, not established after the test has been created.

  • 22/54

    Borsboom et al.: Criterion validity

    Validation research aims at showing how variation in the attribute causes variation in the test score.

    This requires a theory of the task: how does the test-taker do the mental operations needed to respond to test items?

  • 23/54

    CSEPT: Criterion validity

    Note: no point in developing a test if you already have a criterion, unless impracticality or expense makes use of the criterion difficult.

        Criterion measure only available in the future?

        Criterion too expensive to use?

  • 24/54

    CSEPT: Criterion validity

    Validity Coefficient

        Compute correlation (r) between test score and criterion.

        r = .30 or .40 would be considered normal.

        r > .60 is rare.

        Note: r varies between -1.0 and +1.0.
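
As a rough illustration of the validity coefficient described above, the sketch below computes Pearson's r between a test score and a criterion. The data are invented for the example (they are not from the slides), and `pearson_r` is a helper written here, not part of any particular package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Invented data: admission test scores and later GPA for eight students.
test_scores = [480, 520, 550, 590, 610, 640, 680, 720]
criterion_gpa = [2.9, 2.4, 3.1, 2.7, 3.4, 2.8, 3.0, 3.5]

validity_coefficient = pearson_r(test_scores, criterion_gpa)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

Whatever data are used, the result always falls between -1.0 and +1.0, as the note on the slide says.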

  • 25/54

    CSEPT: Criterion validity

    Validity Coefficient

        $r^2$ gives proportion of variance in criterion explained by test score.

        E.g., if $r_{xy} = .30$, then $r^2 = .09$, so 9% of variability in Y can be explained by variation in X.
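
One way to see why the squared coefficient is read as "variance explained": for a straight-line (least-squares) prediction of Y from X, the proportion of Y's variance accounted for by the line equals the squared correlation. The sketch below checks this on a small invented data set; the helper names are hypothetical and the numbers are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def variance_explained(x, y):
    """Share of the variance in y accounted for by the least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_residual / ss_total


# Invented scores for six people on a test (x) and a criterion (y).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r = pearson_r(x, y)
r_squared = r ** 2
explained = variance_explained(x, y)
print(f"r = {r:.3f}, r squared = {r_squared:.3f}, variance explained = {explained:.3f}")
```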

  • 26/54

    CSEPT: Criterion validity

    Interpreting Validity Coefficients: watch out for:

    1. Changes in causal relationships

    2. What does criterion mean? Is it valid, reliable?

    3. Is subject population for validity study appropriate?

    4. Sample size

  • 27/54

    CSEPT: Criterion validity

    Interpreting Validity Coefficients: watch out for:

    5. Criterion/predictor confusion

    6. Range restrictions

    7. Do validity study results generalize?

    8. Differential predictions
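
The range-restriction warning in the list above can be made concrete with a small simulation. This is only a sketch under stated assumptions: test and criterion scores are simulated as standard normal with a population correlation of .50, and "restriction" means keeping only people who score more than one standard deviation above the mean on the test (as when only admitted applicants have a criterion score). None of these numbers come from the slides.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


rng = random.Random(0)   # fixed seed so the run is repeatable
n_people = 5000
population_r = 0.5       # assumed test-criterion correlation in the full population

test = [rng.gauss(0, 1) for _ in range(n_people)]
criterion = [population_r * t + math.sqrt(1 - population_r ** 2) * rng.gauss(0, 1)
             for t in test]

r_full = pearson_r(test, criterion)

# Restrict the range: keep only people more than 1 SD above the test mean.
selected = [(t, c) for t, c in zip(test, criterion) if t > 1.0]
r_restricted = pearson_r([t for t, _ in selected], [c for _, c in selected])

print(f"full sample:       n = {n_people}, r = {r_full:.2f}")
print(f"restricted sample: n = {len(selected)}, r = {r_restricted:.2f}")
```

Under these assumptions the coefficient in the selected group is noticeably smaller than in the full sample, even though the relationship between test and criterion has not changed.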

  • 28/54

    CSEPT: Construct validity

    Problem: for many psychological characteristics of interest there is no agreed-upon universe of content and no clear criterion.

    We cannot assess content or criterion validity for such characteristics.

    These characteristics involve constructs: something built by mental synthesis.

  • 29/54

    CSEPT: Construct validity

    Examples of constructs:

        Intelligence

        Love

        Curiosity

        Mental health

    CSEPT: We obtain evidence of validity by simultaneously defining the construct and developing instruments to measure it.

    This is bootstrapping.

  • 30/54

    Bootstrapping construct validity

    assemble evidence about what a test means: in other words, about the characteristic it is testing.

    CSEPT: this process is never finished.

    Borsboom: this is part of the process of creating a test in the first place, not something done after the fact.

  • 31/54

    Bootstrapping construct validity

    assemble evidence

    show relationships between a test and other tests

    none of the other tests is a criterion

    Borsboom: these relationships do not tell us what a test score means (e.g., age is correlated with annual income, but a measure of age is not a measure of annual income).

  • 32/54

    Bootstrapping construct validity

    assemble evidence

    show relationships

    each new relationship adds meaning to the test

    test's meaning is gradually clarified over time

    Borsboom would say, why all the mystery?

    The meaning of many tests (e.g., WAIS, academic exams, Piaget's tests) is clear right from the start.

  • 33/54

    CSEPT: Construct validity

    Example from text: Rubin's work on Love.

    Rubin collected a set of items for a Love scale.

    He read poetry, novels; asked people for definitions

    created a scale of Love and one of Liking

  • 34/54

    CSEPT: Construct validity

    Rubin gave scale to many subjects & factor-analyzed results.

    Love integrates Attachment, Caring, & Intimacy.

    Liking integrates Adjustment, Maturity, Good Judgment, and Intelligence.

    The two are independent: you can love someone you don't like (as songwriters know).

  • 35/54

    Campbell & Fiske (1959)

    Two types of Construct-related Evidence

        Convergent evidence

            When a test correlates well with other tests believed to measure the same construct

  • 36/54

    Campbell & Fiske (1959)

    Two types of Construct-related Evidence

        Convergent evidence

        Discriminant evidence

            When a test does not correlate with other tests believed to measure some other construct.

  • 37/54

    Convergent validity

    Example: Health Index

    Scores correlated with age, number of symptoms, chronic medical conditions, physiological measures

    Treatments designed to improve health should increase Health Index scores. They do.

  • 38/54

    Discriminant validity

    low correlations between new test and tests believed to tap unrelated constructs.

    evidence that the new test measures something unique
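
The pattern Campbell & Fiske describe (high correlations with tests of the same construct, low correlations with tests of unrelated constructs) can be sketched with simulated data. Everything below is an assumption made for illustration: two constructs are simulated as independent standard normal variables, and each "test" is its construct plus random measurement noise. The variable names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


rng = random.Random(1)   # fixed seed so the run is repeatable
n_people = 2000

# Two unrelated latent constructs (simulated, not real data).
construct = [rng.gauss(0, 1) for _ in range(n_people)]
unrelated_construct = [rng.gauss(0, 1) for _ in range(n_people)]

# Each test score is its construct plus independent measurement noise.
new_test = [c + rng.gauss(0, 0.5) for c in construct]
established_test = [c + rng.gauss(0, 0.5) for c in construct]
other_test = [u + rng.gauss(0, 0.5) for u in unrelated_construct]

convergent_r = pearson_r(new_test, established_test)
discriminant_r = pearson_r(new_test, other_test)

print(f"convergent evidence:   r = {convergent_r:.2f}")
print(f"discriminant evidence: r = {discriminant_r:.2f}")
```

Note that on the Borsboom et al. view discussed earlier, such correlations would still not show what the new test measures; the sketch only shows what the two kinds of evidence look like numerically.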

  • 39/54

    CSEPT: Validity & Reliability

    CSEPT: No point in trying to establish validity of an unreliable test.

    It's possible to have a reliable test that has no meaning (is not valid).

    Logically impossible to produce evidence of validity for an unreliable test.
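
The slide does not give a formula for this point. One standard way to express it, taken from classical test theory rather than from the slides, is that the correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below assumes that bound; `max_validity` is a hypothetical helper name.

```python
import math


def max_validity(reliability_test, reliability_criterion=1.0):
    """Upper bound on a validity coefficient under classical test theory.

    Assumes the textbook attenuation bound r_xy <= sqrt(r_xx * r_yy);
    the criterion is treated as perfectly reliable unless stated otherwise.
    """
    for value in (reliability_test, reliability_criterion):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_test * reliability_criterion)


for reliability in (0.9, 0.5, 0.1, 0.0):
    bound = max_validity(reliability)
    print(f"test reliability {reliability:.1f} -> validity coefficient at most {bound:.2f}")
```

With a reliability of zero the bound is zero, which is one way of reading the claim that evidence of validity cannot be produced for an unreliable test.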

  • 40/54

    Borsboom: Validity & Reliability

    Borsboom et al.: what does it mean to say that a test is reliable but not valid?

    What is it a test of?

    It isn't a test at all, just a collection of items.

  • 41/54

    Borsboom: Validity & Reliability

    Borsboom et al.: validity is a necessary condition for reliability.

    Reliability of a test of X estimates precision of measurement of X

    but how could you estimate the precision of measurement of X for a test that does not measure X?

    Thus, validity is presumed when you assess reliability.

  • 42/54

    Blanton & Jaccard: arbitrary metrics

    We observe a behavior in order to learn about the underlying psychological characteristic.

    A person's test score represents their standing on that underlying dimension.

    Such scores form an arbitrary metric.

    That is, we do not know how the observed scores are related to the true scores on the underlying dimension.

  • 43/54

    [Figure: Person A and Person B placed on the underlying dimension (with a Neutral point) and on the 0 to 6 score scales of Test 1 and Test 2.]

    Adapted from Blanton & Jaccard (2006) Figure 1, p. 29

  • 44/54

    Arbitrary metrics: the IAT

    Implicit Association Test (IAT): claimed to diagnose implicit attitudinal preferences or racist attitudes

    IAT authors say you may have prejudices you don't know you have.

    Are these claims true?

  • 45/54

    Arbitrary metrics: the IAT

    Task: categorize stimuli using two pairs of categories

    Two buttons to press, two assignments of categories to buttons, used in sequence

  • 46/54

    Arbitrary metrics: the IAT

    Assignment pattern A

        Button 1: press if stimulus refers to the category White or the category Pleasant

        Button 2: press if stimulus refers to the category Black or the category Unpleasant

    Assignment pattern B

        Button 1: press if stimulus refers to the category White or the category Unpleasant

        Button 2: press if stimulus refers to the category Black or the category Pleasant

  • 47/54

    Arbitrary metrics: the IAT

    IAT authors claim that if responses are faster to Pattern A than to Pattern B, that indicates a preference for Whites over Blacks: in other words, a racist attitude.

    IAT authors also give test-takers feedback about how strong their preferences are, based on how much faster their responses are to Pattern A than to Pattern B.

    This is inappropriate.

  • 48/54

    Arbitrary metrics: the IAT

    Blanton & Jaccard: The IAT does not tell us about racist attitudes.

    IAT authors take a dimension which is non-arbitrary when used by physicists (time) and use it in an arbitrary way in psychology.

  • 49/54

    Arbitrary metrics: the IAT

    The function relating the response dimension (time) to the underlying dimension (attitudes) is unknown.

    Zero on the (Pattern A - Pattern B) difference may not be zero on the underlying attitude preference dimension.

    There are alternative models of how that (Pattern A - Pattern B) difference could arise.
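
To illustrate how a zero latency difference need not mean a neutral attitude, here is one purely hypothetical alternative model (not a model proposed in the slides or by the IAT authors): response time depends on the attitude, but the pattern presented second also carries a fixed extra cost that has nothing to do with attitude. All parameter values are invented.

```python
def latency_difference(attitude, gain_ms=100.0, order_cost_ms=40.0):
    """Hypothetical mean (Pattern A - Pattern B) latency difference in ms.

    attitude      : standing on the underlying preference dimension (0 = neutral)
    gain_ms       : assumed effect of one attitude unit on the difference
    order_cost_ms : assumed extra time for Pattern B unrelated to attitude
    """
    base_ms = 700.0
    latency_a = base_ms - gain_ms * attitude / 2
    latency_b = base_ms + gain_ms * attitude / 2 + order_cost_ms
    return latency_a - latency_b


neutral_difference = latency_difference(0.0)
attitude_at_zero_difference = -40.0 / 100.0   # solves -gain * attitude - cost = 0

print(f"neutral person: difference = {neutral_difference:.1f} ms")
print(f"difference is zero at attitude = {attitude_at_zero_difference:+.2f}")
```

Under this invented model a truly neutral person shows a non-zero difference, and a zero difference belongs to someone who is not neutral, so the observed zero point says nothing certain about the attitude zero point.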

  • 50/54

    Review

    CSEPT:

    1. Validity is a characteristic of evidence, not of tests.

    2. Valid evidence supports conclusions drawn using test results.

    3. Validity is determined by social consequences of test use.

    Borsboom et al.:

    1. Validity is not a methodological issue, but a substantive (theoretical) issue.

    2. A test of an attribute is valid if (a) the attribute exists, and (b) variation in the attribute causes variation in test scores.

  • 51/54

    Review

    CSEPT:

    4. Validity can be established in three ways, though boundaries between them are fuzzy:

        A. Content-related evidence

        B. Criterion-related evidence

        C. Construct-related evidence

    Borsboom et al.:

    3. It's all the same validity: a test is valid if it measures what you think it measures.

    4. Validity is not mysterious.

  • 52/54

    Review

    CSEPT:

    5. Content-related evidence: do test items represent whole domain of interest?

    6. Criterion-related evidence: do test scores relate to a criterion either now (concurrent) or in the future (predictive)?

    Borsboom et al.:

    5. These questions are properly part of the process of creating a test.

  • 53/54

    Review

    CSEPT:

    7. Construct-related evidence is obtained when we develop a psychological construct and the way to measure it at the same time.

    8. A test can be reliable but not valid. A test cannot be valid if not reliable.

    Borsboom et al.:

    6. A test must be valid for a reliability estimate to have any meaning.

  • 54/54

    Review

    Blanton & Jaccard (2006) warn against over-interpretation of scores which are based on an arbitrary metric.

    For an arbitrary metric, we have no idea how the test scores are actually related to the underlying dimension.

