Test Validity

“… the development of a valid test requires multiple procedures, which are employed at different stages of test construction … The validation process begins with the formulation of detailed trait or construct definitions … Test items are then prepared to fit the construct definitions. Empirical item analyses follow … Other appropriate internal analyses may then be carried out … The final stage includes validation and cross-validation of various scores and interpretive combinations of scores through statistical analyses against external, real-life criteria.” (Anastasi, 1986, p.3)

“Almost any information gathered in the process of developing or using a test is relevant to its validity … If we think of test validity in terms of understanding what a particular test measures, it should be apparent that virtually any empirical data obtained with the test represent a potential source of validity information.” (Anastasi, 1986, p.3)


Test Validation Process

Define Objectives

State Inferences

Decide on Methods to Test Inferences

Collect Evidence

Types of Validity

Content Validity [the extent to which test items represent a domain]

a) Subject Matter Expert Opinions (e.g., CVR statistic)

b) Internal consistency reliability

c) Correlation with other similar tests

Content relevance: Domain specification

Content coverage: Domain representativeness
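The CVR statistic mentioned in (a) is commonly computed as Lawshe's content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs who rate an item "essential" and N is the panel size. The slides do not spell out the formula, so the following is only a minimal Python sketch; the function name and the panel counts are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR: -1 when no SME rates the item essential, +1 when all do."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need n_experts > 0 and 0 <= n_essential <= n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 SMEs rating three test items.
essential_counts = {"item 1": 10, "item 2": 5, "item 3": 2}
for item, n_essential in essential_counts.items():
    print(item, content_validity_ratio(n_essential, 10))
```

Items whose CVR is near zero or negative are the ones most SMEs did not consider essential to the domain.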

1) Perform a job analysis

• Description of job tasks

• Rating of job tasks on various criteria

• Specification of KSAs

• Rating of KSAs on various criteria

• Link/connect tasks to KSAs

From SIOP Principles: “The characterization of the work domain should be based on accurate and thorough information about the work including analysis of work behaviors and activities, responsibilities of the job incumbents, and/or the KSAOs prerequisite to effective performance on the job. The researcher should indicate what important work behaviors, activities, and worker KSAOs are included in the domain, describe how the content of the domain is linked to the selection procedure, and explain why certain parts of the domain were or were not included in the selection procedure.” (p. 22)

2) Selection of SMEs

From SIOP Principles: “The success of the content-based study is closely related to the qualifications of the subject matter experts (SMEs) … The experts should have thorough knowledge of the work behaviors and activities, responsibilities of job incumbents, and the KSAOs prerequisite to effective performance on the job” (p. 22)

3) Writing (or selecting) and evaluation of selection measure content (test items)

Steps in a Content Validation Effort

TASK -- KSA MATRIX

To what extent is each KSA needed when performing each job task?

5 = Extremely necessary, the job task cannot be performed without the KSA

4 = Very necessary, the KSA is very helpful when performing the job task

3 = Moderately necessary, the KSA is moderately helpful when performing the job task

2 = Slightly necessary, the KSA is slightly helpful when performing the job task

1 = Not necessary, the KSA is not used when performing the job task

[Blank rating grid: job tasks 1 to 13 (rows) by KSAs A to R (columns)]

[Blank rating form: test items 1 to 12 and 41 to 52, with columns headed "item #", "KSA B", and "KSA B C"]

Content Validity Issues

• Are the job activities and requirements stable across time?

• Does successful performance on the test require the same KSAs as successful performance on the job?

• Is the type (or mode) of testing procedure the same as that required on the job?

• Do some KSAs not required on the job exist on the test?

• Not useful when abstract constructs are being measured (content validation requires that only a small inferential leap separate the test content from the job requirements)

From Anastasi (1986): “When tests are designed for use within special contexts, the relevant constructs are usually derived from content analysis of particular behavior domains” (p. 7).

From SIOP Principles: “When selection procedure content is linked to job content, content-oriented strategies are useful. When selection procedure content is less clearly linked to job content, other sources of validity evidence take precedence” (p. 23).

Types of Validity (cont.)

Criterion-related Validity

Predictive [Correlation between test scores of applicants and their performance scores when some time interval has passed after they are hired]

• Range restriction issue on performance scores

• Time, cost, & pragmatic concerns

Concurrent [Correlation between test scores and performance scores of current employees]

• Motivation level

• Guessing, faking

• Job experience factor

• Range restriction issue on performance scores

Criterion-related Validity Issues

A) Job Stability

B) Reliable and relevant measure of job performance

From SIOP Principles: “A relevant, reliable, and uncontaminated criterion(s) must be obtained or developed. Of these characteristics, the most important is relevance. A relevant criterion is one that reflects the relative standing of employees with respect to important work behavior(s) or outcome measure(s). If such a criterion measure does not exist or cannot be developed, use of a criterion-related validation strategy is not feasible” (p. 14).

C) Use of a representative sample of people and jobs

D) Large sample (on predictor and criterion)

From SIOP Principles: “A competent criterion-related validity study should be based on a sample that is reasonably representative of the work and candidate pool … A number of factors related to statistical power can influence the feasibility of a criterion-related study. Among these factors are the degree (and type) of range restriction in the predictor or the criterion, reliability of the criterion, and statistical power” (p. 14)

Legal Issues and Criterion-related Validity

• Court focus on the content of measures as opposed to criterion validity evidence (relationship between test scores and job performance)

• Emphasis on the legal history of tests

• Criterion-validity emphasis versus concurrent validity designs

• Statistically significant relationships are not always acceptable (consideration of other factors such as test utility)

Factors Affecting the Validity Coefficient [correlation between a test and job performance]

• Reliability of both the criterion (job performance) and the predictor (test)

• Restriction of range (on both the test and job performance measure)

• Contamination of the criterion (e.g., the measure of job performance is affected by variables other than one’s ability or knowledge)

Standard error of estimate (validity coefficient):

$\sigma_{y'} = \sigma_y \sqrt{1 - r_{xy}^2}$

$\sigma_y$ = standard deviation of y (criterion)

$r_{xy}^2$ = correlation between x and y, squared
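As a quick numeric check of the standard error of estimate, sigma_y' = sigma_y * sqrt(1 - r_xy^2), the Python sketch below evaluates it for a few validity coefficients. The function name and the criterion standard deviation of 10 are assumptions chosen only for illustration.

```python
import math


def standard_error_of_estimate(sd_criterion: float, r_xy: float) -> float:
    """Spread of actual criterion scores around the scores predicted from x."""
    if sd_criterion < 0 or not -1.0 <= r_xy <= 1.0:
        raise ValueError("need sd_criterion >= 0 and -1 <= r_xy <= 1")
    return sd_criterion * math.sqrt(1.0 - r_xy ** 2)


# Hypothetical criterion SD of 10 points on a performance rating scale.
for r in (0.0, 0.3, 0.6, 1.0):
    print(r, round(standard_error_of_estimate(10.0, r), 2))
```

Even a validity of .60 leaves 80% of the criterion's standard deviation as prediction error, which is why modest validity coefficients still imply wide prediction bands.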

Correction for Attenuation

$r_{xy_T} = \dfrac{r_{xy_O}}{\sqrt{r_{yy}}}$

$r_{xy_O}$ = observed validity coefficient

$r_{yy}$ = criterion reliability
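The correction for attenuation divides the observed validity coefficient by the square root of the criterion reliability. A minimal Python sketch follows; the function name and the example values (.30 and .64) are hypothetical.

```python
import math


def correct_for_attenuation(r_observed: float, criterion_reliability: float) -> float:
    """Estimate the validity coefficient if the criterion were perfectly reliable."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


# Hypothetical study: observed validity .30, criterion reliability .64.
print(correct_for_attenuation(0.30, 0.64))
```

Only the criterion's unreliability is corrected here, which matches the formula on the slide; the predictor's reliability is left alone because the test is used as it is.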

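Range Restriction (Predictor)

$r_c = \dfrac{r \, (S / S_1)}{\sqrt{1 - r^2 + r^2 \, (S^2 / S_1^2)}}$

$r$ = validity coefficient (restricted sample)

$S$ = standard deviation of unrestricted sample

$S_1$ = standard deviation of restricted sample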

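Range Restriction (Criterion)

$r_c = \sqrt{1 - \dfrac{S_1^2}{S^2} \, (1 - r^2)}$

where $r$ is the validity coefficient in the restricted sample, $S$ is the standard deviation of the unrestricted sample, and $S_1$ is the standard deviation of the restricted sample.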
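The two range restriction corrections can be sketched in Python as follows. This assumes the slide formulas are the standard ones: for the predictor, r_c = r(S/S1) / sqrt(1 - r^2 + r^2 S^2/S1^2); for the criterion, r_c = sqrt(1 - (S1^2/S^2)(1 - r^2)), where S and S1 are the unrestricted and restricted standard deviations. The function names and example values are hypothetical, and r is assumed to be non-negative.

```python
import math


def correct_predictor_restriction(r: float, sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    """Correct r for direct range restriction on the predictor (test)."""
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1.0 - r ** 2 + (r ** 2) * (k ** 2))


def correct_criterion_restriction(r: float, sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    """Correct r (assumed >= 0) when SDs are known for the criterion."""
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    ratio_sq = (sd_restricted / sd_unrestricted) ** 2
    return math.sqrt(1.0 - ratio_sq * (1.0 - r ** 2))


# Hypothetical case: the hired group shows half the spread of the applicant pool.
print(correct_predictor_restriction(0.30, 2.0, 1.0))
print(correct_criterion_restriction(0.30, 2.0, 1.0))
```

When the two standard deviations are equal, both functions return the observed coefficient unchanged, which is a convenient sanity check.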

Test Utility Key Points

Selection Ratio (SR) = n / N = # job openings / # applicants

Test Validity [Criterion-related]: The extent to which test scores correlate with job performance scores [Range is from 0 to 1.0]

Proportion of “Successes” Expected Through the Use of Test of Given Validity and Given Selection Ratio, for Base Rate .60.

(From Taylor & Russell, 1939, p. 576)

          Selection Ratio
Validity   .05   .10   .20   .30   .40   .50   .60   .70   .80   .90   .95
.00        .60   .60   .60   .60   .60   .60   .60   .60   .60   .60   .60
.05        .64   .63   .63   .62   .62   .62   .61   .61   .61   .60   .60
.10        .68   .67   .65   .64   .64   .63   .63   .62   .61   .61   .60
.15        .71   .70   .68   .67   .66   .65   .64   .63   .62   .61   .60
.20        .75   .73   .71   .69   .67   .66   .65   .64   .63   .62   .61
.25        .78   .76   .73   .71   .69   .68   .66   .65   .63   .62   .61
.30        .82   .79   .76   .73   .71   .69   .68   .66   .64   .62   .61
.35        .85   .82   .78   .75   .73   .71   .69   .67   .65   .63   .62
.40        .88   .85   .81   .78   .75   .73   .70   .68   .66   .63   .62
.45        .90   .87   .83   .80   .77   .74   .72   .69   .66   .64   .62
.50        .93   .90   .86   .82   .79   .76   .73   .70   .67   .64   .62
.55        .95   .92   .88   .84   .81   .78   .75   .71   .68   .64   .62
.60        .96   .94   .90   .87   .83   .80   .76   .73   .69   .65   .63
.65        .98   .96   .92   .89   .85   .82   .78   .74   .70   .65   .63
.70        .99   .97   .94   .91   .87   .84   .80   .75   .71   .66   .63
.75        .99   .99   .96   .93   .90   .86   .81   .77   .71   .66   .63
.80       1.00   .99   .98   .95   .92   .88   .83   .78   .72   .66   .63
.85       1.00  1.00   .99   .97   .95   .91   .86   .80   .73   .66   .63
.90       1.00  1.00  1.00   .99   .97   .94   .88   .82   .74   .67   .63
.95       1.00  1.00  1.00  1.00   .99   .97   .92   .84   .75   .67   .63
1.00      1.00  1.00  1.00  1.00  1.00  1.00  1.00   .86   .75   .67   .63

Note: A full set of tables can be found in Taylor and Russell (1939) and in McCormick and Ilgen (1980, Appendix B).
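Entries of the kind shown in the Taylor-Russell table can be approximated numerically. The sketch below assumes that test and criterion scores are bivariate normal and that applicants are selected strictly top-down on the test; it integrates the probability of job success over the selected range of test scores. The function name and the integration settings are assumptions, not part of the original tables.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def expected_success_rate(validity: float, selection_ratio: float,
                          base_rate: float, steps: int = 4000) -> float:
    """Proportion of selected applicants expected to succeed on the job."""
    if not 0.0 <= validity < 1.0:
        raise ValueError("this sketch handles 0 <= validity < 1 only")
    if not (0.0 < selection_ratio < 1.0 and 0.0 < base_rate < 1.0):
        raise ValueError("selection_ratio and base_rate must be in (0, 1)")
    x_cut = _STD.inv_cdf(1.0 - selection_ratio)  # test cutoff (z score)
    y_cut = _STD.inv_cdf(1.0 - base_rate)        # success cutoff (z score)
    resid_sd = math.sqrt(1.0 - validity ** 2)

    def integrand(x: float) -> float:
        # density of x times P(criterion > y_cut | test score = x)
        p_success = 1.0 - _STD.cdf((y_cut - validity * x) / resid_sd)
        return _STD.pdf(x) * p_success

    upper = 8.0  # effectively +infinity for a standard normal
    h = (upper - x_cut) / steps
    total = 0.5 * (integrand(x_cut) + integrand(upper))
    for i in range(1, steps):
        total += integrand(x_cut + i * h)  # trapezoidal rule
    return total * h / selection_ratio


print(round(expected_success_rate(0.50, 0.30, 0.60), 2))
```

With zero validity the function returns the base rate regardless of the selection ratio, mirroring the first row of the table.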

Mean Standard Criterion Score of Accepted Cases in Relation to Test Validity and Selection Ratio

(From Brown & Ghiselli, 1953, p. 342)

Selection   Validity Coefficient
Ratio        .00   .05   .10   .15   .20   .25   .30   .35   .40   .45   .50   .55   .60   .65   .70   .75   .80   .85   .90   .95  1.00
.05          .00   .10   .21   .31   .42   .52   .62   .73   .83   .94  1.04  1.14  1.25  1.35  1.46  1.56  1.66  1.77  1.87  1.98  2.08
.10          .00   .09   .18   .26   .35   .44   .53   .62   .70   .79   .88   .97  1.05  1.14  1.23  1.32  1.41  1.49  1.58  1.67  1.76
.15          .00   .08   .15   .23   .31   .39   .46   .54   .62   .70   .77   .85   .93  1.01  1.08  1.16  1.24  1.32  1.39  1.47  1.55
.20          .00   .07   .14   .21   .28   .35   .42   .49   .56   .63   .70   .77   .84   .91   .98  1.05  1.12  1.19  1.26  1.33  1.40
.25          .00   .06   .13   .19   .25   .32   .38   .44   .51   .57   .63   .70   .76   .82   .89   .95  1.01  1.08  1.14  1.20  1.27
.30          .00   .06   .12   .17   .23   .29   .35   .40   .46   .52   .58   .64   .69   .75   .81   .87   .92   .98  1.04  1.10  1.16
.35          .00   .05   .11   .16   .21   .26   .32   .37   .42   .48   .53   .58   .63   .69   .74   .79   .84   .90   .95  1.00  1.06
.40          .00   .05   .10   .15   .19   .24   .29   .34   .39   .44   .48   .53   .58   .63   .68   .73   .77   .82   .87   .92   .97
.45          .00   .04   .09   .13   .18   .22   .26   .31   .35   .40   .44   .48   .53   .57   .62   .66   .70   .75   .79   .84   .88
.50          .00   .04   .08   .12   .16   .20   .24   .28   .32   .36   .40   .44   .48   .52   .56   .60   .64   .68   .72   .76   .80
.55          .00   .04   .07   .11   .14   .18   .22   .25   .29   .32   .36   .40   .43   .47   .50   .54   .58   .61   .65   .68   .72
.60          .00   .03   .06   .10   .13   .16   .19   .23   .26   .29   .32   .35   .39   .42   .45   .48   .52   .55   .58   .61   .64
.65          .00   .03   .06   .09   .11   .14   .17   .20   .23   .26   .28   .31   .34   .37   .40   .43   .46   .48   .51   .54   .57
.70          .00   .02   .05   .07   .10   .12   .15   .17   .20   .22   .25   .27   .30   .32   .35   .37   .40   .42   .45   .47   .50
.75          .00   .02   .04   .06   .08   .11   .13   .15   .17   .19   .21   .23   .25   .27   .30   .32   .33   .36   .38   .40   .42
.80          .00   .02   .04   .05   .07   .09   .11   .12   .14   .16   .18   .19   .21   .22   .25   .26   .28   .30   .32   .33   .35
.85          .00   .01   .03   .04   .05   .07   .08   .10   .11   .12   .14   .15   .16   .18   .19   .20   .22   .23   .25   .26   .27
.90          .00   .01   .02   .03   .04   .05   .06   .07   .08   .09   .10   .11   .12   .13   .14   .15   .16   .17   .18   .19   .20
.95          .00   .01   .01   .02   .02   .03   .03   .04   .04   .05   .05   .06   .07   .07   .08   .08   .09   .09   .10   .10   .11
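Values like those in the Brown and Ghiselli table follow, to a close approximation, from a simple closed form if one assumes bivariate normal scores and strict top-down selection: the mean standard criterion score of those accepted is the validity coefficient times the mean standard test score of those accepted, which equals the normal density at the cutoff divided by the selection ratio. The sketch below uses that assumption; the function name is hypothetical and small differences from the published table are to be expected.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def mean_criterion_z_of_accepted(validity: float, selection_ratio: float) -> float:
    """Expected standard criterion score of applicants selected top-down."""
    if not 0.0 < selection_ratio < 1.0:
        raise ValueError("selection_ratio must be in (0, 1)")
    cutoff = _STD.inv_cdf(1.0 - selection_ratio)
    mean_test_z = _STD.pdf(cutoff) / selection_ratio  # mean z of those above cutoff
    return validity * mean_test_z


print(round(mean_criterion_z_of_accepted(0.50, 0.30), 2))
```

The same mean test score of those selected is the quantity that enters the utility formula as the average standard score on the selection procedure.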

Selection Ratio Example (cont.)

Example of Brogden and Cronbach & Gleser Models

$\Delta U = N_s \, r_{xy} \, SD_y \, \bar{Z}_x - N_T (C)$

$N_s$ = # of applicants selected

$r_{xy}$ = validity coefficient

$SD_y$ = standard deviation of job performance in dollar terms

$\bar{Z}_x$ = average score on the selection procedure of those selected (standard score)

$N_T$ = number of applicants assessed

$C$ = cost of assessing each applicant
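The utility expression, N_s × r_xy × SD_y × mean Z_x minus N_T × C, is straightforward to evaluate. The following Python sketch does so for one scenario; the function name, parameter names, and all dollar figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def selection_utility(n_selected: int, validity: float, sd_performance_dollars: float,
                      mean_z_selected: float, n_assessed: int,
                      cost_per_applicant: float) -> float:
    """Dollar gain from using the test, net of assessment costs."""
    benefit = n_selected * validity * sd_performance_dollars * mean_z_selected
    cost = n_assessed * cost_per_applicant
    return benefit - cost


# Hypothetical scenario: 10 hires from 100 applicants, validity .50,
# SD of performance $10,000, mean standard score of hires 1.0, $50 per test.
print(selection_utility(10, 0.50, 10_000.0, 1.0, 100, 50.0))
```

In this scenario the benefit term is $50,000 and the testing cost is $5,000, so the cost of assessment is small relative to the gain even at a moderate validity.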

Types of Validity (cont.)

Construct Validity [extent to which a test assesses the construct it intends to measure]

Multitrait-Multimethod Matrix (Campbell & Fiske, 1959)

• Correlation between scores measuring a construct (e.g., anxiety) with one method (e.g., paper & pencil) with scores on the same construct using a different method (e.g., interview) [Convergent validation]

• Correlation between scores measuring a construct (e.g., anxiety) using one method (e.g., paper & pencil) with scores on a different construct (e.g., leadership) assessed with a different method (e.g., interview) [Discriminant validation]

“Construct validation is indeed a never-ending process. However, that should not preclude using the test operationally to help solve practical problems and reach real-life decisions as soon as the available validity information has reached an acceptable level for a particular application. This level varies with the type of test and the way it will be used. Establishing this level requires informed professional judgment within the appropriate specialty of professional practice.” (Anastasi, 1986, p.4)

Equal validity, unequal criterion means

[Figure: Performance Criterion (Unsatisfactory to Satisfactory) plotted against Predictor Score (Reject to Accept) for the Non-minority and Minority groups]

- Equal test scores; minorities perform less well on the job (the test overpredicts their performance)

- Minorities are hired at the same rate as non-minorities, but their probability of success is small. This can reinforce existing stereotypes.

Intercept Bias (Test)

Equal validity, unequal predictor means

[Figure: Performance Criterion (Unsatisfactory to Satisfactory) plotted against Predictor Score (Reject to Accept) for the Minority and Non-minority groups]

- Job performance is equal

- Test scores are greater for non-minorities

Equal predictor means, but validity only for non-minority groups

[Figure: Performance Criterion (Unsatisfactory to Satisfactory) plotted against Predictor Score (Accept or Reject) for the Minority and Non-minority groups]

• Equal test scores and criterion scores

• No validity for minorities (the test should be used only for non-minorities)

• No adverse impact: the same numbers are hired in each group

- However, more non-minorities will succeed on jobs, which can reinforce stereotypes

Situational specificity or generalizability of test validity across samples?

Fluctuations in validity coefficients may often be due to:

• Small sample sizes (e.g., many studies have samples of 50 or fewer employees)

• Unreliable criterion measures

• Restriction of range in employee samples

Some evidence suggests that the validity of certain tests (e.g., aptitude tests) can be generalized across a variety of occupations
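The small-sample point can be made concrete with an approximate confidence interval for a validity coefficient. The sketch below uses the Fisher z transformation, a standard approximation that the slides do not themselves specify; the function name and the example values (r = .30 with 50 versus 500 employees) are hypothetical.

```python
import math
from statistics import NormalDist


def validity_confidence_interval(r: float, n: int, confidence: float = 0.95):
    """Approximate CI for a correlation using Fisher's z transformation."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)                       # Fisher z of the observed r
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


for n in (50, 500):
    low, high = validity_confidence_interval(0.30, n)
    print(n, round(low, 2), round(high, 2))
```

With 50 employees the interval runs from nearly zero to above .50, so two studies of the same test can report very different coefficients through sampling error alone.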

Statistical Power and Hypothesis Testing

                               Reality
Findings of study              No significance exists             Significance exists
Significant effect found       Type I error (“false positive”)    Correct decision (reject null)
No significant effect found    Correct decision (accept null)     Type II error (“false negative”)

“Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.” (Fisher, 1935, p.19)
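Statistical power is the probability of avoiding a Type II error, that is, of finding a significant effect when one exists. For a validity coefficient, power can be approximated with the Fisher z transformation. This approximation is an assumption brought in for illustration rather than something the slides specify, and the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def approximate_power(true_r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for testing H0: correlation = 0."""
    if n <= 3 or not -1.0 < true_r < 1.0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 3, -1 < true_r < 1, and 0 < alpha < 1")
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(math.atanh(true_r)) * math.sqrt(n - 3)  # expected test statistic
    # probability that the statistic lands in either rejection region
    return (1.0 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


for n in (50, 100, 200):
    print(n, round(approximate_power(0.30, n), 2))
```

When the true correlation is zero the function returns alpha, the Type I error rate, and power rises toward 1 as the sample grows.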

Date post:	17-Jan-2018
Upload:	douglas-wilcox