THE EFFECT OF CRITERION RELIABILITY ON MEANS AND INTERACTIONS IN META-ANALYSIS

Date post: 01-Jan-2016
LAWRENCE R. JAMES

PSYCHOLOGY AND MANAGEMENT

GEORGIA INSTITUTE OF TECHNOLOGY

META-ANALYSIS

• Correlations involving the same or very similar predictors and criteria are retrieved from prior studies.

• This set of validities constitutes a distribution that can be summarized statistically using standard descriptors such as the mean and the variance.

VALIDITY GENERALIZATION

• Archival information pertaining to statistical artifacts that might affect each validity is obtained (e.g., sampling error, reliability of criterion and predictor, range restriction).

• Distributional summary statistics are corrected for artifacts to provide estimates of the mean true (population) validity and the variance among the mean true (population) validities.

WHY VALIDITY GENERALIZATION?

• Validity generalization is founded on the possibility that true validities from different populations may be equal, and yet the sample validities may vary because of the operation of statistical artifacts (Hunter, Schmidt, & Jackson, 1982). (This is a question of interaction.)

• There is also the strong likelihood that true validities are underestimated by sample validities due to unreliability and range restriction.

RESULTS OF VALIDITY GENERALIZATION

The results of meta-analyses based on validity generalization (VG) procedures continue to be impressive.

ILLUSTRATIVE RESULTS

• General intellectual ability is said to have an average corrected validity of .53 in predicting job performance (Hunter & Hunter, 1984).

• Structured interviews can attain corrected validities in the .47 to .60 range against job performance (Huffcutt & Arthur, 1994).

• Perceptual speed has an average corrected validity of .47 against clerical performance (Schmidt, 1992).

• Integrity tests have average corrected validities of .40 against job performance (applicant samples) and .47 against counterproductive behaviors (all samples) (Ones, Viswesvaran, & Schmidt, 1993).

INFERENCES

• Many VG studies suggest that a single intellectual, cognitive, or personality trait can account for upwards of 16% to 36% of the variance in some aspect of job performance.

• The days when 16% of the variance (or a validity of .40) was the maximum expected for a trait (Ghiselli & Brown, 1955) are gone, as are the days when validities in the .20s and .30s were commonplace in the reports of "well-done" validity studies.

QUESTION

What precipitated this boost in validities and accountable variance?

BETTER SCIENCE?

• Improved measurement instruments?

• More sophisticated sampling techniques?

• Superior research designs?

Well, not really. We still rely on

• the same measurement procedures

• the same small samples

• the same bivariate correlation designs

Then what gave rise to this bountiful enhancement in validities?

ENHANCEMENT IN VALIDITIES

• The boosts in validities come from correcting the observed validities, which have stayed pretty much the same, for attenuation due to unreliability in the criterion (and sometimes the predictor) and direct range restriction in the predictor.

WHAT CHANGED?

• Change was not due to improvements in science.

• What changed was our historical cautiousness in applying correction equations to validity coefficients.

A CULTURE OF CORRECTIONS

The genesis of this “culture of corrections” can be traced to desires to estimate relationships devoid of statistical artifacts.

A FORERUNNER: LATENT VARIABLES

For example, latent variable procedures such as LISREL frame the opportunity to employ estimates of perfectly reliable variables in studies of covariation as a major advance in science.

ANOTHER FORM OF LATENT VARIABLE

No less dedicated to the pursuit of truth and scientific principle is VG (Schmidt, 1992), the objective being to estimate correlations among true scores (i.e., latent variables) unencumbered by statistical artifacts (e.g., unreliability).

RECEPTIVENESS TO CORRECTIONS

• It is the idea that corrected coefficients give greater insight into scientific truths that engendered the current culture of corrections.

• Investigators are prone to compute corrected coefficients, and editors, reviewers, and readers tend to be receptive to them.

OUR GOALS

• It is not our intent to stand between scientists and the seeking of truth via corrected coefficients.

• We do feel that it is reasonable, however, to inquire about the statistical values that are being used to make the corrections.

• We are specifically interested in corrections for attenuation due to unreliability in criteria assessed via ratings of job performance.

• We also study the effects these corrections have on the estimates of the mean true validity and the variance among the estimated true validities from separate populations.

INTERRATER RELIABILITY FOR RATINGS

Viswesvaran, Ones, & Schmidt (1996) concluded that

• job performance is typically assessed by ratings,

• the reliability of ratings should be estimated via an interrater reliability analysis, and

• the mean interrater reliability for job performance ratings over studies is approximately .52.

WHERE AND WHEN TO USE .52

• If a given study in a VG analysis fails to report criterion reliability, and the criterion is based on ratings, then the best estimate of the missing interrater reliability is .52.

• If one is using one of the myriad of VG equations to estimate means and variances of true correlations, and interrater reliability for ratings is missing from many studies (as is often the case), then .52 is the value to insert into the estimating equations for mean observed criterion reliability.

CONSEQUENCE OF USING .52

It is instructive to illustrate the product of using .52 as an estimate of interrater reliability. Using the standard correction for attenuation,

• an observed validity of .25 becomes .35 (i.e., $.25/[.52]^{1/2}$),

• .30 becomes .42,

• .35 becomes .49,

• .40 becomes .55.
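The four corrections listed above can be checked with a few lines of Python. This is a minimal sketch of the classical correction for attenuation; the function name `correct_for_attenuation` and the optional predictor-reliability argument `r_xx` are illustrative choices, not part of the slides.

```python
import math

def correct_for_attenuation(r_obs, r_yy, r_xx=1.0):
    """Classical correction: observed r divided by sqrt(r_xx * r_yy).

    r_xx defaults to 1.0, i.e. only criterion unreliability is corrected,
    which is the case illustrated on the slide.
    """
    if not (0 < r_yy <= 1 and 0 < r_xx <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_obs / math.sqrt(r_xx * r_yy)

# Observed validities from the slide, corrected with criterion reliability .52
corrected = {r: round(correct_for_attenuation(r, 0.52), 2)
             for r in (0.25, 0.30, 0.35, 0.40)}
print(corrected)  # {0.25: 0.35, 0.3: 0.42, 0.35: 0.49, 0.4: 0.55}
```

Because the divisor is the same for every observed validity, each value is multiplied by the same factor, $1/\sqrt{.52} \approx 1.39$.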

MAGNITUDE OF INCREASE

So, simply by correcting for attenuation based on an interrater reliability of .52, we obtain an 89% increase (i.e., $[.55^2 - .40^2]/.40^2$) in what is regarded as the maximum expected variance accounted for by a single predictor (i.e., .16 to .30).
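The 89% figure can be reproduced as follows. This is a small sketch that uses the rounded validities .40 and .55 exactly as the slide does; the helper name `pct_increase_in_variance` is hypothetical.

```python
def pct_increase_in_variance(r_before, r_after):
    """Percent increase in variance accounted for (r squared)."""
    v_before = r_before ** 2
    v_after = r_after ** 2
    return 100 * (v_after - v_before) / v_before

# Rounded values from the slide: .40 observed, .55 corrected
print(round(pct_increase_in_variance(0.40, 0.55)))  # 89
```

Without the intermediate rounding to .55, the correction multiplies $r^2$ by $1/.52$ regardless of the observed validity, so the proportional increase would be $1/.52 - 1 \approx 92\%$.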

AN ADVANCE IN SCIENCE?

To what extent is this 89% increase in maximum expected variance accounted for reflective of science?

COMPARISONS TO OTHER VARIABLES

• Where else in personnel research do we accept, and use, measurement procedures that produce variables with reliabilities of .52?

• Is it not true that almost every conceivable variable except performance ratings would be cast out of personnel research if its reliability were .52?

NUNNALLY & BERNSTEIN, 1994

“A reliability of .80 may not be nearly high enough in making decisions about individuals….If important decisions are being made with respect to specific test scores, a reliability of .90 is the bare minimum, and a reliability of .95 should be considered the desirable standard.” (p.265)

DESIRABLE STANDARD FOR PERFORMANCE RATINGS

If we desire a .95 reliability for the test scores that are used to hire people for jobs, it seems reasonable to expect the same standard of reliability for the ratings that are used to determine whether people keep their jobs.

PRACTICAL CONSIDERATIONS

• Many reliabilities for scores used to make decisions about individuals are not in the .90s.

• Many, however, are in the .80s.

• With the exception of performance ratings, almost none are in the .50s.

QUESTIONS

• Why are performance ratings allowed to survive in spite of what most would agree is questionable measurement?

• How do we allow observed validities to be corrected for unreliability in what appear to be flawed variables, and then act as if these corrected validities actually convey some sort of credible scientific information?

QUESTIONS (continued)

• Does anyone really believe that it makes sense to talk about a "perfectly reliable criterion" when the observed criterion begins with an interrater reliability of .52?

• How exactly does a variable in which almost one-half of the observed variance is some form of bias or error become perfectly reliable?

WHERE IS THE NEW TECHNOLOGY?

• It would seem that researchers would have instituted the necessary improvements, given that problems with performance ratings were documented as early as 50 years ago in Guilford’s (1954) classic text in psychometrics.

• Have not hundreds of articles been written on the biases and errors that affect performance ratings, especially after the classic articles on problems with performance ratings written by Feldman and Landy & Farr?

• We know what the problems are. Why have we not fixed them?

IS THE PROBLEM INTRACTABLE?

• Maybe it is not possible to build ratings that can achieve high interrater reliabilities.

• If we admit that this is true, then should we also not admit that we cannot justify inserting .52 in corrections for attenuation because we know that “theoretically perfectly reliable” is not going to be even remotely approximated?

Is .52 an accurate estimate of interrater reliability?

• This issue is currently being debated elsewhere (LeBreton, Kaiser, Burgess, Atchley, & James, 2001; Murphy & DeShon, 2000a, 2000b; Schmidt, Viswesvaran, & Ones, 2000).

• If this estimate is later shown to be inaccurate or ill-founded, then a different debate ensues.

• However, for now, let us assume that the .52 estimate is legitimate and accurate.

THE ISSUE

We may then deal with the issue of concern here, which is basing substantive scientific judgments on corrections that employ a below-threshold reliability for a criterion to produce an enhanced, sometimes much enhanced, estimate of corrected validity.

Is 40 years of research wrong, and is job satisfaction really correlated with job performance?

• Judge, Thoresen, Bono, and Patton (2001) used .52 as an estimate of criterion reliability to repudiate 40 years of research findings and previous meta-analyses that concluded that job satisfaction has a low correlation with overall job performance.

• A mean observed correlation of .18 was corrected to a mean (estimated) true correlation of .30. Correction for unreliability in the criterion accounted for approximately 60% of this increase.

• The use of .52 in the correction for attenuation was justified by arguing that this approach was “consistent with all contemporary (post-1990) meta-analytic studies involving job performance.” (p.384)

A COMPARISON

• Had criterion reliability been .85 instead of .52, the corrected correlation would have been approximately .23 (job satisfaction reliability was set at .74). Had the reliabilities for both variables been .85, the corrected correlation would have been approximately .21.

• Neither of these correlations suggests a substantial linear, additive relationship between job satisfaction and job performance.

• Are we going to change this conclusion based on corrections engendered by not being able to measure job satisfaction particularly well and performance hardly at all?
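The comparison above can be sketched by applying the correction for attenuation to the mean observed correlation of .18 under each pair of reliabilities. Applying the correction to the mean, rather than to each study's correlation, is an assumption of this sketch, and the function name `disattenuate` is illustrative.

```python
import math

def disattenuate(r_obs, r_xx, r_yy):
    """Correct r_obs for unreliability in both predictor and criterion."""
    return r_obs / math.sqrt(r_xx * r_yy)

r_obs = 0.18  # mean observed satisfaction-performance correlation

# (satisfaction reliability, performance reliability)
scenarios = {
    "satisfaction .74, performance .52": (0.74, 0.52),
    "satisfaction .74, performance .85": (0.74, 0.85),
    "both .85": (0.85, 0.85),
}
results = {label: round(disattenuate(r_obs, r_xx, r_yy), 2)
           for label, (r_xx, r_yy) in scenarios.items()}
for label, value in results.items():
    print(f"{label}: {value:.2f}")
```

Under these assumptions the two .85 scenarios give .23 and .21, as stated above. The .52 scenario gives about .29, slightly below the reported .30; the sketch does not model study-level corrections or rounding that could account for the difference.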

STATISTICAL DYSFUNCTIONS OF CORRECTING FOR LOW RELIABILITIES

• At this juncture, I hope that you realize that we have a problem. We cannot base our science on large corrections engendered by poor measurement.

• If you have yet to be convinced, then allow me to proceed to demonstrate some unanticipated dysfunctions of inserting low reliabilities into correction equations.

• Statistics are based on a working paper by James, LeBreton, and Ladd.

A SINGLE VG ANALYSIS

A meta-analysis is conducted on the correlations between scores on a structured interview and ratings of overall job performance.

• The mean observed correlation is .35.

• Mean criterion reliability is set at .52.

• Mean predictor reliability is set at .80.

• The ratio between the restricted and unrestricted standard deviations on the predictor is set at .71 (a common value).

Result of a Single VG Analysis

The estimate of mean true validity is .67 (Raju, Burke, Normand, & Langlois, 1991, Equation 2).
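The transcript does not reproduce the estimating equation, so the following is only a sketch under a stated assumption: the observed validity is first corrected for unreliability in predictor and criterion, and the result is then corrected for direct range restriction with the familiar Thorndike Case II formula. With the values given for this analysis, that sequence yields .67; whether it is algebraically the cited Equation 2 is not confirmed here, and agreement with other figures in the deck is not assumed. The function name `correct_validity` is hypothetical.

```python
import math

def correct_validity(r_obs, r_xx, r_yy, u):
    """Assumed two-step correction (not taken from the slides).

    u is the ratio of restricted to unrestricted predictor SD.
    """
    # Step 1: correct for unreliability in predictor and criterion
    r_att = r_obs / math.sqrt(r_xx * r_yy)
    # Step 2: Thorndike Case II correction for direct range restriction,
    # written in terms of U = unrestricted SD / restricted SD = 1 / u
    big_u = 1.0 / u
    return big_u * r_att / math.sqrt(1 + r_att ** 2 * (big_u ** 2 - 1))

# Values from the single VG analysis: r = .35, r_xx = .80, r_yy = .52, u = .71
print(round(correct_validity(0.35, 0.80, 0.52, 0.71), 2))  # 0.67
```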

ADDITIONAL PREDICTORS

Three additional predictors are chosen to contribute unique variance to prediction:

• intelligence test

• integrity test

• biographical questionnaire

PSYCHOMETRICS OF SEPARATE PREDICTORS

• Each additional predictor has an observed validity of .35 against job performance, correlates .20 with each of the other predictors, and has a reliability of .80.

• The ratio between the restricted and unrestricted standard deviations is again set at .71.

RESULTS OF THREE ADDITIONAL VG ANALYSES

The estimate of mean true validity in each additional VG analysis is .67.

MULTIPLE CORRELATION ANALYSIS

• Our four separate VG analyses each furnish an impressive increase in validity from .35 to .67.

• Now let’s compute a multiple correlation by inserting the results of each separate VG analysis into a multiple correlation analysis.

RESULTS

The squared multiple correlation ($R^2$) is 1.03.

We account for more than 100% of the variance in the job performance ratings.

COMPARATIVE RESULTS-1

A multiple correlation analysis based on the observed or uncorrected data produces an $R^2$ of .31.
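Both squared multiple correlations can be sketched with the closed form that holds when all $k$ predictors share one validity $v$ and one intercorrelation $c$: $R^2 = k v^2 / [1 + (k-1)c]$. For the corrected case, the sketch assumes the predictor intercorrelation of .20 is corrected only for unreliability in the two predictors (.20/.80 = .25); the slides do not state how the intercorrelations were treated, so that value is an assumption, as is the function name `r_squared_equal`.

```python
def r_squared_equal(validity, intercorrelation, k):
    """R^2 for k predictors with equal validities and equal intercorrelations."""
    return k * validity ** 2 / (1 + (k - 1) * intercorrelation)

# Observed data: validity .35, predictor intercorrelation .20, four predictors
observed = r_squared_equal(0.35, 0.20, 4)

# Corrected data: validity .67; intercorrelation .20 / .80 = .25 (assumed)
corrected = r_squared_equal(0.67, 0.20 / 0.80, 4)

print(round(observed, 2), round(corrected, 2))  # 0.31 1.03
```

The corrected validities are large relative to the predictor intercorrelations, which is what pushes the corrected $R^2$ past 1.0.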

COMPARATIVE RESULTS-2

If all corrections remained the same except that the performance ratings were given a reliability of .80 rather than .52, then

• the mean estimated true validity for each of the four variables would have been .54.

• the $R^2$ would have been .67.

COMPARATIVE RESULTS-3

If all corrections remained the same except that the performance ratings were given a reliability of .70, which is often considered the lower bound for reliability (Nunnally & Bernstein, 1994), then

• the mean estimated true validity for each of the four variables would have been .57.

• the $R^2$ would have been .77.

IMPROPER TERRITORY

• With reasonable values for criterion reliability set by accepted standards in psychometrics, corrected coefficients provide $R^2$ values in proper ranges.

• When accepted standards are suspended, the $R^2$ may wander off into improper territory.

SLIPPERY SLOPE

• We typically do not see $r^2$ values greater than 1.0 in bivariate studies.

• Investigators have thus failed to realize that once one begins to suspend judgment about acceptable thresholds for criterion reliability and to allow a value as low as .52 into correction equations, one is on a slippery slope.

• The multiple correlation analysis picked up on the slippery slope by producing an improper $R^2$. It follows that the bivariate corrections that engendered this improper value have a tenuous foundation.

VARIANCES

• Heretofore we have focused on the mean of a distribution of validities and the estimate of the mean true validity.

• It is also possible to focus on the variance of a distribution of validities and the estimate of the variance among true validities.

ESTIMATED VARIANCE AMONG TRUE VALIDITIES

• Each sample validity is corrected for artifacts. This provides an estimate of the true validity for the population from which that sample was drawn.

• The variance among the estimated true validities is calculated.

• This variance is adjusted for sampling error (Raju et al., 1991).

• If artifact data are not available for each sample, estimating equations are available.

OBSERVED AND CORRECTED VALIDITIES

• DATA WITH RELIABILITY OF .52

[Table: columns $r$, $r_{yy}$, $r_{yy}^{1/2}$, and $r \times r_{yy}^{-1/2}$, with Mean and Variance summary rows; the cell values were not preserved in this transcript.]

OBSERVED AND CORRECTED VALIDITIES

• DATA WITH RELIABILITY OF .75

[Table: columns $r$, $r_{yy}$, $r_{yy}^{1/2}$, and $r \times r_{yy}^{-1/2}$, with Mean and Variance summary rows; the cell values were not preserved in this transcript.]

ESTIMATES OF TRUE VARIANCE

[Figure: estimates of true variance (vertical axis, 0.000 to 0.025) at N = 98, 250, 500, 750, and 1000, with separate series for $r_{yy}$ = .52 and $r_{yy}$ = .75.]

KEY IMPLICATIONS

• Lower criterion reliabilities result in higher estimates of true variance.

• This means that the interpretation of mean true validity is more likely to be subject to moderation.

• In other words, use of below-threshold criterion reliabilities to enhance validity makes interpretation of that enhanced validity ambiguous.
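The first implication follows from simple algebra: when every observed validity is divided by the same $\sqrt{r_{yy}}$, the variance of the corrected validities equals the observed variance divided by $r_{yy}$. The sketch below illustrates this with hypothetical observed validities (they are not data from the slides) and omits the adjustment for sampling error that a full VG analysis would apply.

```python
import statistics

def corrected_variance(validities, r_yy):
    """Variance of validities after correcting each for criterion unreliability."""
    corrected = [r / r_yy ** 0.5 for r in validities]
    return statistics.pvariance(corrected)

# Hypothetical observed validities, for illustration only
observed = [0.20, 0.28, 0.35, 0.42, 0.50]

var_obs = statistics.pvariance(observed)
var_52 = corrected_variance(observed, 0.52)
var_75 = corrected_variance(observed, 0.75)

# The lower reliability inflates the variance more: var_obs / .52 > var_obs / .75
print(f"observed {var_obs:.4f}, r_yy=.75 {var_75:.4f}, r_yy=.52 {var_52:.4f}")
```

The same scaling applies to any set of observed validities, so the ordering of the two corrected variances does not depend on the hypothetical values chosen here.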

CONCLUSIONS

• It is time to call a moratorium on the use of low mean interrater reliabilities to enhance estimates of mean true validities in VG analyses.

• It is time to have a serious debate on how to estimate the reliability of ratings.
