
Journal of Applied Psychology
1995, Vol. 80, No. 5, 607-620

Copyright 1995 by the American Psychological Association, Inc.
0021-9010/95/$3.00

Frame-of-Reference Effects on Personality Scale Scores and Criterion-Related Validity

Mark J. Schmit
University of Florida

Ann Marie Ryan
Bowling Green State University

Sandra L. Stierwalt
Kansas State University

Amy B. Powell
Psychological Assessment Resources, Inc.

Increased use of personality inventories in employee selection has led to concerns regarding factors that influence the validity of such measures. A series of studies was conducted to examine the influence of frame of reference on responses to a personality inventory. Study 1 involved both within-subject and between-groups designs to assess the effects of testing situation (general instructions vs. applicant instructions) and item type (work specific vs. noncontextual) on responses to the NEO Five-Factor Inventory (P. T. Costa & R. R. McCrae, 1989). Results indicated that a work-related testing context and work-related items led to more positive responses. A second study found differences in the validity of a measure of conscientiousness, depending on the frame of reference of respondents. Specifically, context-specific items were found to have greater validity. Implications for personnel selection are discussed.

In recent years, more organizations have begun to use personality tests in their personnel selection procedures (Moore, 1987). Advances in the definition and the measurement of personality traits and supportive evidence for the predictive validity of personality traits are likely to be followed by further increased use of personality tests in business and industry (Barrick & Mount, 1991; Hogan, 1991; Hough, Eaton, Dunnette, Kamp, & McCloy, 1990; Pervin, 1989; Tett, Jackson, & Rothstein, 1991). Still, many problems in the use of personality tests in personnel selection remain unresolved, leading to suboptimal predictive efficiency for the end user.

One potential problem for the use of personality inventories in personnel selection is the frame-of-reference problem (Mount, Barrick, & Strauss, 1994). Most available personality tests are composed of items referring to behavioral tendencies, attitudes, relationships, preferences, and social skills (McCrae & Costa, in press). Thus, these items represent individuals' characteristic adaptations that in aggregation are commonly thought to generalize across situations. It is assumed that individuals respond to items with an indication of their propensity to behave, feel, relate, and so on, in a general way across situations (or at least they present an image of how they wish to be regarded across situations). This may not be the case, however, for all job applicants completing personality inventories, as some may adopt a specific frame of reference in answering items. Some job applicants may feel that the employer only wants to know (or only has the right to know) how they are likely to behave, feel, relate, and so on, at work. For example, general context items reflecting the Agreeableness factor (i.e., a trait from the Big Five model of personality) may be answered by some job applicants in relation to their self-perceived agreeableness at work specifically and by others in a way that reflects their general propensity for agreeableness across all situations. Indeed, one author of a popular press book on job-seeking skills suggested that to "beat the psychological tests" the applicant need not be dishonest, but the applicant must not describe what he or she is like or how he or she behaves in general but rather should describe how he or she would act as a worker in the job for which he or she is applying (Yate, 1994).

Mark J. Schmit, Department of Management, University of Florida; Ann Marie Ryan, Department of Psychology, Bowling Green State University; Sandra L. Stierwalt, Department of Psychology, Kansas State University; Amy B. Powell, Psychological Assessment Resources, Inc., Lutz, Florida.

Preparation of this article and the reported study were funded by the Ohio Board of Regents Academic Challenge Program.

Correspondence concerning this article should be addressed to Mark J. Schmit, Department of Management, College of Business Administration, University of Florida, 201 Business Building, Gainesville, Florida 32611-2017. Electronic mail may be sent via Internet to [email protected].

As a further demonstration of the frame-of-reference effect, consider the following statement from the Revised NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 1992), a widely used personality inventory: "I try to be courteous to everyone I meet." The applicant indicates the extent to which he or she agrees or disagrees with the statement. One applicant may consider only work experiences when making a decision on an item response, whereas another applicant may take all aspects of his or her life into consideration. Those applicants who base their answers on work experiences may be providing information that is a better indicator of actual job performance than those applicants who use their overall life experiences as a reference to answer the questions. Providing the same frame of reference to all applicants (e.g., using items that specifically refer to behavior at work) may, therefore, improve the predictive validity of personality inventories.

Theory and evidence from personality psychology indicate that frame-of-reference differences do exist and potentially cloud the predictability of behavior. Wright and Mischel (1987) suggested that some people may express quite reliable and stable patterns of behaviors, but these behaviors are contingent on situational conditions; they labeled these tendencies conditional dispositions. Although there is a connection between personality and behavior, the power to predict behavior on the basis of a personality trait may be limited to a fairly specific range of situations; Wright and Mischel provided initial support for this proposition (Shoda, Mischel, & Wright, 1989, 1993; Wright & Mischel, 1987). Support of Mischel's propositions is evident in other recent personality theories (e.g., Thorne, 1989; Zuroff, 1986) and empirical evidence (Moskowitz, 1994; Roberts & Donahue, 1994).

A metatheoretical framework for personality theories recently developed by McCrae and Costa (in press) helps to put the possibility of conditional dispositions into perspective. They suggested that personality measures must ask questions about characteristic adaptations (i.e., the concrete manifestations of basic tendencies) in order to make inferences about underlying traits. Because characteristic adaptations are influenced by external factors (e.g., cultural norms or organizational norms), many characteristic adaptations may be situation specific, even though across both time and situations (in aggregation) the underlying trait is generally constant. Although an individual's behavior may be limited to a specific range of behaviors (e.g., limited by role expectations either learned from past experience or imposed by the organization) in the critical work situations from which performance criteria are drawn, many other life situations are still available for the demonstration of the general underlying trait. All of the theories and the research findings mentioned here suggest that general personality inventories may say little about how an applicant would act in an actual work situation, because there is no specific frame of reference in which the respondent considers the given behaviors. What may be needed are items or instructions that provide a frame of reference related to the workplace.

A theory of personality-item response with a growing base of supporting evidence also offers support for using items with a work-related context. The self-presentation view of item responses suggests that individuals use personality items to present an image of themselves that may not be totally accurate but that is consistent with how they hope to be regarded by others (Hogan, 1982, 1991; Johnson, 1981; Leary & Kowalski, 1990; Schlenker & Weigold, 1992). Some people have clearer self-images of how they want to be viewed by others; some people are better than others at self-presentation (Hogan, 1991). Hogan (1991) suggested that "item endorsements reflect automatic and often nonconscious efforts on the part of test-takers to negotiate an identity with an anonymous interviewer" (p. 902). General personality-test items used in a personnel selection test may introduce error into the self-presentation process, because many of the items are difficult to connect with a specific work role or context. Some researchers have suggested that the use of face-valid items could reduce the validity of personality measures because of the potential for greater socially desirable responding, although recent research has shown that socially desirable responding may not be a major problem in personnel selection contexts (Hough et al., 1990). However, self-presentation theory suggests that greater face validity should increase test validity. Johnson (1981) noted that "the best strategy for designing a valid scale is not to make lying or misrepresentation difficult, but to make self-presentation as easy as possible" (p. 767). Putting personality items into a work-specific format would facilitate self-presentation.

The objective of our first study was to investigate the effects of the respondent's frame of reference on personality-scale scores by altering personality-test items' specificity (work specific vs. noncontextual) in different administration conditions (job-applicant instructions vs. general instructions). This study included both a between-groups substudy and a mixed-design substudy. More specifically, in the between-groups substudy, participants were randomly assigned to one of four conditions differing on instruction type and item specificity: (a) general instructions, noncontextual items, (b) general instructions, work-specific items, (c) applicant instructions, noncontextual items, and (d) applicant instructions, work-specific items. In the mixed-design substudy, participants were randomly assigned to either the general instructions group or the applicant instructions group (between), and they completed both the work-specific and noncontextual item forms of the personality test (repeated measures). Both types of designs were used to compensate for the weaknesses of each (see Lautenschlager, 1986; Schwab, 1971).

In organizations, a general set of clearly defined expectations for behavior is likely to exist that suggests appropriate behavior or explicitly forbids inappropriate behavior (cf. Weiss & Adler, 1984). Thus, most individuals recognize to some extent that their behavior at work may be more constrained than their behavior in some other general life situations. Consequently, we hypothesized that individuals responding to work-specific personality items would have more positive mean scale scores than would individuals responding to more general context personality items. Similarly, because some individuals use work experience as a frame of reference when responding to items in a personnel selection context, whether the items are work specific or not, individuals in the applicant instructions condition were hypothesized to have more positive scale scores than those in the general (sincere response) instructions condition.

The self-presentation theory of item response also suggests that an interaction between instruction type and item specificity should be evidenced. Because the applicant instructions present the clearest press for a specific social role to be presented and because work-specific items have clear connections with work roles, individuals in the applicant instructions and work-specific items conditions should score significantly higher than individuals in the general instructions and noncontextual items conditions, in which neither this press nor the connection to work roles is present.

Finally, the manipulation of item content could have psychometric consequences. Adding an "at work" tag to items could change a multifactor inventory to a one-factor inventory. If this were the case, then the new items and the associated scales would no longer be comparable to the test manual and to the relevant literature. The conceptual integrity of the scales would be called into question, and an argument might be made that personality was no longer being measured; in fact, an argument might be made that the new items may approach a format more like biodata items than personality items. To assess the impact of item changes on psychometric properties of the measures, we conducted several confirmatory factor analyses that addressed both the multifactor versus one-factor question and the question of psychometric equivalence for the altered and unaltered forms of the personality measures used in this study.

Study 1

Method

Participants

Substudy 1. Students of an introductory psychology course participated for course credit. Data were collected from a total of 100 participants, with 25 participants randomly assigned to each of the four study conditions. Across groups, participants did not differ significantly in age, class rank, or the number of jobs previously held.

Substudy 2. Participants came from the same pool as that in Substudy 1. Data were collected from a total of 200 participants who did not participate in Substudy 1, with 100 participants randomly assigned to each of the two between-groups conditions. Across groups, participants did not differ significantly in age, class rank, or the number of jobs previously held.

Measures

The NEO Five-Factor Inventory (NEO-FFI; Costa & McCrae, 1989) was the personality measure used in this study. The NEO-FFI is a shortened form of the NEO-PI-R (Costa & McCrae, 1992), a measure based on the five-factor model of personality. The five factors of personality that the NEO-FFI measures are Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness. The NEO-FFI consists of the 12 items having the highest positive or negative loading on each of the corresponding five factors of the NEO-PI-R. Responses to each item are recorded on a 5-point scale ranging from 0 (strongly disagree) to 4 (strongly agree). Previous estimates of internal consistency (coefficient alphas) for each of the 12-item scales were .86 (Neuroticism), .77 (Extraversion), .73 (Openness to Experience), .68 (Agreeableness), and .81 (Conscientiousness; Costa & McCrae, 1992). Costa and McCrae (1992) also reported that correlations between NEO-FFI scales and NEO-PI factors ranged from .75 to .89.
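The scoring and internal consistency figures above can be illustrated with a minimal sketch. It assumes the standard coefficient alpha formula (the article reports alphas but does not restate the formula), assumes reverse-keyed items have already been recoded, and uses invented toy responses; the function names are hypothetical.

```python
from statistics import pvariance


def scale_score(responses):
    """Sum the item responses (each 0-4); 12 items give a 0-48 range."""
    return sum(responses)


def coefficient_alpha(data):
    """Coefficient alpha for a list of respondents, each a list of k item responses.

    Assumed formula: alpha = k / (k - 1) * (1 - sum(item variances) / variance(total)).
    """
    k = len(data[0])
    item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from four people to a 3-item scale (not study data).
toy = [[0, 1, 0], [2, 2, 1], [3, 3, 4], [4, 4, 4]]
alpha = coefficient_alpha(toy)
print(f"alpha = {alpha:.2f}")
```

With 12 items scored 0 to 4, the maximum scale score is 48, which matches the score range given in the figure captions.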

The second personality inventory used was an altered form of the NEO-FFI, on which a reference to work was appended to each statement, usually at the beginning or the end of the statement. For example, the item "I try to be courteous to everyone I meet" was modified to read "I try to be courteous to everyone I meet at work." Another example item was "I work hard to accomplish my work-related goals" instead of "I work hard to accomplish my goals." The Openness to Experience scale was not used in this study because work appendages did not make sense on many of the items.

Procedure

Substudy 1. Participants were randomly assigned to one of four groups and were told they would be completing a personality inventory. Written instructions were attached to each personality inventory. One group received an unaltered personality inventory (i.e., NEO-FFI) and was instructed to answer the questions as directed on the original version of the test (general instructions condition). Another group received the unaltered personality inventory, but their instructions indicated that they were to complete the inventory as if they were applying for a customer service representative job in a department store, a job they really wanted (applicant instructions condition). The remaining two groups received the altered personality inventory, one with general instructions and the other with applicant instructions.

Substudy 2. Participants were randomly assigned to one of two groups (general instructions group or applicant instructions group). Participants in both groups completed both forms (counterbalanced within groups) of the personality test. Otherwise, all procedures were the same as those in Substudy 1.

Table 1
Means and Standard Deviations for NEO Five-Factor Inventory Scales as a Function of Item Type and Instruction Type

                     General,         Job applicant,   General,         Job applicant,
                     noncontextual    noncontextual    work specific    work specific
Scale                M       SD       M       SD       M       SD       M       SD
Neuroticism
  Substudy 1         24.20   6.79     18.08   6.82     19.96   5.83     15.52   7.84
  Substudy 2         23.70   8.36     16.32   8.04     17.56   6.81     13.81   7.09
Extraversion
  Substudy 1         31.84   5.84     35.44   6.19     34.44   5.68     36.12   4.98
  Substudy 2         32.19   6.98     35.73   6.08     32.07   6.74     34.88   5.83
Agreeableness
  Substudy 1         32.44   6.53     36.68   4.68     36.20   6.08     39.20   5.55
  Substudy 2         33.88   6.23     35.08   6.09     36.52   5.20     37.47   6.31
Conscientiousness
  Substudy 1         33.04   6.00     37.48   6.30     37.12   4.12     42.00   4.28
  Substudy 2         32.31   6.60     37.21   6.80     37.50   5.56     40.07   6.33

Results

Participants' scores were calculated for each of the four NEO-FFI scales used. Table 1 reports the means and the standard deviations for each of these scales across the four conditions for both substudies. We then performed analyses of variance (ANOVAs) on each of the NEO-FFI scales to test for mean score differences across the four experimental conditions in each study (i.e., Substudy 1: between-groups analysis; Substudy 2: repeated measures analysis with between-groups analysis on the instruction conditions). For Substudy 1, the analyses revealed no significant interaction between instruction type and item type, but there were main effects for both instruction type and item type on all scales except for the Extraversion scale. Extraversion scale scores were affected only by instruction type. F values, probabilities, and effect sizes for the main effects and the interactions are reported in Table 2.

For Substudy 2, the analyses showed a significant interaction between item type and instruction type for both the Neuroticism scale and the Conscientiousness scale (see Table 2). Consistent with Substudy 1, main effects for item type were found for the Neuroticism, Agreeableness, and Conscientiousness scales. Also consistent with Substudy 1, only a significant main effect for instruction type was found for the Extraversion scale. In Substudy 2, no significant main effect was found for instruction type for the Agreeableness scale. In general, the findings were very consistent across the two substudies, with the exception of the additional interactions that were found in Substudy 2.

The main effect findings confirmed the hypothesis that mean scale scores of individuals responding to work-specific personality items would be higher than those of individuals responding to noncontextual personality items. Also, as hypothesized, individuals in the applicant instructions condition had more positive scale scores than did those in the general instructions condition. The collapsed means and standard deviations for both substudies are presented in Table 3.
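As a rough illustration of how collapsed means relate to cell means, the sketch below averages the Substudy 1 Neuroticism cell means reported in Table 1. It assumes equal cell sizes (25 per cell in Substudy 1), so that an unweighted average of cells is appropriate; the dictionary keys and the function name are hypothetical labels, not the authors' notation.

```python
# Neuroticism cell means for Substudy 1, as reported in Table 1.
# Keys are (instruction type, item type); the labels are illustrative.
cell_means = {
    ("general", "noncontextual"): 24.20,
    ("applicant", "noncontextual"): 18.08,
    ("general", "work_specific"): 19.96,
    ("applicant", "work_specific"): 15.52,
}


def marginal_mean(factor_index, level):
    """Unweighted mean of the cells at one level of one factor.

    Assumes equal cell sizes, as in Substudy 1 (25 participants per cell).
    """
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


by_instruction = {lvl: round(marginal_mean(0, lvl), 2) for lvl in ("general", "applicant")}
by_item_type = {lvl: round(marginal_mean(1, lvl), 2) for lvl in ("noncontextual", "work_specific")}
print(by_item_type)    # noncontextual vs. work-specific marginals
print(by_instruction)  # general vs. applicant marginals
```

Under that assumption the marginals come out to 21.14 and 17.74 for item type and 22.08 and 16.80 for instruction type, which agree with the Substudy 1 Neuroticism row of Table 3.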

Consistent with the prediction made under the self-presentation theory of item responses, interactions between instruction type and item specificity were evidenced for two scales in Substudy 2. The nature of these interactions is depicted in Figures 1 and 2. As predicted, on both scales, individuals in the applicant instructions condition scored highest (i.e., in the most positive direction; lower scores are more positive on the Neuroticism scale) on the work-specific item scales, whereas individuals in the general instructions condition scored lowest on the noncontextual item scales.

The use of repeated measures designs in faking studies has been advocated by numerous researchers (e.g., Gordon & Gross, 1978; Lautenschlager, 1986). However, as demonstrated by Schwab (1971) and evidenced by Hough et al. (1990), the order of administration of the different response conditions often interacts with the effect of the response condition. This interactive effect was also found in the current study. An Item Type × Order of Administration interaction was found for the Conscientiousness scale, F(1, 196) = 9.47, p < .05; the Agreeableness scale, F(1, 196) = 8.11, p < .05; and the Neuroticism scale, F(1, 196) = 11.22, p < .05. The interaction for the Conscientiousness scale was demonstrative of the general effect across the three scales; it is depicted in Figure 3. Figure 3 shows that the difference between the noncontextual test scores and the work-specific test scores was larger when the noncontextual test was given first. However, the work-specific test scores were always highest for all three scales, regardless of the order of administration. This is contrary to the findings of Windle (1954), who suggested that there is a general tendency for personality-test scores to increase on second administration.

Table 2
F Values, Probabilities, and Effect Sizes for NEO Five-Factor Inventory

                     Instruction type        Item type                Instruction Type × Item Type
Scale                F       p      eta2     F        p      eta2     F       p      eta2
Neuroticism
  Substudy 1         14.82   .000   .13      6.14     .015   .06      0.38    .540
  Substudy 2         32.05   .000   .14      94.78    .000   .09      19.54   .000   .02
Extraversion
  Substudy 1         5.38    .023   .05      2.08     .153            0.71    .400
  Substudy 2         12.69   .000   .06      2.10     .149            1.22    .271
Agreeableness
  Substudy 1         9.90    .002   .09      7.45     .008   .07      0.29    .591
  Substudy 2         1.62    .205            75.10    .000   .05      0.47    .496
Conscientiousness
  Substudy 1         19.42   .000   .17      16.18    .000   .15      0.04    .843
  Substudy 2         20.16   .000   .09      152.78   .000   .11      14.67   .003   .01
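The article does not state how the eta2 effect sizes in Table 2 were computed. As a hedged sketch, the code below assumes a partial eta squared recovered from the F ratio, eta2 = (F × df_effect) / (F × df_effect + df_error), and assumes an error df of 96 for Substudy 1 (100 participants minus four cells), which is also not reported in the text. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio (assumed formula)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumption: 2 x 2 between-groups design with N = 100, so error df = 100 - 4.
DF_ERROR_SUBSTUDY_1 = 100 - 4

# Instruction-type F values for Substudy 1, as reported in Table 2.
instruction_type_f = {
    "Neuroticism": 14.82,
    "Extraversion": 5.38,
    "Agreeableness": 9.90,
    "Conscientiousness": 19.42,
}

eta2 = {
    scale: round(partial_eta_squared(f, 1, DF_ERROR_SUBSTUDY_1), 2)
    for scale, f in instruction_type_f.items()
}
print(eta2)
```

Under these assumptions the Substudy 1 instruction-type values come out to .13, .05, .09, and .17, in line with the corresponding eta2 entries in Table 2. The repeated measures effects in Substudy 2 involve different error terms and are not covered by this sketch.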

The between-groups study likely lacked the power necessary to identify the interactions found in the repeated measures study. However, the between-groups main effect findings confirmed those of the repeated measures study that were tainted by the Item Type × Order of Administration interaction. The similarity of the main effect findings and the effect sizes in both studies suggests that the findings may be robust.

The tests of the psychometric equivalence of the altered and unaltered forms of the measure followed the approach outlined by Jöreskog and others (Drasgow & Kanfer, 1985; Jöreskog, 1971a, 1971b). We conducted all of the analyses using LISREL 8 (Jöreskog & Sörbom, 1993). We used three indicators of fit to assess the models tested: chi-square (and chi-square difference tests to examine loss of fit in nested models), root mean square error of approximation (RMSEA; Browne & Cudeck, 1993), and the comparative fit index (CFI; Bentler, 1990). An RMSEA of .05 has been suggested as an indicator of close fit, whereas .08 suggests a reasonable fit of the model to the data (Browne & Cudeck, 1993). CFI values greater than .90 also suggest a reasonable fit (Bentler & Bonett, 1980).
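To make the RMSEA benchmark concrete, the sketch below uses a common single-group point estimate, RMSEA = sqrt(max(chi-square - df, 0) / (df × (N - 1))). This formula is an assumption here: the article does not print it, and software implementations differ in details such as using N rather than N - 1. The inputs are the chi-square and df values reported in Table 4 for the general instructions, work-specific group (N = 125); the function name is hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA for a single group (assumed formula)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Chi-square and df from Table 4, general instructions, work-specific items.
four_factor = rmsea(98.68, 48, 125)
one_factor = rmsea(288.07, 54, 125)

print(f"four-factor RMSEA = {four_factor:.2f}")
print(f"one-factor RMSEA  = {one_factor:.2f}")
```

Rounded to two decimals, these estimates agree with the RMSEA entries of .09 and .19 in Table 4, which places the four-factor model near the reasonable-fit benchmark and the one-factor model well outside it.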

In all the LISREL analyses, the data from both substudies were included; thus, each of the four groups (i.e., noncontextual items, general instructions; noncontextual items, applicant instructions; work-specific items, general instructions; and work-specific items, applicant instructions) had a sample size of 125. Because of the relatively small sample size and the large number of items per common factor (i.e., 12 per scale, or 48 total), we adopted an approach used by Schmit and Ryan (1993) in the analysis of the factor structure of the NEO-FFI (cf. Drasgow & Kanfer, 1985). Accordingly, the 12 NEO-FFI items that comprise each of the four subscales used in the current study were randomly divided into three subsets, leaving a total of 12 item sets. Unit-weighted scores from these item sets were summed to create 12 composites. A 12 × 12 covariance matrix for each group was calculated and used in the analyses.

Table 3
Means and Standard Deviations Collapsed Across Item Type and Instruction Type Conditions

                     Item type                          Instruction type
                     Noncontextual    Work specific     General          Job applicant
Scale                M       SD       M       SD        M       SD       M       SD
Neuroticism
  Substudy 1         21.14   7.41     17.74   7.20      22.08   6.63     16.80   7.38
  Substudy 2         20.01   8.98     15.69   7.18      20.63   7.58     15.07   7.57
Extraversion
  Substudy 1         33.64   6.23     35.28   5.35      33.14   5.85     35.78   5.57
  Substudy 2         33.96   6.77     33.48   6.44      32.13   6.86     35.31   5.96
Agreeableness
  Substudy 1         34.56   6.01     37.70   5.96      34.32   6.53     37.94   5.24
  Substudy 2         34.48   6.18     36.99   5.79      35.20   5.72     36.28   6.20
Conscientiousness
  Substudy 1         35.31   6.49     39.56   4.83      35.12   5.48     39.74   5.80
  Substudy 2         34.76   7.12     38.79   6.08      34.90   6.08     38.64   6.57

[Figure 1 plot: scale score by item type (noncontextual, work specific), with separate lines for applicant instructions and general instructions.]

Figure 1. Item Type × Instruction Type interaction for the Conscientiousness scale. Scores on the Conscientiousness scale range from 0 to 48, with higher scores indicating more conscientiousness.
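The item-set construction described above (each scale's 12 items randomly divided into three subsets, summed into 12 composites, then a 12 × 12 covariance matrix) can be sketched as follows. This is an illustration only: the responses are simulated random numbers rather than study data, the random seed and all names are hypothetical, and the actual analyses were run in LISREL 8.

```python
import random

SCALES = ["Neuroticism", "Extraversion", "Agreeableness", "Conscientiousness"]
ITEMS_PER_SCALE = 12
SETS_PER_SCALE = 3


def make_item_sets(rng):
    """Randomly split each scale's 12 item indices into three 4-item subsets."""
    item_sets = {}
    size = ITEMS_PER_SCALE // SETS_PER_SCALE
    for scale in SCALES:
        items = list(range(ITEMS_PER_SCALE))
        rng.shuffle(items)
        for s in range(SETS_PER_SCALE):
            item_sets[(scale, s)] = items[s * size:(s + 1) * size]
    return item_sets


def composite_scores(person, item_sets):
    """Unit-weighted sum of responses within each item set (12 composites)."""
    return [sum(person[scale][i] for i in idx) for (scale, _), idx in item_sets.items()]


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the composite columns."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1) for b in range(k)]
        for a in range(k)
    ]


rng = random.Random(0)  # hypothetical seed, for reproducibility only
item_sets = make_item_sets(rng)

# Simulated group of 125 respondents with random 0-4 responses (not study data).
group = [
    {scale: [rng.randint(0, 4) for _ in range(ITEMS_PER_SCALE)] for scale in SCALES}
    for _ in range(125)
]
composites = [composite_scores(person, item_sets) for person in group]
cov = covariance_matrix(composites)
print(len(cov), "x", len(cov[0]))
```

Reducing 48 items to 12 composites lowers the number of parameters to estimate relative to the 125 cases per group, which is the motivation the text gives for the approach.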

[Figure 2 plot: scale score by item type (noncontextual, work specific), with separate lines for general instructions and applicant instructions.]

Figure 2. Item Type × Instruction Type interaction for the Neuroticism scale. Scores on the Neuroticism scale range from 0 to 48, with higher scores indicating more neuroticism.


[Figure 3 plot: scale score by item type (noncontextual, work specific), with separate lines for noncontextual items first and work-specific items first.]

Figure 3. Item Type × Order of Administration interaction for the Conscientiousness scale. Scores on the Conscientiousness scale range from 0 to 48, with higher scores indicating more conscientiousness.

We conducted the first analyses to examine the one- versus four-factor structure issue for the altered personality items (i.e., items with "at work" tags). The data from the group in the general instructions condition with work-specific items were submitted to both a four-factor and a one-factor confirmatory factor analysis. In addition, the data from the group in the applicant instructions condition with work-specific items were submitted to the same analyses. The results are presented in Table 4. In both cases, it is clear that the four-factor model provided a better fit to the data than did the one-factor model.

The second set of analyses was a series of multiple-group tests of psychometric equivalence between the noncontextual and work-specific item sets. In all analyses, four subtests were conducted with varying degrees of model restraint. The first test compared the form of the two models; that is, it was a test of whether the model for the two measures had the same number of latent variables with the same indicators and the same specification of fixed and free parameters. The second test constrained the latent-factor correlations to be invariant across measurement devices, and the third analysis further constrained the item sets to have equal loadings across the measures on the associated latent factors. The final analysis specified error variances to be equal across measures.

The first subset of analyses compared the equivalence of the altered items and the general items under the general instructions condition. The second subset of analyses compared the different types of items in the job-applicant instructions condition. The results of the two sets of four

Table 4
Comparison of One- and Four-Factor Models of Work-Specific Item Sets in General and Job-Applicant Instructions Conditions

| Model | χ² | df | p | RMSEA | CFI | χ² difference | df | p |
|---|---|---|---|---|---|---|---|---|
| General instructions | | | | | | | | |
| Four factor | 98.68 | 48 | <.05 | .09 | .91 | | | |
| One factor | 288.07 | 54 | <.05 | .19 | .58 | 189.39 | 6 | <.05 |
| Job-applicant instructions | | | | | | | | |
| Four factor | 107.76 | 48 | <.05 | .10 | .92 | | | |
| One factor | 288.07 | 54 | <.05 | .17 | .75 | 138.54 | 6 | <.05 |

Note. For the chi-square values, N = 125. RMSEA = root mean square error of approximation; CFI = comparative fit index.
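As a rough check on how the RMSEA values relate to the chi-square statistics, the sketch below applies a commonly used point-estimate formula, RMSEA = sqrt(max(χ² - df, 0) / (df(N - 1))). The article does not state which formula or software produced its values, so the formula choice and the function name `rmsea` are assumptions made for illustration.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# General instructions condition, work-specific items (values from Table 4).
four_factor = rmsea(98.68, 48, 125)
one_factor = rmsea(288.07, 54, 125)
print(f"four factor: {four_factor:.2f}, one factor: {one_factor:.2f}")
```

Under this formula the two general-instructions models round to .09 and .19, matching the tabled values, which suggests the reported RMSEAs follow the usual single-sample definition.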


Table 5
Psychometric Equivalence Tests of Noncontextual and Work-Specific Items

| Model | χ² | df | p | RMSEA | CFI | χ² difference | df | p |
|---|---|---|---|---|---|---|---|---|
| General instructions | | | | | | | | |
| Equal structure | 225.30 | 96 | <.05 | .07 | .89 | | | |
| Equal factor correlations | 236.10 | 102 | <.05 | .07 | .89 | 10.80 | 6 | >.05 |
| Equal loadings | 252.85 | 114 | <.05 | .07 | .88 | 16.75 | 12 | >.05 |
| Equal error variances | 278.97 | 126 | <.05 | .07 | .87 | 26.12 | 12 | <.05 |
| Job-applicant instructions | | | | | | | | |
| Equal structure | 232.08 | 96 | <.05 | .08 | .91 | | | |
| Equal factor correlations | 242.58 | 102 | <.05 | .08 | .91 | 10.50 | 6 | >.05 |
| Equal loadings | 259.27 | 114 | <.05 | .07 | .91 | 16.69 | 12 | >.05 |
| Equal error variances | 295.87 | 126 | <.05 | .07 | .89 | 36.60 | 12 | <.05 |

Note. For the chi-square values, N = 250. RMSEA = root mean square error of approximation; CFI = comparative fit index.

models tested are presented in Table 5. In both cases, psychometric equivalence was strongly supported. Under both conditions, invariant structure, invariant latent-factor correlations, and invariant factor loadings were evidenced across measures. Only error variances were not invariant across measures. An examination of the error variances suggested that in both the general instructions condition and the job-applicant instructions condition, the error variances were significantly larger for the noncontextual items (.31 for general instructions condition and .28 for job-applicant instructions condition) than for the work-specific items (.28 for general instructions condition and .26 for job-applicant instructions condition), Fmax(124) = 1.12, p < .05, and Fmax(124) = 1.08, p < .05, respectively.
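The sequence of equivalence tests rests on chi-square difference tests between nested models: each added constraint is judged by the increase in chi-square relative to the increase in degrees of freedom. The sketch below recomputes the general-instructions comparisons from the Table 5 values. It is an illustration rather than the authors' procedure; the function names are hypothetical, and the closed-form tail probability is used only because every df difference here is even (6 or 12), which avoids a dependency on a statistics library.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

def chi2_difference(restricted, baseline):
    """Each model is (chi_square, df); returns (difference, df difference, p)."""
    d_chi = restricted[0] - baseline[0]
    d_df = restricted[1] - baseline[1]
    return d_chi, d_df, chi2_sf_even_df(d_chi, d_df)

# General instructions condition, values from Table 5.
models = [
    ("Equal structure", (225.30, 96)),
    ("Equal factor correlations", (236.10, 102)),
    ("Equal loadings", (252.85, 114)),
    ("Equal error variances", (278.97, 126)),
]
results = {}
for (_, baseline), (name, restricted) in zip(models, models[1:]):
    d_chi, d_df, p = chi2_difference(restricted, baseline)
    results[name] = p
    print(f"{name}: difference = {d_chi:.2f}, df = {d_df}, p = {p:.3f}")
```

The pattern agrees with the table: constraining factor correlations and loadings does not significantly worsen fit, whereas constraining error variances does.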

Discussion

The tests of the four-factor structure and the psychometric equivalence provided evidence that the altering of items to include "at work" tags might not have adversely affected the structure or other psychometric properties of the scales. In light of these findings, the scale comparisons made in the two substudies make sense from both a practical and a conceptual level.

All main effects of the experimental manipulations were in the expected direction; participants in both the job-applicant instructions and work-specific item conditions had more positive scale scores than did participants in the general instructions and noncontextual item conditions, respectively. No interactions between instruction type and item type were found in Substudy 1, but interactions for two of the four scales were found in Substudy 2.

The findings suggest that differences in personality-inventory scale scores are affected by the frame of reference that the respondent considers when completing the measures. As hypothesized, both the specificity of the context in which the inventory is completed and the specificity of the items appear to affect the scale scores. However, the combination of the two variables also appears to affect at least some scale scores (e.g., Conscientiousness and Neuroticism). This finding is consistent with the prediction based on the theory of self-presentation in item responses. The only scale on which no score differences were related to item type was the Extraversion scale. This may suggest that this particular scale generalizes better across situations than do the other scales used in the study. Alternatively, this scale may represent a set of behaviors less constrained by the workplace or less subject to social desirability than the other scales used in the study; however, given the fictitious job used in the current study, customer service representative, these alternatives seem unlikely.

Somewhat larger effect sizes for the instruction type variable than for the test type variable were found in most comparisons, and their pervasiveness across all scales in both substudies suggests that social desirability may be at least a partial explanation for score differences. Furthermore, the interactions between instruction type and item type may be suggestive of social desirability effects rather than self-presentation effects. As noted earlier, socially desirable responding may be a detriment to validity, whereas self-presentation effects serve to increase validity. Thus, we designed a second study with the objective of trying to identify which may be the better explanation for score differences.

Study 2

The primary objective of the second study, a between-groups criterion-related validity study, was to compare the validities of altered and unaltered tests in the two instruction conditions. That is, does the situation, the item context, or both affect the criterion-related validity of personality tests? A secondary objective was to explore the role played by social desirability and self-presentation effects on the criterion-related validities. As noted earlier, both socially desirable responding and frame-of-reference confusion should lead to greater error variance in the prediction of a criterion by a personality test. Thus, if situation-induced social desirability (through both instructions and items) plays a role in response differences, validity should be significantly smaller (or nonsignificant) in the applicant instructions–work-specific item condition than in the other three conditions. If item social desirability alone affects responses (such that variability is reduced through a negative skew of the distribution), then the validity in the general instructions–work-specific item condition should be lower than that in the general instructions–noncontextual item condition. However, if the self-presentation argument is correct, then validity should be higher in the applicant instructions–work-specific item condition than in the applicant instructions–noncontextual item condition, because all respondents will be using the same or more similar frames of reference, thereby reducing error variance. Furthermore, the theory of conditional dispositions suggests that the validity of the specific items should be higher than that of the noncontextual items in the general instructions condition, because individuals are posited to be more predictable within specific situations (e.g., at work) than across all situations. Finally, the conditional disposition and self-presentation theories considered in tandem would rank order validity predictions, from highest to lowest, as (a) applicant instructions, work-specific items; (b) general instructions, work-specific items; (c) general instructions, noncontextual items; and (d) applicant instructions, noncontextual items.

The Conscientiousness factor of the Big Five personality factors has been shown to be a useful predictor in most jobs (Barrick & Mount, 1991; Hough et al., 1990; Tett et al., 1991). In addition, Hough et al. specified both predictors and criteria in a meta-analysis of personality variables and found that both dependability and achievement measures were useful predictors of educational criteria (uncorrected criterion-related validity coefficients were .15 and .30, respectively). These two variables are included as facets of the Conscientiousness scale of the NEO-PI-R (Costa & McCrae, 1992), a measure based on the five-factor model of personality. In the current study, the Conscientiousness scale and an altered school-specific version of it were used to predict students' grade point averages (GPAs). Previous research has shown the Conscientiousness scale to be a useful predictor of college course grades, with a criterion-related validity of .25 (Dollinger & Orf, 1991).

Method

Participants

Students of an introductory psychology course participated for course credit. Data were collected from a total of 400 participants, with 100 participants randomly assigned to each of the four study conditions; data from 6 participants were incomplete and were dropped before data were analyzed. Across groups, participants did not differ significantly in age, class rank, or the number of jobs previously held.

Measures

NEO-PI-R Conscientiousness scale. The Conscientiousness scale of the NEO-PI-R (Costa & McCrae, 1992) was the personality measure used in the current study. The Conscientiousness scale consists of 48 items measuring six facets, including Competence, Order, Dutifulness, Achievement Striving, Self-Discipline, and Deliberation. Responses to each item are coded on a 5-point scale ranging from 0 (strongly disagree) to 4 (strongly agree). Previous estimates of internal consistency (coefficient alphas) for each of the facet subscales were .67 (Competence), .66 (Order), .62 (Dutifulness), .67 (Achievement Striving), .75 (Self-Discipline), and .71 (Deliberation), while the entire Conscientiousness scale had an alpha of .90 (Costa & McCrae, 1992).

The second personality inventory used was an altered form of the Conscientiousness scale, on which a reference to school was appended to each statement, usually at the beginning or the end of the statement. For example, the item "I strive for excellence in everything I do" was modified to read "I strive for excellence in everything I do at school."

Criterion. The criterion measure used was college cumulative GPA. Permission was obtained from the students to gain access to the registrar's records containing their college GPAs. Six individuals refused to give this permission, and their predictor data were dropped from the analysis.

Procedure

Participants were randomly assigned to one of four groups and were told they would be completing a personality inventory. Written instructions were attached to each personality inventory. One group received an unaltered personality inventory and were instructed to answer the questions as directed on the original version of the test (general instructions condition). Another group received the unaltered personality inventory, but their instructions asked them to imagine that they had just arrived at the admissions office of a university they really wanted to attend. They were then told that admission decisions would be based in part on their performance on the personality test they were about to complete. Finally, they were told that those students meeting the qualification standards on the test would receive prize money ($10) in lieu of the admission. The remaining two groups received the altered personality inventory, one with general instructions and the other with school-applicant instructions, which also included the prize money (i.e., $10) incentive clause. Finally, following the completion of the test, participants were asked for their consent to have the registrar release their GPAs to the researchers.

The prize money incentive provided in the two school-applicant instructions conditions was used to induce motivation in the students similar to motivational forces that might be found in actual applicants. After all testing was completed, participants in all four conditions were debriefed with a letter stating the purpose and the hypotheses of the study. The need for motivating some participants with a reward incentive was explained. Participants were also informed that a random drawing would take place to distribute fifteen $10 prizes, the total of the pool of potential money promised earlier to some participants as motivating incentives.

Results

We calculated criterion-related validity coefficients for both the broad predictor scale (the Conscientiousness scale) and the more narrow facet subscales using GPA as the criterion. The validity coefficients were calculated for each of the four experimental groups. The results are presented in Table 6. Also included in Table 6 are the alpha reliability coefficients for each scale, calculated within group. With the exception of the reliabilities for the Dutifulness subscale, which were low and variable across conditions, the scales shared very similar reliability estimates across conditions.

The rank order of the validity coefficients for the Conscientiousness scale across the four conditions was consistent with the predictions made by the conditional disposition and self-presentation theories: (a) applicant instructions, school-specific items (r = .46, p < .01); (b) general instructions, school-specific items (r = .41, p < .01); (c) general instructions, noncontextual items (r = .25, p < .01); and (d) applicant instructions, noncontextual items (r = -.02, ns). In support of self-presentation theory, the validity of the school-specific items (r = .46) was significantly higher in the applicant instructions condition than the validity of the noncontextual items (r = -.02), z = 3.30, p < .05. In further support was the finding of significantly higher validity in the general instructions–school-specific item condition (r = .41) than in the applicant instructions–noncontextual item condition (r = -.02), z = 3.15, p < .05. The conditional disposition hypothesis received more support than the item social desirability hypothesis, as the validity of the school-specific items (r = .41) was greater than the validity of the noncontextual items (r = .25) in the general instructions condition. Although the validity difference was in the direction that supported the conditional disposition hypothesis, the difference was not significant, z = 1.25, ns. Finally, the validities for school-specific items did not differ significantly between the applicant instructions condition (r = .46) and the general instructions condition (r = .41), z = 0.43, ns; the validities of the noncontextual items were also not significantly higher in the general instructions condition (r = .25) than in the applicant instructions condition (r = -.02), z = 1.94, ns. In general, these findings at the broad level of measurement were also repeated at the more narrow facet level of measurement (see Table 6).
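The z statistics above compare validity coefficients from independent groups. The article does not name the test, so the sketch below assumes the standard Fisher r-to-z comparison, with group sizes taken from the Table 6 notes (n = 99 for the noncontextual conditions, n = 98 for the school-specific conditions). The function name is hypothetical.

```python
import math

def compare_independent_rs(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# School-specific vs. noncontextual items, general instructions condition.
z_general = compare_independent_rs(0.41, 98, 0.25, 99)
# General instructions, school-specific vs. applicant instructions, noncontextual.
z_cross = compare_independent_rs(0.41, 98, -0.02, 99)
# School-specific items, applicant vs. general instructions.
z_school = compare_independent_rs(0.46, 98, 0.41, 98)
print(f"{z_general:.2f} {z_cross:.2f} {z_school:.2f}")
```

Under this assumption the three comparisons come out close to the reported 1.25, 3.15, and 0.43. The other reported values differ somewhat from what this formula gives, so the authors may have used slightly different sample sizes or unrounded correlations.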

In an attempt to replicate the findings of Study 1, we performed a between-groups ANOVA to identify mean Conscientiousness scale score differences across conditions. As was the case in Substudy 1, significant main effects for instruction type, F(1, 390) = 23.94, p < .05, and item type, F(1, 390) = 5.56, p < .05, were found; however, the interaction was not significant, F(1, 390) = 0.68, ns. Individuals in the applicant instructions conditions (M = 129.86, SD = 21.95) scored significantly higher than individuals in the general instructions conditions (M = 119.47, SD = 20.40). Individuals in the school-specific item conditions (M = 127.18, SD = 22.39) scored significantly higher than those in the noncontextual item conditions (M = 122.17, SD = 20.95).

A confirmatory factor analysis was also conducted in this study in a manner similar to that in Study 1. That is, the psychometric equivalence of the measures across the four independent groups was assessed. The developers of the Conscientiousness scale intended each of the six facet
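To put the two main effects on a common scale, the reported means and standard deviations can be converted to standardized mean differences. The article does not report these values for Study 2, so the following is only an illustrative sketch: it uses Cohen's d with a pooled standard deviation, and the group sizes are assumptions derived from the per-condition sizes in the Table 6 notes.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled within-group SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# Assumed group sizes: 99 + 98 respondents per instruction type;
# 98 + 98 (school-specific) and 99 + 99 (noncontextual) per item type.
d_instructions = cohens_d(129.86, 21.95, 197, 119.47, 20.40, 197)
d_item_type = cohens_d(127.18, 22.39, 196, 122.17, 20.95, 198)
print(f"instructions d = {d_instructions:.2f}, item type d = {d_item_type:.2f}")
```

With these inputs the instruction effect is roughly 0.49 standard deviations and the item-type effect roughly 0.23, in line with the larger F ratio for instruction type.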

Table 6
Criterion-Related Validities for Grade Point Average and Reliabilities of Conscientiousness Scale and Subscales Across Experimental Conditions

| Scale | General, noncontextualᵃ r | α | Applicant, noncontextualᵃ r | α | General, school-specificᵇ r | α | Applicant, school-specificᵇ r | α |
|---|---|---|---|---|---|---|---|---|
| Conscientiousness | .25* | .91 | -.02 | .90 | .41** | .89 | .46** | .94 |
| Competence | .31** | .71 | -.02 | .68 | .38** | .69 | .53** | .84 |
| Order | .02 | .64 | -.08 | .55 | .20 | .51 | .12 | .56 |
| Dutifulness | .19 | .34 | -.08 | .42 | .20 | .24 | .46** | .56 |
| Achievement Striving | .25* | .81 | -.10 | .68 | .38** | .82 | .44** | .87 |
| Self-Discipline | .17 | .79 | .06 | .79 | .36** | .74 | .38** | .86 |
| Deliberation | .23* | .77 | .10 | .72 | .31** | .64 | .35** | .76 |

ᵃ n = 99. ᵇ n = 98.
*p < .05. **p < .01.


subscales of the broader measure to have substantial loadings on the underlying trait of conscientiousness (Costa, McCrae, & Dye, 1991). Accordingly, each of the six facet subscales was hypothesized to have significant independent loadings on a single latent variable. The test of similar factor structure across groups showed a close fit between the data and the model, χ²(36, N = 394) = 41.71, p = .24, RMSEA = .02, CFI = 1.00. With the additional constraint of equal factor loadings across groups, the model still fit well, χ²(54, N = 394) = 67.83, p = .10, RMSEA = .03, CFI = .99, and no significant loss of fit was indicated, χ²(18, N = 394) = 26.12, p > .05. In addition, as hypothesized, all six facet subscales had significant loadings on the latent variable. The third model tested constrained error variances to be equal across groups. As in Study 1, this model did not fit the data as well as the previous model, χ²(72, N = 394) = 99.22, p = .02, RMSEA = .07, CFI = .98, and resulted in a significant loss of fit, χ²(18, N = 394) = 31.39, p < .05. Also parallel to Study 1, the error variances were found to be significantly larger for the noncontextual items (.43) than for the school-specific items (.36), Fmax(199) = 1.19, p < .05.

Discussion

In Study 2, the findings offer more support for the conditional disposition and self-presentation hypotheses than for the social desirability hypothesis. Validity was highest in the condition that used context-specific items in the applicant situation; conversely, validity was essentially zero when general context items were used in the applicant condition. As Johnson (1981) hypothesized, making self-presentation easier by using work-specific items appeared to increase validity. In other words, altering the items gave all respondents a common frame of reference, which appears to have reduced error variance and increased validity.

In Study 2, the significant mean differences in scores between conditions found in Study 1 were replicated; participants in both the applicant instructions and school-specific item conditions had more positive Conscientiousness scale scores than did participants in the general instructions and noncontextual item conditions, respectively. The finding of psychometric equivalence of the measures across conditions was also similar to the findings of Study 1. Thus, replication of results across studies was evidenced.

Earlier we suggested that the consequence of job applicants using different frames of reference was an increase in the error of prediction. This proposition was supported but not fully explained in the current study. A portion of the error is likely to be nonrandom, resulting in a moderator variable. One moderator of personality-test validity already identified, the self-monitoring variable (Snyder & Ickes, 1985), may help explain why applicants use different frames of reference. High self-monitors tend to rely on the immediate situation to guide their behavior more than do low self-monitors. Consequently, high self-monitors given a noncontext-specific personality test in a personnel selection situation are probably more likely than low self-monitors to respond to personality-test items using work-related experiences as a guide. However, if personality-test items referred to a work-related context, the difference between high and low self-monitors would likely be attenuated. Future research is needed to determine whether self-monitoring is the moderator that may explain the results found in the current study or whether some other moderator may be responsible (see Chaplin, 1991); for example, scalability is a potential alternative moderator (Lanning, 1988).

General Discussion

In both Studies 1 and 2, the mean differences in personality-test scores were consistent with the conditional disposition and self-presentation hypotheses, although social desirability effects could not be ruled out as an alternative explanation by these findings alone. However, the criterion-validity differences across conditions in Study 2 clearly supported the self-presentation theory hypothesis and, to a lesser extent, the conditional disposition theory hypothesis over the alternative social desirability hypothesis. The distinction here is between positive self-presentation alone and positive and accurate self-presentation. If individuals present themselves in a positive light, but they do so inaccurately, that is social desirability. This form of measurement error should result in lower validity or possibly no change in validity if few individuals do it (e.g., Hough et al., 1990). If individuals present themselves positively and accurately because they have a frame of reference, then validity should increase. To expect job applicants not to engage in some positive impression management is unrealistic (Leary & Kowalski, 1990; Schlenker & Weigold, 1992), regardless of the method of measurement. The key is to find ways to help them present themselves more accurately given the job context; the current study suggests a way to do that.

Although replication in a field setting is still needed, the practical implications for personnel selection are clear. Face-valid items do appear to have the potential for realizing higher criterion-related validity (and consequently, increased utility) when used for personnel selection. Rynes (1993) recently made a plea to researchers to test the idea that increases in the face validity of a test may be related to increases in criterion-related validity of the test. Others have noted both that face validity has been a "well kept secret in the empirical tradition of test development" and that "the best items (from an empirical perspective) tend to be the ones with good face validity" (Hogan, Carpenter, Briggs, & Hansson, 1985, p. 30). This seems to be true at least for the Conscientiousness scales used in the current study.

Future research is needed to test whether changes in face validity (i.e., item context changes) increase the criterion-related validity of other measures of personality. For example, mean differences across item type conditions were not found for the Extraversion scale in both Substudies 1 and 2. This finding leaves open the possibility that validity differences across conditions comparable to those found for the Conscientiousness scale in Study 2 may not be found for a measure of extraversion. The conscientiousness trait has been found to generalize across most jobs (Barrick & Mount, 1991), and this likely includes the "job" of being a college student. Thus, we probably have optimized our chance for generalizability of the Study 2 findings by using only this trait, and other trait measures may not show the same improved validities associated with simple frame-of-reference changes. Indeed, moderator research in personality psychology has found the domain of conscientiousness the most fruitful for demonstrating moderator effects; other domains have been less promising (Chaplin, 1991).

More research also needs to be done with variables at a bandwidth more narrow than the Big Five (Rothstein, Jackson, & Tett, 1994). In the current study, the Order facet of Conscientiousness was not a useful predictor in any condition. In addition, the findings for the Dutifulness facet were less consistent with the overall findings for the Conscientiousness scale than was the case for the remaining four facets. Lower reliability for these scales, however, explains much of their lower and inconsistent criterion-related validity in the current study. Future research with other Big Five constructs and the associated facets may produce different results than those found for Conscientiousness.
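The link between low reliability and low observed validity follows from the classical attenuation relation, in which an observed correlation shrinks by the square root of the predictor's reliability. The sketch below applies the standard correction for predictor unreliability to the Dutifulness values in Table 6. The authors report no such correction, so this is an illustration of the reasoning only; treating coefficient alpha as the reliability estimate and leaving the criterion uncorrected are assumptions, and the function name is hypothetical.

```python
import math

def correct_for_predictor_unreliability(r_observed, reliability):
    """Classical disattenuation for unreliability in the predictor only."""
    return r_observed / math.sqrt(reliability)

# Dutifulness facet: (observed validity, alpha) per condition, from Table 6.
dutifulness = {
    "general, noncontextual": (0.19, 0.34),
    "applicant, noncontextual": (-0.08, 0.42),
    "general, school-specific": (0.20, 0.24),
    "applicant, school-specific": (0.46, 0.56),
}
corrected = {}
for condition, (r, alpha) in dutifulness.items():
    corrected[condition] = correct_for_predictor_unreliability(r, alpha)
    print(f"{condition}: observed r = {r:.2f}, corrected r = {corrected[condition]:.2f}")
```

With alphas this low, the observed coefficients understate the underlying relation considerably, which is consistent with the caution that facet-level results for the less reliable subscales should be read conservatively.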

It appears that the use of general personality items in a personnel selection context does result in significantly more error on the predictor side of the predictor-criterion equation, resulting in lower validities than is the case for context-specific items. This was evidenced by the lower error variances for the work-specific item sets than for the noncontextual item sets in both studies. This additional predictor error is also likely to affect the factor pattern of personality measures used in a personnel selection context. Indeed, Schmit and Ryan (1993) found that a simple factor pattern for a set of personality measures completed by a volunteer sample was substantially different from a complex factor pattern found in an applicant sample. Furthermore, the complex factor pattern of the applicant sample included a primary factor that the authors labeled as an ideal employee factor. This factor included most of the items on the personality form that were most directly related to work. The self-presentation item response theory would suggest that these were the items on which self-presentation was easiest to do; these items would also be most consistent with the item development advice of Johnson (1981). In Study 1, we found evidence for psychometric equivalence of the two test forms used within applicant and nonapplicant conditions. A test of applicant versus nonapplicant differences, as was presented by Schmit and Ryan (1993), was not possible because of the nonindependence of the group combinations that would make this test possible. However, a post hoc examination of differences in average factor intercorrelations in Study 1 revealed that higher factor intercorrelations were present in the job-applicant instructions condition than in the general instructions condition for both noncontextual and work-specific items (results are available from Mark J. Schmit).

The higher factor intercorrelations in the applicant groups are consistent with Paulhus, Bruce, and Trapnell's (1995) findings that in the job-applicant instructions condition, even when told to fake being the best candidate for the job, some individuals respond in a purely self-descriptive manner. Paulhus et al. showed that this led to inflated factor correlations in a condition in which restriction of range might be expected. Inflated correlations in an applicant population may also come from applicants using different frames of reference (i.e., general vs. work specific). However, if this were the case, factor correlations would be expected to drop when frame of reference is held constant with work-specific items; this was not found to be true in the current study. It could be that when work-specific items are used, factor intercorrelations are increased because individuals have clear conceptions about the social norms that call for more consistency of trait behaviors within certain situations, such as work-related situations, than across all situations (cf. Mischel, 1973; Moskowitz, 1994). Future research is needed to explore this possibility.

Clearly, the greatest limitation of the current studies is the potential for limited external validity because of the fact that they were simulated selection-context lab studies. The replication of findings within the set of studies is a positive indicator for the possibility of external validity, but field replication is still needed. For example, in Study 1 we found psychometric equivalence for both forms of the test items in the job-applicant instructions condition. Although the factor correlations in this study were inflated in the applicant instructions condition, as in that of Schmit and Ryan (1993), the complex factor loading found by Schmit and Ryan in a real applicant population was not evidenced. In Study 2, a monetary incentive was used in the applicant instructions condition, but this may have fallen short of having the motivating potential of a real position opening. Finally, Study 2 involved an academic predictor and criterion that may not have the same generalizability to work predictor-criterion relationships. Field replication should address these issues. This field replication should also explore the possibility that even more narrowly defined situational contexts may be required for some jobs in the writing of personality items to potentially increase validity. For example, the item "I am courteous to everyone I meet at work" might be made more situationally relevant by changing it to "I am courteous to every customer I meet at work." An individual in a customer service position may be very courteous to customers but very rude to coworkers. If the behavior of most value to the organization is related to customer service and if the criterion reflects this, then the second item may be more predictive. It may be that items from scales, other than possibly the Conscientiousness scale, require different levels of context specificity to make them more valid than the general items when used for personnel selection. Finally, a logical alternative to altering every test item to be context specific would be to simply instruct the respondents to answer all questions with respect to the work context¹; although, as shown by Paulhus et al. (1995), some respondents may ignore instructions and answer the items in the general context to which they are referenced.

Given the recent increased interest in the use of per-sonality measures in personnel selection, it is importantthat researchers continue to look for ways to improve thepredictive efficiency of these types of tests. This set ofstudies represents a first step in that direction.

¹ This suggestion was made by an anonymous reviewer.

References

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.

Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.

Chaplin, W. F. (1991). The next generation of moderator research in personality psychology. Journal of Personality, 59, 143-178.

Costa, P. T., Jr., & McCrae, R. R. (1989). The NEO PI/FFI manual supplement. Odessa, FL: Psychological Assessment Resources.

Costa, P. T., Jr., & McCrae, R. R. (1992). NEO PI-R professional manual. Odessa, FL: Psychological Assessment Resources.

Costa, P. T., Jr., McCrae, R. R., & Dye, D. A. (1991). Facet scales for Agreeableness and Conscientiousness: A revision of the NEO Personality Inventory. Personality and Individual Differences, 12, 887-898.

Dollinger, S. J., & Orf, L. A. (1991). Personality and performance in "personality": Conscientiousness and Openness. Journal of Research in Personality, 25, 276-284.

Drasgow, F., & Kanfer, R. (1985). Equivalence of psychological measurement in heterogeneous populations. Journal of Applied Psychology, 70, 662-680.

Gordon, M. E., & Gross, R. H. (1978). A critique of methods for operationalizing the concept of fakeability. Educational and Psychological Measurement, 38, 771-782.

Hogan, R. T. (1982). A socioanalytic theory of personality. In M. Page (Ed.), Nebraska symposium on motivation (Vol. 30, pp. 56-89). Lincoln: University of Nebraska Press.

Hogan, R. T. (1991). Personality and personality measurement. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 873-919). Palo Alto, CA: Consulting Psychologists Press.

Hogan, R. T., Carpenter, B. N., Briggs, S. R., & Hansson, R. O. (1985). Personality assessment and personnel selection. In H. J. Bernardin & D. A. Bownas (Eds.), Personality assessment in organizations (pp. 21-51). New York: Praeger.

Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 95-108.

Johnson, J. A. (1981). The "self-disclosure" and "self-presentation" views of item response dynamics and personality scale validity. Journal of Personality and Social Psychology, 40, 161-169.

Jöreskog, K. G. (1971a). Simultaneous factor analysis in several populations. Psychometrika, 36, 409-426.

Jöreskog, K. G. (1971b). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109-133.

Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software International.

Lanning, K. (1988). Individual differences in scalability: An alternative conception of consistency for personality theory and measurement. Journal of Personality and Social Psychology, 55, 142-148.

Lautenschlager, G. J. (1986). Within-subject measures for the assessment of individual differences in faking. Educational and Psychological Measurement, 46, 309-316.

Leary, M. R., & Kowalski, R. M. (1990). Impression management: A literature review and two-component model. Psychological Bulletin, 107, 34-47.

McCrae, R. R., & Costa, P. T. (in press). Toward a new generation of personality theories: Theoretical contexts for the five-factor model. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives. New York: Guilford Press.

Mischel, W. (1973). Toward a cognitive social learning reconceptualization of personality. Psychological Review, 80, 252-283.

Moore, T. (1987, March 30). Personality tests are back. Fortune, 115, 74-82.

Moskowitz, D. S. (1994). Cross-situational generality and the interpersonal circumplex. Journal of Personality and Social Psychology, 66, 921-933.

Mount, M. K., Barrick, M. R., & Strauss, J. P. (1994). Validity of observer ratings of the Big Five personality factors. Journal of Applied Psychology, 79, 272-280.

Paulhus, D. L., Bruce, M. N., & Trapnell, P. D. (1995). Effects of self-presentation strategies on personality profiles and their structure. Personality and Social Psychology Bulletin, 21, 100-108.

Pervin, L. A. (Ed.). (1989). Handbook of personality theory and research. New York: Guilford Press.

Roberts, B. W., & Donahue, E. M. (1994). One personality, multiple selves: Integrating personality and social roles. Journal of Personality, 62, 199-218.

Rothstein, M., Jackson, D. N., & Tett, R. P. (1994, April). Personality and job performance: Limitations and challenges to validation research. In R. C. Page (Chair), Personality and job performance: Big Five versus specific traits. Symposium conducted at the Ninth Annual Conference of the Society for Industrial and Organizational Psychology, Nashville, Tennessee.

Rynes, S. L. (1993). Who's selecting whom? Effects of selection practices on applicant attitudes and behavior. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 240-274). San Francisco: Jossey-Bass.

Schlenker, B. R., & Weigold, M. F. (1992). Interpersonal processes involving impression regulation and management. Annual Review of Psychology, 43, 133-168.

Schmit, M. J., & Ryan, A. M. (1993). The Big Five in personnel selection: Factor structure in applicant and nonapplicant populations. Journal of Applied Psychology, 78, 966-974.

Schwab, D. L. (1971). Issues in response distortion studies of personality inventories: A critique and replicated study. Personnel Psychology, 24, 637-647.

Shoda, Y., Mischel, W., & Wright, J. C. (1989). Intuitive interactionism in person perception: Effects of situation-behavior relations on dispositional judgments. Journal of Personality and Social Psychology, 56, 41-53.

Shoda, Y., Mischel, W., & Wright, J. C. (1993). The role of situational demands and cognitive competencies in behavior organization and personality coherence. Journal of Personality and Social Psychology, 65, 1023-1035.

Snyder, M., & Ickes, W. (1985). Personality and social behavior. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (3rd ed., Vol. 2, pp. 883-947). Hillsdale, NJ: Erlbaum.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.

Thorne, A. (1989). Conditional patterns, transference, and the coherence of personality across time. In D. M. Buss & N. Cantor (Eds.), Personality psychology: Recent trends and emerging directions (pp. 149-159). New York: Springer-Verlag.

Weiss, H. M., & Adler, S. (1984). Personality and organizational behavior. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 6, pp. 1-50). Greenwich, CT: JAI Press.

Windle, C. (1954). Test-retest effect on personality questionnaires. Educational and Psychological Measurement, 14, 617-633.

Wright, J. C., & Mischel, W. (1987). A conditional approach to dispositional constructs: The local predictability of social behavior. Journal of Personality and Social Psychology, 53, 1159-1177.

Yate, M. J. (1994). Knock 'em dead: The ultimate job seekers' handbook. Holbrook, MA: Adams.

Zuroff, D. C. (1986). Was Gordon Allport a trait theorist? Journal of Personality and Social Psychology, 51, 993-1000.

Received December 20, 1994
Revision received April 20, 1995
Accepted April 20, 1995

