
APPLIED PSYCHOLOGY: AN INTERNATIONAL REVIEW, 2003, 52 (4), 515–532

© International Association for Applied Psychology, 2003. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.


Attitudes Towards Personnel Selection Methods: A Partial Replication and Extension in a German Sample

Bernd Marcus*

Chemnitz University of Technology, Germany


* Address for correspondence: Bernd Marcus, Department of Psychology, Chemnitz University of Technology, Wilhelm-Raabe-Str. 43, D-09107 Chemnitz, Germany. Email: [email protected]

Formerly at the University of Tübingen, Germany. This research was supported by a grant from the Deutsche Forschungsgemeinschaft (SCHU422/9-2; granted to Heinz Schuler) and by a doctoral scholarship granted by the Bundesland Baden-Wuerttemberg. I am grateful to Dick Weissert and Stefan Höft for many helpful comments on an earlier draft of this paper, and to Dirk Steiner and Stephen Gilliland for their permission to reprint parts of their original results. I also wish to thank the editor and the anonymous reviewers for many insightful suggestions.


This research examined attitudes towards a variety of personnel selection methods in a German student sample (N = 213). Its first objective was to shed further light on cultural differences in applicant reactions to selection techniques by partially replicating a study by Steiner and Gilliland (1996), who obtained ratings of process favorability for ten different procedures from two groups of French and American students. Results indicated a number of significant mean discrepancies, but no systematic pattern appeared to underlie these differences. In general, subjects in all three nations rated widespread methods (e.g. interview, résumés) or obviously job-related procedures (work sample tests) most favorably, followed by paper-and-pencil tests, whereas personal contacts and graphology appeared in the negative range. A second major objective was to examine the validity of the brief descriptions of selection instruments often used in comparative studies on this topic. Attitudes towards four different types of written tests were assessed twice for this purpose, once after presenting descriptive information, and a second time after actual test administration. Low to moderate pretest–posttest convergence pointed to serious problems with these descriptions for paper-and-pencil tests. Implications for current evaluations of selection practices from the applicants' perspective and for future research are discussed.

INTRODUCTION

Applicant reactions to various selection procedures have received considerable attention in I/O psychology in recent years. One reason for this emerging trend is that the first personal contact between an employer and a prospective employee is usually established through the selection process, which might affect an applicant's attitudes towards the organisation and influence his or her decision to accept a job offer (Rynes, 1993). Another cause is the growing interest in the applicant's perspective on selection situations relative to that of the employer, a perspective Schuler (1993; Schuler & Stehle, 1983) labeled social validity in contrast to the more organisation-centered criterion-related validity of selection devices. According to Schuler and Stehle, applicants evaluate the selection process and the instruments applied therein on the basis of four distinguishable aspects: (1) how informative they are with respect to job requirements, (2) the degree to which it is possible to participate in the selection process and control its outcomes, (3) how transparent the methods are, and (4) whether acceptable feedback is provided. A more recent yet highly influential contribution is Gilliland's (1993) application of justice theory to the selection process, in which he distinguished between the dimensions of procedural and distributive justice to develop a formal model of the antecedents, rules, and outcomes of applicants' perceptions of the fairness of selection systems.

As a consequence of these and other developments, dozens of empirical studies were conducted in the past decade to investigate the favorability of attitudes towards specific instruments, compare their relative acceptability, and explore the bases of these evaluations (e.g. Gilliland, 1994; Harland, Rauzi, & Biasotto, 1995; Kravitz, Stinson, & Chavez, 1996; Macan, Avedon, Paese, & Smith, 1994; Ryan, Greguras, & Ployhart, 1996; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993; Whitney, Diaz, Mineghino, & Powers, 1999, to quote only a few). Among the more generalisable results from these studies are the finding that certain types of selection procedures (e.g. interviews, work sample tests) are viewed most favorably while others (e.g. graphology, polygraphs) are almost uniformly rejected, and the observation that theoretically different facets of fairness are often highly intercorrelated empirically, with face validity or perceived job relatedness playing a particularly crucial role for overall evaluations.

The purpose of the present study is twofold. First, it is intended to add another piece of evidence on the relative fairness of selection procedures, as perceived by test takers, with the emphasis on cultural differences. Cross-national surveys (e.g. Lévy-Leboyer, 1994; Ryan, McFarland, Baron, & Page, 1999; Schuler, Frier, & Kauffmann, 1993; Shackleton & Newell, 1994) consistently demonstrated that the extensiveness of method use differs substantially across nations. For example, written ability and personality tests are much more extensively employed in North America than in Germany (e.g. Schuler et al., 1993, found that intelligence tests are almost exclusively used for selecting apprentices in Germany, and the usage rates of personality tests are 10 per cent or less for all job categories). Graphology as a selection device seems to be extensively used in France and the French-speaking part of Belgium (Lévy-Leboyer, 1994; Schuler et al., 1993; Shackleton & Newell, 1994), and is also employed by many Spanish companies (Schuler et al., 1993), but is very rarely used in other nations (see the above-cited sources). Such differences may affect favorability, for example via mere exposure effects (Zajonc, 1968). Findings from previous investigations on test takers' attitudes, mostly conducted with US samples, may therefore not generalise to other countries. The only study to date that directly compared test takers' attitudes from two different nations was that by Steiner and Gilliland (1996), who had two groups of French and American students rate ten different selection methods. In the present research, one part of their study, examining relative process favorability, is replicated with a German sample, using the same measures and procedures as Steiner and Gilliland to provide directly comparable results.

The second purpose of this paper is to highlight some methodological problems with current research on test takers' attitudes. Surveys in general can only be reliable when participants are familiar with the object of attitude, that is, when they know what they are talking about. Because laypeople often have a very limited knowledge of many selection procedures, this suggests the importance of administering the tests in question first and then assessing favorability. On the other hand, it is most informative to compare ratings for a wide variety of selection methods collected from the same sample. There is an obvious tradeoff between these two goals, since it is usually not possible in a cost-effective way to administer a large number of instruments in one study. As a consequence, most comparative studies on this topic (e.g. Fruhner, Schuler, Funke, & Moser, 1991; Kravitz et al., 1996; Smither et al., 1993), including that by Steiner and Gilliland, had to rely on introducing the various procedures by brief descriptions and, in part, on controlling for differential familiarity by asking for prior experience.

It is not self-evident that this procedure can serve as a sufficient substitute for real test applications. If the descriptions were valid, they should lead to essentially the same results, both at the group and the individual level, as actually administering the tests in question. Put differently, a group of laypeople who evaluate a number of tests on the basis of brief descriptions should provide ratings highly similar to those made on the basis of actual experience with the same kinds of tests. If this were true, it would be expected that (a) the group means of favorability ratings for a single category of selection procedures do not differ substantially between the two modes of presentation (group level), and (b) high correlations are obtained between ratings of merely described tests and actually experienced tests (individual level).
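The two validity criteria just described can be made concrete with a small sketch. The rating vectors below are invented for illustration only: the group-level check compares means across the two modes of presentation, and the individual-level check is a Pearson correlation between description-based and experience-based ratings.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length rating vectors."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 7-point favorability ratings for one test,
# once after a brief description (pretest) and once after taking it (posttest)
pretest = [5, 4, 6, 3, 5, 4, 6, 2]
posttest = [4, 5, 3, 4, 6, 3, 5, 4]

group_level_gap = mean(posttest) - mean(pretest)        # criterion (a): mean shift
individual_convergence = pearson_r(pretest, posttest)   # criterion (b): correlation
```

With these invented data the means barely move while the correlation is low, the very dissociation the study goes on to report: descriptions can reproduce group means without capturing individual evaluations.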

Two recent studies (Bauer, Maertz, Dolen, & Campion, 1998; Chan, Schmitt, Sacco, & DeShon, 1998) addressed this issue by examining test attitudes before and after actually taking cognitive ability measures; Chan et al. also administered a personality test. In neither investigation did group means change notably after test administration. Despite some differences in attitude measurement, however, both studies indicated only moderate pretest–posttest stability of reactions for the intelligence tests (correlations range from .34 to .60), whereas Chan et al. found somewhat higher values (.60 to .66) for their personality test. While these findings suggest that actual experience with a test changes attitudes to a limited yet not negligible extent, it is noteworthy that in neither study were procedures introduced by the descriptions typically used for multi-method comparisons. The highly diverse content of the latter type of investigations precludes use of general test fairness perceptions (as in Bauer et al.) or sample items (as in Chan et al.).

The present study addresses the question whether such descriptions are valid proxies for real experience more directly, by providing subjects with the cues used by Steiner and Gilliland before attitude measurement and test administration. It is also more comprehensive in that four different types of selection instruments are assessed thereafter: a cognitive ability measure, a general personality inventory, an integrity test, and a biographical questionnaire. In addition, the research design permits examination of the extent to which test scores and attitudes are related for different instruments (cf. Bauer et al., 1998; Chan et al., 1998; Jones & Joy, 1991; Ryan & Sackett, 1987; Whitney et al., 1999, with mixed results).


METHOD

Sample

Two hundred and thirteen undergraduate students from a German university, majoring in diverse subjects (e.g. economics, business and administration, biology and other natural sciences, agriculture), participated in this study. In all, 90.4 per cent of the sample indicated having at least one month of prior job experience and 54.8 per cent had an employment record of more than one year. Of the participants, 89 (41.8%) were female. Mean age was 23.7 years, with a standard deviation of 2.9 and a range from 20 to 41. The corresponding figures from the Steiner and Gilliland study are 68 per cent job experience, 83 per cent women, and a mean age of 20.4 years for the French sample, and 99 per cent job experience, 75 per cent female participants, and 20.4 years mean age for the US sample, respectively. In contrast to the German sample, almost all of Steiner and Gilliland's French subjects and one-third of their American participants were majoring in psychology. At least with respect to age and gender, the demographic composition of the present sample differs from both of Steiner and Gilliland's groups more than the latter two samples differ from one another. Whether these differences translate into divergent ratings will be addressed in the results section.

Procedure

Subjects were invited via campus advertising to participate in a psychological research project. A reward of DM 30 cash was offered in order to reduce any bias due to voluntary participation (see Ullman & Newcomb, 1998, for evidence that a monetary incentive effectively enhances the willingness of otherwise reluctant subgroups to participate in time-consuming research). This is approximately the wage the university pays to student employees for two working hours. Administration of all materials took about 90 to 120 minutes for each participant.

The study was conducted in 20 group sessions of 10 to 12 subjects each under the surveillance of either a male or a female test proctor. At the beginning of each session, participants were instructed to read the brief descriptions of ten selection procedures and then provide their initial ratings. They next took a battery of eight psychological tests, including those described in the measures section (three specific personality tests are omitted from the following analyses, as well as one social desirability scale for which no ratings were collected). Sequence of administration was counterbalanced for all tests, except for the intelligence measure, which was always presented first because it was the only test with a fixed time limit. Immediately after test administration, examinees rated the instruments they had just completed on the same form used to measure pretest reactions and then received their rewards in a separate room.
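The ordering constraint described above (a fixed-position timed test, all other tests counterbalanced) can be sketched as follows. The labels for the seven untimed tests are hypothetical placeholders, not the study's actual instrument names.

```python
import random

# Hypothetical labels for the seven untimed tests in the battery
UNTIMED_TESTS = [
    "NEO-PI-R", "integrity test", "biographical questionnaire",
    "specific personality test A", "specific personality test B",
    "specific personality test C", "social desirability scale",
]

def session_order(rng):
    """Intelligence test always first (fixed time limit); the rest shuffled."""
    rest = UNTIMED_TESTS[:]          # copy so the master list stays intact
    rng.shuffle(rest)                # counterbalance the untimed tests
    return ["intelligence test"] + rest

rng = random.Random(42)              # seeded for reproducible session plans
order = session_order(rng)
```

Each session draws a fresh permutation, so across 20 sessions order effects for the untimed tests average out while the timed measure keeps its fixed slot.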

Measures

Attitudes and Test Descriptions.

Descriptions of the ten selection procedures and items on process favorability were adapted from Steiner and Gilliland (1996). Method descriptions were in part slightly revised in order to make them more compatible with German selection practices, thereby leaving content unchanged as far as possible. The most notable example is the personal references, where the American practice, as expressed in the Steiner and Gilliland description (". . . you must request letters of reference or provide the names of your prior employers so that the employer can obtain information about your suitability on the job", p. 136), contrasts sharply with a much more formalised process in Germany (present description: "references provided by your prior employers in which they evaluate your behavior and performance in the prior job". Note that employers in Germany have a legal obligation to write such references, and their content and wording are highly formalised). Another change occurred for the honesty or integrity test, which Steiner and Gilliland had introduced as a typical exemplar of the overt category of these instruments ("Tests that ask you about your thoughts on theft and experiences related to your personal honesty", p. 136), while, in the present study, both an overt and a personality-based test are presented with items in a mixed order (see Sackett, Burris, & Callahan, 1989, for this distinction). The present description was more neutral with respect to this categorisation, introducing the method as "a specific written personality test which asks you questions related to your trustworthiness, reliability, and honesty" (the exact wording of all descriptions is available upon request).

For assessing process favorability, the two items of Steiner and Gilliland were translated more literally by the present author. The items examine, on a 7-point Likert-type scale, perceived predictive validity of the method ("How would you rate the effectiveness of this method for identifying qualified people for a job you were likely to apply for after graduation?") and an evaluation of test fairness ("If you did not get the job based on this selection method, what would you think of the fairness of this procedure?"). All translations were checked by an independent bilingual reviewer.

Tests Administered.

A German version of the Wonderlic Personnel Test (Wonderlic, 1996), a brief measure of g widely used for personnel selection in the US, represents cognitive ability tests in the present study. As a general personality inventory, the 240-item NEO-PI-R (Costa & McCrae, 1992; German version by Angleitner & Ostendorf, 1993) was used, a comprehensive measure of the non-cognitive traits comprising the five-factor model of personality. As already mentioned, the integrity test covers both the overt (58 items) and the personality-based (53 items) type, which were jointly presented. It was newly constructed because no German-language integrity test existed at the time of data collection, but it had already gone successfully through a series of construct and criterion-related validation studies (Marcus, 2000; Marcus, Höft, Riediger, & Schuler, 2000; Marcus, Schuler, Quell, & Hümpfner, 2002).

The exemplar of biographical questionnaires, however, is somewhat less prototypical of the entire class of selection procedures. It is a newly developed (Marcus, in press) 67-item measure tapping into the specific construct of self-control, as defined by Gottfredson and Hirschi (1990). It is therefore not a typical biodata scale, as used for personnel selection, but it relies exclusively on reports of overt behavior in the past and, thus, meets the definition of a biographical questionnaire as specified by Mael (1991). Results for this instrument should nevertheless be interpreted with caution.

RESULTS

Means and standard deviations for favorability ratings from Steiner and Gilliland and from the present study, assessed before and after test administration in the latter case, are presented in Table 1. Significant differences between methods within one sample, as well as those between different samples for the same selection device, are also indicated in the table.

ANOVAs indicated a significant main effect for selection method (within-subject factor; SS: 2692.16, df: 9, MS: 299.13, F: 4.00, p < .01), as found by Steiner and Gilliland, and a significant interaction with gender (SS: 47.75, df: 9, MS: 5.31; F: 4.00, p < .01): women rated intelligence tests (d = .54) and personal contacts (d = .40) less favorably than men but slightly preferred interviews (d = .27) and résumés (d = .29). If the smaller proportion of women in the present sample compared to Steiner and Gilliland's study had translated into mean differences for the entire samples (i.e. if these were due to a gender effect rather than an effect for country), precisely the opposite pattern of single contrasts than those indicated in Table 1 would have been expected for the four respective procedures. The main effect for gender was non-significant (between-subjects factor; SS: 7.21, df: 1, MS: 7.21, F: 3.31, p > .05).

The single contrasts shown in Table 1 indicate that the present subjects in general evaluated interviews and work sample tests most favorably, followed by references and résumés. Paper-and-pencil tests appear in the neutral range, with the intelligence test most and the biodata inventory least preferred, especially after test taking. In general, however, test application had only minor influence on mean ratings (none of the t-tests on pretest–posttest differences was significant). Personal contacts and, in particular, graphology received clearly negative reactions on average. This tendency to prefer face-valid and commonplace procedures over written tests, and to reject non-transparent methods like graphology, generally confirms findings from all the comparative studies cited above.
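The gender differences above are reported as standardized mean differences (Cohen's d). As a minimal illustration with invented 7-point ratings (not the study's data), d is the mean difference divided by the pooled standard deviation:

```python
import math

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)   # sample variance
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

# Hypothetical intelligence-test favorability ratings by men and women
men = [5, 6, 4, 5, 6, 5]
women = [4, 5, 3, 4, 5, 4]
d = cohens_d(men, women)
```

By the usual rough benchmarks, the reported d = .54 for intelligence tests is a medium-sized effect, while d = .27 for interviews is small.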

In contrast to this rough correspondence in the general pattern, a more detailed inspection of Table 1 reveals a large number of significant differences between the present sample and both the French and the American participants of the Steiner and Gilliland study for single tests. However, no

TABLE 1
Mean Process Favorability Ratings from Steiner and Gilliland (1996) and the Present Study (Standard Deviations in Parentheses)

                                  Steiner & Gilliland (1996)         Present Study
Selection Method                  USA              France            Pretest               Posttest
 1. Résumés                       5.37a (1.19)  >  4.54b (1.19)   U> 4.85b (1.13)
 2. Written ability test          4.50b (1.25)     4.21b,c (1.36) U> 4.10c (1.17)         4.30a (1.43)
 3. Work-sample test              5.26a (1.49)     5.26a (1.19)      5.34a (1.20)
 4. Personal contacts             3.29c (1.64)     2.92d,e (1.67) U> 2.62f (1.51)
 5. (General) personality test    3.50c (1.30)  <  3.96c (1.35)   U< 4.18c (1.09)         3.96b (1.20)
 6. Honesty (or integrity) test   3.41c (1.62)  >  2.54e (1.24)   F< 3.64d (1.28)         3.83b (1.19)
 7. Personal references           4.38b (1.30)     4.12b,c (1.10) U< F< 4.91b (1.21)
 8. Interview                     5.39a (1.26)  >  4.56b (1.19)   F< 5.67a (.99)
 9. Graphology                    1.95d (1.18)  <  3.23d (1.62)   F> 1.90g (1.10)
10. Biographical questionnaire    4.59b (1.31)  >  3.91c (1.31)   U> F> 3.20e (1.27)      3.02c (1.15)

Note: Same subscripts within columns indicate that means are not significantly different at p < .05 (Tukey post-hoc tests). Greater-than or less-than signs stand for significant (t-tests; p < .01) differences between adjacent columns (same study), or between German pretest reactions (third column) and the US (U, first column) or French (F, second column) sample. Ns are 142 for the American sample, 117 for the French sample, and 209 to 213 for the present study.

Data for the US and French samples from Table 3, p. 137, in "Fairness reactions to personnel selection techniques in France and the United States" by D.D. Steiner and S.W. Gilliland, 1996, Journal of Applied Psychology, 81, 134–141. Copyright 1996 by the American Psychological Association. Adapted with permission.


systematic tendency appears to underlie these differences. The most marked distance between the US and the German sample occurred for a paper-and-pencil test (biodata), but a null finding (integrity) and the opposite direction (intelligence) may be found within the same category. Relative to the French examinees, the German sample showed the largest negative difference for graphology, an expected result given the widespread use of this procedure in France, but the largest positive contrasts occurred for interviews, a standard procedure in both countries, and honesty tests, which are virtually nonexistent in either nation's selection practices. Thus, evidence for systematic cultural differences is not readily revealed from the present data. This conclusion is even strengthened if one concentrates only on the differences between the rank orders of procedures across countries, as proposed by one anonymous reviewer. Here, the only remarkable differences appear to remain for integrity tests between Germany and France, and for biodata and references between Germany and the USA. To the best of my knowledge, the current literature provides no firm basis to explain these particular discrepancies, so that it would be mere speculation to interpret them as more than unsystematic deviations from a general pattern of similarity in rankings.

Table 2 presents the intercorrelations and, in parentheses, internal consistencies (Cronbach's α) for all study variables. (The focus of the present study is mainly on the convergent relationships given in bold in the table. It has to be mentioned, however, that a principal components analysis on the ten two-item pretest ratings indicated no evidence for a general "favorability" factor across all instruments (four components extracted; eigenvalues: 1.9, 1.6, 1.2, 1.0). After varimax rotation, the first two factors were interpretable as comprising non-cognitive tests and standard procedures (résumé, references, interview), respectively, while the last two factors had substantial loadings only for specific instruments. Hence, it appears as if subjects in the present investigation sharply distinguished between most selection procedures. This certainly does not mean that they also distinguished between different aspects of evaluation.) As in almost any study on this topic before, where the aspects of face validity, fairness, transparency, etc. were often treated separately, the relatively high internal consistencies of the present two-item measures seem to point to an overall evaluative factor. This topic surely merits attention in future studies.
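The internal consistencies reported here are Cronbach's α for two-item favorability measures (the effectiveness and fairness ratings). A minimal sketch of the computation, with invented ratings rather than the study's data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns of equal length."""
    k = len(items)                       # number of items
    n = len(items[0])                    # number of respondents

    def var(xs):                         # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var_sum = sum(var(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))

# Hypothetical 7-point ratings on the two favorability items for one method
effectiveness = [5, 4, 6, 3, 5, 2, 6, 4]
fairness = [4, 4, 5, 3, 6, 3, 5, 5]
alpha = cronbach_alpha([effectiveness, fairness])
```

With only two items, α is a direct function of the inter-item correlation, which is why strongly correlated effectiveness and fairness ratings (as in Table 2) yield the moderate-to-high values on the diagonal.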

With respect to pretest–posttest correlations, it is striking that ratings for the same instruments were only modestly related between the two points of measurement. The correlations are even lower than those found by Bauer et al. and Chan et al., particularly for the personality tests examined here. This would point to the conclusion that the present examinees, despite the very modest differences in mean ratings (see Table 1), individually changed their minds in many cases after they had actually experienced a


TABLE 2
Intercorrelations and Internal Consistency Reliabilities of Study Variables

Correlations in each row are given in the order of variables (1) through (18); the final entry in parentheses is the scale's internal consistency (Cronbach's α).

Pretest favorability:
 (1) Résumé                   (.52)
 (2) Ability test              .01 (.71)
 (3) Work-sample              −.15* .09 (.70)
 (4) Pers. contact            −.21** .09 .03 (.44)
 (5) Personality test         −.01 .04 −.13 .11 (.62)
 (6) Honesty test              .17* .03 .11 .14* .46** (.65)
 (7) References                .31** −.04 −.03 .04 −.00 −.04 (.63)
 (8) Interview                 .33** .03 .11 −.09 −.01 .02 .24** (.43)
 (9) Graphology               −.06 .05 −.03 .09 .10 .20** .02 −.17* (.81)
(10) Biodata                   .06 .14* .01 .10 .34** .23** −.02 −.05 .16* (.76)

Posttest favorability:
(11) WPT                       .17* .40** .06 −.01 .09 .00 .01 .00 −.03 .09 (.82)
(12) NEO-PI-R                  .11 .13 −.03 −.06 .30** .18** .05 .10 .14* −.02 .07 (.78)
(13) Integr. test              .09 .01 −.05 .04 .23** .21** .11 .14* .17* .05 .02 .59** (.72)
(14) Biograph. questionn.      .06 −.06 .01 .16* .08 .09 .07 −.06 .16* .24** .05 .20** .22** (.69)

Test scores:
(15) Intelligence              .01 .22** .21** −.08 −.04 .02 .05 −.01 −.17* .06 .27** .01 −.06 −.04 −
(16) Conscient.                .02 .02 −.01 .12 .01 .10 .04 .10 .03 .09 −.01 .10 .11 .12 .09 (.92)
(17) Integrity                 .00 −.04 −.04 −.02 .08 .05 .02 .04 −.04 .07 −.05 .03 .08 .05 .02 .50** (.92)
(18) Self-control (biograph.)  .06 −.08 −.01 −.06 −.03 .07 .01 −.05 .05 .01 −.09 −.02 .07 .03 −.02 .43** .54** (.92)

Note: Variables (1) through (10) are the same as in Table 1. Integr. test = mixed overt and personality-based integrity test; Biograph. questionn. = Retrospective Behavioral Self-Control Scale (RBS); Conscient. = NEO-PI-R Conscientiousness. Convergent correlations between procedures of the same kind (pretest–posttest attitudes, or test score–favorability relationships) are given in boldface. * = p < .05; ** = p < .01; N = 209 to 213.


test situation.1 The information provided by descriptions of two to three lines' length, as presented here and in other studies, does not seem to represent the objects of attitude sufficiently for a decided evaluation.

One final research question of this investigation applied to the relationships between attitudes and test scores. Conscientiousness was chosen out of the five NEO domains because this factor has a more generalisable relevance for job performance across job categories than the other four (Barrick & Mount, 1991) and was also investigated by Chan and colleagues (1998). In summary, the findings presented in Table 2 show that only performance on the cognitive abilities measure was associated with ratings of the same instruments' validity and fairness, as assessed in the favorability scores, and even this correlation was not overly substantial. Thus, from the present data, it has to be concluded that test performance and evaluation are largely independent of each other.

DISCUSSION

Two major goals were pursued in this study. The first was to enhanceknowledge on cultural differences in applicant reactions to various selectionprocedures by replicating the only previous cross-national comparison ofFrance and America (Steiner & Gilliland, 1996) with a German sample. Inagreement with this study, the present results in general suggest that thesedifferences are less pronounced than might be expected from the greatvariation in method use. In all three countries, subjects evaluated interviews,work sample tests, and résumés most favorably, held a neutral attitudetowards most written tests, and expressed reservations against personalcontacts and graphology. Generalisability of this tripartite rank order ofevaluations is further corroborated by findings from past research where asomewhat different methodology has been applied (Fruhner et al., 1991;Kravitz et al., 1996; Rynes & Connerley, 1993; Smither et al., 1993). Thus,when investigated with student samples and brief descriptive stimuli, mean

1 One anonymous reviewer, providing hypothetical data in support, raised the concern thatlow correlations may be found even when changes from t1 to t2 are only trivial. While thiscould certainly happen under extreme circumstances, the actual data of the present study donot support this argument. The absolute values of the difference scores for the four proceduresevaluated twice have means between 1.09 and 1.23, a quite substantial effect size of about onestandard deviation. Between 13 and 16 per cent of the sample gave identical ratings at bothtimes of measurement, whereas between 21 and 26 per cent changed their initial ratings by twoscale points or more on a 7-point scale. The most extreme values of absolute difference scoreswere between 4.5 and 5.5. Thus, there is evidence from both the correlational analysis and theexamination of absolute difference scores that a non-trivial proportion of the sample evaluatedthese procedures substantially differently before and after test administration.

ratings of selection procedures appear to correspond roughly across different countries.

Although, unlike Steiner and Gilliland, the present study did not explicitly address the bases for these evaluations, the robustness of findings across comparative investigations permits us to draw some conclusions on features of selection procedures that applicants consider for their ratings. First, there appears to be little reason for being concerned with a “justice dilemma” (Cropanzano & Konovsky, 1995), the notion that higher social validity may come at the price of lower criterion-related validity and vice versa. Instead, the least preferred selection techniques are also characterised by a lack of scientific evidence for their ability to predict important aspects of job performance, and the most valid methods (e.g. ability tests, biodata, work samples, integrity tests; cf. Schmidt & Hunter, 1998) received quite variable ratings. This seems to indicate that social and criterion-related validity are largely unrelated rather than negatively related. Thus, there is no evidence for a dilemma; both important goals in personnel selection may well be achieved at the same time.

Apart from the fact that there are some differences in test takers’ attitudes, and in their emphasis on various justice dimensions across countries and studies, the repeated findings of a relatively robust rank order for selection methods as well as highly intercorrelated facets of fairness or justice (e.g. Gilliland, 1994; Kravitz et al., 1996; Macan et al., 1994; Ployhart & Ryan, 1997; Smither et al., 1993; Thorsteinson & Ryan, 1997) raise the question whether differences are overemphasised relative to commonalities in current research on applicant reactions. Certain selection procedures may be simply liked or disliked, as indicated by a potential overall evaluative factor. That is, a person may feel comfortable or not with one kind of test and then project this somewhat diffuse emotion onto rational judgments of fairness, job-relatedness, or invasiveness. The actual reasons for these evaluations may well differ across individuals, depending on their perceived strengths and weaknesses. More in-depth, perhaps qualitative, research methods could provide additional insights into the exact mechanisms underlying attitudinal ratings.

With respect to the mean effects replicated herein, it appears as if some features of selection procedures are particularly valued, on average, in any culture investigated so far. Some of these are obvious. For instance, interviews, résumés, and references by prior employers are almost ubiquitous in personnel selection throughout Western culture and may therefore raise few objections (that is, may be seen as a natural part of the selection process not to be called into question). Other methods do not possess this advantage and will therefore have to persuade by different means. For example, work sample tests by definition have an obvious relationship to the position one is applying for. As has been shown in prior research (e.g. Smither et al.,

1993; Whitney et al., 1999), a more obviously job-related content of written tests may also improve evaluations for this kind of instrument. Graphology, on the other hand, may have a logical appeal (Steiner & Gilliland, 1996) but certainly provides no information to evaluate actual test performance as well as job-relatedness. As Steiner and Gilliland pointed out, it is striking that this technique received negative evaluations even in France, where it is widely used in practice. More research is needed to investigate how different features of selection devices interact or perhaps cancel each other out in evaluations. Future studies may also examine the effects of combining several instruments into batteries (see Rosse, Miller, & Stecher, 1994, for a small-scale study on this topic), as is usually the case in real-world applications.

The second major objective of the present investigation was to examine how well real test experience is approximated by the brief descriptions often used in comparative studies on applicant reactions. While the mean ratings remained relatively stable after actual test administration, the findings of low to moderate correlations for pretest-posttest attitudes cast doubt on the validity of some conclusions drawn from such investigations. Past experience was also investigated prior to test administration, indicating that between 1.4 (integrity test) and 22.5 (ability test) per cent of the sample believed they had taken one of the paper-and-pencil procedures before. That is, the vast majority of subjects had their first personal experience with these kinds of tests during the present study.

The very limited convergence of pretest-posttest evaluations suggests that, for many people, the image of these instruments provided by brief descriptions changes considerably after actual experience, whereas the directions of these changes are almost equally distributed (additional analyses indicated that pretest-posttest differences [see footnote 1] are not substantially related to any of the personality traits measured in this study). Low bivariate correlations are usually taken as evidence that both variables carry a different meaning. If pretest ratings based on descriptions measure experience-based attitudes only to a limited extent, they would appear to measure something else.
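The kind of difference-score summary reported in footnote 1 can be sketched in a few lines of code. This is an illustration only: the rating arrays below are invented stand-ins for the study's data, which are not reproduced here.

```python
import numpy as np

# Invented pretest/posttest ratings of one procedure on a 7-point scale
# (hypothetical values; N = 213 matches the sample size of the study).
rng = np.random.default_rng(42)
pre = rng.integers(1, 8, size=213).astype(float)
post = np.clip(pre + rng.integers(-3, 4, size=213), 1, 7).astype(float)

abs_diff = np.abs(post - pre)

mean_abs_diff = abs_diff.mean()               # footnote 1 reports means of 1.09-1.23
pct_identical = (abs_diff == 0).mean() * 100  # footnote 1: 13-16% gave identical ratings
pct_two_plus = (abs_diff >= 2).mean() * 100   # footnote 1: 21-26% changed by 2+ points
r_pre_post = np.corrcoef(pre, post)[0, 1]     # the pretest-posttest correlation
```

Both kinds of summary matter here: a sizeable mean absolute difference can coexist with a stable group mean, which is exactly why the stability of mean ratings can conceal low pretest-posttest correlations.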

With respect to the paper-and-pencil tests examined here, two explanations for this finding seem plausible. First, given that many myths surrounding psychological testing—ranging from “complete nonsense” to “big brother is watching you”—are discussed in Germany (and perhaps elsewhere), it seems likely that initial ratings were largely based on prejudice (note that no students majoring in psychology participated in this study). Second, in light of the result that all these tests were most often rated neutrally, it is possible that many subjects were simply insecure about how to evaluate unfamiliar instruments and therefore provided an indifferent rating. Both explanations would shed an unfavorable light on the validity of short descriptions, which would have been concealed by the apparent stability of mean ratings; the

sample items used by Chan et al. may serve this purpose much better. But how would one introduce non-standardised techniques by sample items? For instance, in the case of the interview, the generally favorable evaluation of this procedure as a category may change substantially, depending on how an actual interview is conducted. It is now agreed that applicants in general accept being interviewed, but it is not so clear that they would accept being asked invasive questions, or being confronted with aggressive or uninterested interviewers. Such topics surely merit attention in future research.2

Finally, except for the measure of intelligence, test scores were largely unrelated to attitudes in the present investigation. If this were a generalisable result, it would further corroborate the aforementioned notion that social and criterion-related validity are roughly orthogonal dimensions for the evaluation of these procedures. For human resources practitioners, this would mean that they could apply a decision rule for selecting a method where both factors are weighted with respect to their relative importance and then simply combined algebraically. Utility analysts of selection procedures may incorporate this in their formulas.
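A decision rule of the kind suggested here might be sketched as a weighted linear combination of the two dimensions. This is a minimal illustration under the orthogonality assumption; the weights, validity figures, and acceptance values below are invented for demonstration, not values from the study.

```python
# Hypothetical compensatory rule: weight criterion-related validity and
# social validity (applicant acceptance), then combine them additively.
def method_utility(criterion_validity: float, social_validity: float,
                   w_criterion: float = 0.7, w_social: float = 0.3) -> float:
    """Weighted linear combination of the two evaluation dimensions."""
    return w_criterion * criterion_validity + w_social * social_validity

# Illustrative inputs (invented): criterion-related validity on a 0-1 scale,
# acceptance ratings rescaled from a 1-7 scale to 0-1.
candidates = {
    "work sample test": method_utility(0.54, 0.85),
    "ability test": method_utility(0.51, 0.55),
    "graphology": method_utility(0.02, 0.20),
}
best_method = max(candidates, key=candidates.get)  # -> "work sample test"
```

Because the two dimensions are treated as orthogonal, raising a method's acceptance (e.g. by making written tests more obviously job-related) improves its combined utility without any trade-off against predictive validity.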

This discussion would be incomplete if some shortcomings of the present study were not mentioned. As pointed out in many articles cited throughout this paper, especially those which employed applicant samples, laboratory settings with university students may lead to different conclusions than actual selection situations. However, provided one is interested in an honest evaluation of selection procedures, it is not self-evident that field settings really reveal more reliable results. Actual applicants may be inclined to disguise their real opinion of whatever they experience during the selection process, particularly when this opinion is negative, in order not to offend a prospective employer.

A problem more specific to the replication part of this study arises from the fact that all three samples differ with respect to their demographic composition and prior job experience. Steiner and Gilliland’s French and American participants were more similar in age and gender than the present subjects, whereas both the German and American students had considerably more job experience than the French subjects. Further, all three samples contained substantially different proportions of psychology majors. It is not clear from the data how this complex pattern of discrepancies may have

2 An anonymous reviewer pointed out that similar problems of generalising results for a category as a whole to exemplars of that category may also apply to more standardised selection procedures, like tests or even items within one test. Although I would believe that there is much more variation to be expected within a category of non-standardised techniques, I subscribe to the notion that test categories should be more sharply distinguished from single tests or items. I would add that different tests than those chosen here may have received different evaluations, although this statement remains to be empirically examined.

affected the results across countries. However, given that in all three countries all participants were university students of predominantly young age, and that the majority had prior job experience, the similarities in sample composition may be seen as outweighing the differences. On the other hand, the more balanced gender composition of the present sample compared to Steiner and Gilliland’s mostly female subjects is one possible source of variation, particularly for those selection methods for which gender differences were revealed (i.e. intelligence tests, personal contacts, interviews, and résumés). The present results indicate that the single contrasts reported in Table 1 are probably conservative estimates of the true differences for these procedures.

A third potential drawback of the present investigation is the fact that the administration of several personality tests, in addition to those examined in this paper, may have affected the ratings in an unknown way. However, at least one possible bias—that participants may have confused the numerous tests when recalling them—has been controlled by obtaining ratings for two of the more specific scales. These ratings were both significantly lower (M = 3.40, SD = 1.12, and M = 3.34, SD = 1.20, respectively) and only moderately correlated (mean r = .36) with those of the personality inventory and the integrity test examined here, indicating that the subjects of this study were able to distinguish between the different tests.

The applicant’s perspective on personnel selection procedures has long been overlooked by I/O psychologists, and a number of still underresearched topics remain, some of which are mentioned above. The present study was conducted to fill one of these gaps. However, the considerable overlap between different studies may eventually be summarised in one or several meta-analyses.

REFERENCES

Angleitner, A., & Ostendorf, F. (1993). Deutsche Version des NEO-PI-R (Form S) [German version of the NEO-PI-R]. Unpublished test manuscript. Bielefeld, Germany: Author.

Barrick, M.R., & Mount, M.K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.

Bauer, T.N., Maertz, C.P., Dolen, M.R., & Campion, M.A. (1998). Longitudinal assessment of applicant reactions to employment testing and test outcome feedback. Journal of Applied Psychology, 83, 892–903.

Chan, D., Schmitt, N., Sacco, J.M., & DeShon, R.P. (1998). Understanding pretest and posttest reactions to cognitive ability and personality tests. Journal of Applied Psychology, 83, 471–485.

Costa, P.T., & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.

Cropanzano, R., & Konovsky, M.A. (1995). Resolving the justice dilemma by improving the outcomes: The case of employee drug screening. Journal of Business and Psychology, 10, 221–243.

Fruhner, R., Schuler, H., Funke, U., & Moser, K. (1991). Einige Determinanten der Bewertung von Personalauswahlverfahren [Some determinants of the evaluation of personnel selection methods]. Zeitschrift für Arbeits- und Organisationspsychologie, 37, 119–178.

Gilliland, S.W. (1993). The perceived fairness of selection systems: An organizational justice perspective. Academy of Management Review, 18, 694–734.

Gilliland, S.W. (1994). Effects of procedural and distributive justice on reactions to a selection system. Journal of Applied Psychology, 79, 691–701.

Gottfredson, M.R., & Hirschi, T. (1990). A general theory of crime. Stanford, CA: Stanford University Press.

Harland, L.K., Rauzi, T., & Biasotto, M.M. (1995). Perceived fairness of personality tests and the impact of explanations for their use. Employee Responsibilities and Rights Journal, 8, 183–192.

Jones, J.W., & Joy, D.S. (1991). Empirical investigation of job applicants’ reactions to taking a preemployment honesty test. In J.W. Jones (Ed.), Preemployment honesty testing: Current research and future directions (pp. 121–131). Westport, CT: Quorum Books.

Kravitz, D.A., Stinson, V., & Chavez, T.L. (1996). Evaluations of tests used for making selection and promotion decisions. International Journal of Selection and Assessment, 4, 24–34.

Lévy-Leboyer, C. (1994). Selection and assessment in Europe. In H.C. Triandis, M.D. Dunnette, & L.M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 4, pp. 173–190). Palo Alto, CA: Consulting Psychologists Press.

Macan, T.H., Avedon, M.J., Paese, M., & Smith, D.E. (1994). The effects of applicants’ reactions to cognitive ability tests and an assessment center. Personnel Psychology, 47, 715–738.

Mael, F.A. (1991). A conceptual rationale for domains and attributes of biodata items. Personnel Psychology, 44, 763–792.

Marcus, B. (2000). Kontraproduktives Verhalten im Betrieb: Eine individuumsbezogene Perspektive [Counterproductive behavior in organizations: An individual differences perspective]. Göttingen, Germany: Verlag für Angewandte Psychologie.

Marcus, B. (in press). An empirical examination of the construct validity of two alternative self-control measures. Educational and Psychological Measurement.

Marcus, B., Höft, S., Riediger, M., & Schuler, H. (2000). What do integrity tests measure? Two competing views examined. Paper presented at the 108th Annual Convention of the American Psychological Association, Washington, DC, August.

Marcus, B., Schuler, H., Quell, P., & Hümpfner, G. (2002). Measuring counterproductivity: Development and initial validation of a German self-report questionnaire. International Journal of Selection and Assessment, 10, 18–35.

Ployhart, R.E., & Ryan, A.M. (1997). Toward an explanation of applicant reactions: An examination of organizational justice and attribution frameworks. Organizational Behavior and Human Decision Processes, 72, 308–335.

Rosse, J.G., Miller, J.L., & Stecher, M.D. (1994). A field study of job applicants’ reactions to personality and cognitive ability testing. Journal of Applied Psychology, 79, 987–992.

Ryan, A.M., Greguras, G.J., & Ployhart, R.E. (1996). Perceived job relatedness of physical ability testing for firefighters: Exploring variations in reactions. Human Performance, 9, 219–240.

Ryan, A.M., McFarland, L., Baron, H., & Page, R. (1999). An international look at selection practices: Nation and culture as explanations for variability in practice. Personnel Psychology, 52, 359–391.

Ryan, A.M., & Sackett, P.R. (1987). Pre-employment honesty testing: Fakability, reactions of test takers, and company image. Journal of Business and Psychology, 2, 248–256.

Rynes, S.L. (1993). Who’s selecting whom? Effects of selection practices on applicant attitudes and behavior. In N. Schmitt & W.C. Borman (Eds.), Personnel selection in organizations (pp. 203–239). San Francisco, CA: Jossey-Bass.

Rynes, S.L., & Connerley, M.L. (1993). Applicant reactions to alternative selection procedures. Journal of Business and Psychology, 7, 261–277.

Sackett, P.R., Burris, L.R., & Callahan, C. (1989). Integrity testing for personnel selection: An update. Personnel Psychology, 42, 491–529.

Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of personnel selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.

Schuler, H. (1993). Social validity of selection situations: A concept and some empirical results. In H. Schuler, J.L. Farr, & M. Smith (Eds.), Personnel selection and assessment: Individual and organizational perspectives (pp. 11–26). Hillsdale, NJ: Lawrence Erlbaum.

Schuler, H., Frier, D., & Kauffmann, M. (1993). Personalauswahl im europäischen Vergleich [Personnel selection in European comparison]. Göttingen, Germany: Hogrefe/Verlag für Angewandte Psychologie.

Schuler, H., & Stehle, W. (1983). Neuere Entwicklungen des Assessment-Center-Ansatzes—beurteilt unter dem Aspekt der sozialen Validität [New developments in the assessment center approach, evaluated with the focus on social validity]. Psychologie und Praxis. Zeitschrift für Arbeits- und Organisationspsychologie, 27, 33–44.

Shackleton, V., & Newell, S. (1994). European management selection methods: A comparison of five countries. International Journal of Selection and Assessment, 2, 91–102.

Smither, J.W., Reilly, R.R., Millsap, R.E., Pearlman, K., & Stoffey, R.W. (1993). Applicant reactions to selection procedures. Personnel Psychology, 46, 49–76.

Steiner, D.D., & Gilliland, S.W. (1996). Fairness reactions to personnel selection techniques in France and the United States. Journal of Applied Psychology, 81, 134–141.

Thorsteinson, T.J., & Ryan, A.M. (1997). The effect of selection ratio on perceptions of the fairness of a selection test battery. International Journal of Selection and Assessment, 5, 159–168.

Ullman, J.B., & Newcomb, M.D. (1998). Eager, reluctant, and nonresponders to a mailed longitudinal survey: Attitudinal and substance use characteristics differentiate responses. Journal of Applied Social Psychology, 28, 357–375.

Whitney, D.J., Diaz, J., Mineghino, M.A.E., & Powers, K. (1999). Perceptions of overt and personality-based integrity tests. International Journal of Selection and Assessment, 7, 35–45.

Wonderlic, Inc. (1996). Wonderlic Personnel Test (WPT—German version, Form A and B). Libertyville, IL: Wonderlic Personnel Test, Inc.

Zajonc, R.B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(2, Pt. 2), 1–27.
