TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC
CRITERIA PHENOMENON
by
DAVID BRENT BIRKELBACH
(Under the Direction of Charles Lance)
ABSTRACT
Dynamic Criteria refers to the systematic instability of criterion measures and
predictive validities examined across longitudinal time periods. To date, much of the
research used to support the dynamic criteria phenomenon has been fraught with
methodological flaws (Barrett et al., 1985), limited by the utilization of single-task
performance as the principal criterion of interest, and has failed to establish boundary
conditions for qualitatively distinct predictor constructs. For the current study,
meta-analytic techniques were used to examine the criterion-related validities of two common
selection instruments, namely cognitive ability assessments and personality inventories, in
relation to time-bound performance appraisals. In addition, performance trajectories were
investigated through the use of weighted least squares multiple regression analyses to
establish the systematic nature of change in predictive-validity coefficient trends over time.
Results indicated that the criterion-related validities specific to the General Mental Ability,
Emotional Stability, and Openness to Experience predictors do, in fact, change over time
when measured against general and/or specific criterion types. Performance
trajectories for each of the aforementioned predictors offer support for the simplex-like
patterns traditionally ascribed to changes in predictive validities over time (Henry &
Hulin, 1987). Findings are discussed in the context of Murphy’s (1989) dynamic model of
job performance.
INDEX WORDS: Dynamic Criteria, Meta-Analysis, Weighted Least Squares Multiple
Regression, Cognitive Ability, Personality
TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC
CRITERIA PHENOMENON
by
DAVID BRENT BIRKELBACH
B.A., Southwestern University, 2001
M.S., Saint Mary’s University, 2007
A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial
Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
ATHENS, GEORGIA
2013
© 2013
David Brent Birkelbach
All Rights Reserved
Major Professor: Charles Lance
Committee: Nathan Carter
Robert Mahan
Electronic Version Approved:
Maureen Grasso
Dean of the Graduate School
The University of Georgia
May 2013
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
    Historical Overview
    Definitions of Dynamic Criteria
    Murphy’s (1989) Dynamic Model of Performance
    Criticisms and Limitations of the Previous Dynamic Criteria Literature
2 CURRENT STUDY
    Purpose
    Cognitive Ability Measures
    Cognitive Ability and Dynamic Criteria
    Personality Tests
    Personality and Dynamic Criteria
3 METHODS
    Literature Search per Selection Device
    Criteria for Inclusion
    Coding Procedures
    Data Analysis
    Moderator Detection
    Moderator Estimation
4 RESULTS
    Overall Validity Coefficients
    Overall Continuous Moderator Analysis
    Validity Coefficients by Criterion Type
    Continuous Moderator Analysis by Criterion Type
    Tests for Availability Bias
5 DISCUSSION
    GMA-Performance Relationships over Time
    FFM-Performance Relationships over Time
    Implications and Future Research
    Limitations
    Conclusion
REFERENCES
REFERENCES FOR GMA META-ANALYSES
REFERENCES FOR FFM META-ANALYSES
APPENDICES
A ORIGINS OF DYNAMIC CRITERIA
B ACKERMAN’S MODEL OF SKILL ACQUISITION
C CHANGING TASKS AND CHANGING SUBJECTS MODELS
D PERFORMANCE TRAJECTORIES
LIST OF TABLES
Table 1: GMA Studies Used in the Meta-Analyses
Table 2: Big Five Personality Studies Used in the Meta-Analyses
Table 3: Meta-Analysis Results for the Criterion-Related Validities between GMA, the Big Five Personality Dimensions, and Performance
Table 4: Results for Continuous Moderators of Predictor-Performance Relationships
Table 5: Results for Continuous Moderators of Predictor-Performance Relationships Without Outliers
Table 6: Meta-Analysis Results for the Criterion-Related Validities between Predictors and Criteria Type
Table 7: Results for Continuous Moderators of Predictor-Training Performance Relationships
Table 8: Results for Continuous Moderators of Predictor-Training Performance Relationships Without Outliers
Table 9: Results for Continuous Moderators of Predictor-Job Performance Relationships
Table 10: Results for Continuous Moderators of Predictor-Job Performance Relationships Without Outliers
Table 11: Results from File-Drawer Test for Availability Bias
Table 12: Intercorrelations of Semester Grades in Electrical Engineering, Humphreys (1960)
Table 13: Intercorrelations of Pattern Comprehension over Repeated Trials, Fleishman and Hempel (1955)
LIST OF FIGURES
Figure 1: GMA-General Performance Validity over Time
Figure 2: GMA-General Performance Validity over Time without Outliers
Figure 3: Emotional Stability-General Performance Validity over Time
Figure 4: Openness-General Performance Validity over Time
Figure 5: Openness-General Performance Validity over Time without Outliers
Figure 6: GMA-Training Performance Validity over Time
Figure 7: GMA-Training Performance Validity over Time without Outliers
Figure 8: GMA-Job Performance Validity over Time
Figure 9: Emotional Stability-Training Performance Validity
Figure 10: Openness-Job Performance Validity over Time
Figure 11: Openness-Job Performance Validity over Time without Outliers
Figure 12: Ackerman’s Model of Skill Acquisition
CHAPTER 1
INTRODUCTION
The relationship between an individual’s personal qualities and their ability to
perform in a given position has been the cornerstone of industrial psychology since the
advent of the Army Alpha and Beta tests of mental ability during World War I. The goal of
selecting and promoting employees who could succeed in the workplace has led to more
than a century of validity studies designed to identify the individual differences that best
result in increased efficiency, effectiveness, and productivity. The importance of predictive
validity in personnel selection is due, in part, to the direct proportional relationship
between predictive validity coefficients and the practical utility of the selection method
(Schmidt & Hunter, 1998). In other words, economic gains largely rest on the accuracy of a
selection measure to predict job performance.
One issue specific to the current study that can potentially affect the estimates of
predictive validity involves the stability of the criteria over time. Performance criteria have
been treated as a static concept throughout the history of validity studies in industrial-
organizational (I-O) psychology as evidenced by the practice of collecting criterion data at a
single time-point, the use of aggregate scores or composites, the overwhelming use of
cross-sectional data, and the practice of validating instruments with initial performance
(Henry & Hulin, 1987). However, a growing body of research has provided support for the
notion that criteria are not static and that job performance varies systematically when
examined longitudinally (Austin & Villanova, 1992).
Also known as dynamic criteria, the concept that performance does not remain
temporally stable has profound consequences for the conduct of validity studies and
subsequent utility of selection devices. For instance, if criteria do change over time,
assumptions regarding the longitudinal stability of predictive estimates for selection into
schools, advanced training programs, employment, and promotion may be founded on a
flawed premise, thus limiting the opportunity to identify true, sustainable talent. Since the
majority of selection and placement programs utilize criteria gathered at a single point in
time, or validate with the use of cross-sectional data, validity estimates may be greatly
distorted and only reveal part of a greater picture (Henry & Hulin, 1989). The current
study contributes to the issue of dynamic criteria by examining the criterion-related
validities of two common selection devices (i.e., cognitive ability measures and personality
inventories) in relation to job performance over time through the use of meta-analytic
techniques. Steps will be taken to determine the nature of the performance trends in terms
of directional change, magnitude, and linearity.
Historical Overview
As evidence of unstable criteria and decaying predictive-validities began to emerge
in the industrial psychology literature (e.g., Adams, 1953; Fleishman & Hempel, 1954, 1955;
Rothe, 1946, 1947, 1951; Tiffin, 1942; Worbois, 1951), Ghiselli (1956) called into question
the field’s stance of job performance as a stable construct and advocated for research that
explored what he termed “dynamic criteria.” According to Ghiselli (1956), the study and
use of static criteria did not account for the instability of criteria over time, but simply
relegated criteria to the mere summation of data collected at a single time point.
Furthermore, Ghiselli (1956) provided two operational methods to identify the dynamic
nature of performance. First, he suggested that intercorrelations among criterion
measures at different time points could be used to ascertain an overall pattern of
performance. Ideally, correlations examined over a long time period, such as a span of
years, could indicate the extent to which the criterion systematically varies with time. Second,
Ghiselli (1956) suggested that changes in predictive validity could be accounted for by
examining the correlations between scores on selection tests and production measures at
varying time points.
Ghiselli and Haire (1960) and Bass (1962) were the first to directly implement
Ghiselli’s (1956) suggestions into empirical field studies. For example, Ghiselli and Haire
(1960) examined a sample of newly hired taxicab drivers over their first 18 weeks of
employment. Intercorrelations among criteria generally declined, suggesting that the rank
order of performance had changed with time. Validity coefficients between a test battery
and the criterion also generally declined over the 18-week period, although this was not the
case for all predictors. Bass (1962) extended the length of time to 48 months in an
examination of sales personnel. Consistent with Ghiselli and Haire’s (1960) findings,
intercorrelations of the criteria across time periods began to decline with the greatest
reduction occurring between the first and last ratings. In this case, all predictive validity
coefficients declined over the 48-month period.
While Ghiselli (1956), Ghiselli and Haire (1960) and Bass (1962) sought to
specifically examine dynamic criteria in the workplace, researchers exploring the temporal
reliability of performance measures (e.g., Rambo, Chomiak, & Price, 1983; Rambo, Chomiak,
& Roundtree, 1987; Rothe, 1946a, 1946b, 1947, 1951, 1970, 1978; Rothe & Nye, 1958,
1959, 1961; Tiffin, 1942) and those uncovering simplex patterns in ability-performance
coefficients (e.g., Bass, 1962; Deadrick & Madigan, 1990; Dennis, 1954, 1956; Dunham,
1974; Fleishman, 1960; Fleishman & Hempel, 1954, 1955; Fleishman & Rich, 1963; Ghiselli
& Haire, 1960; Hanges, Schneider, & Niles, 1990; Henry & Hulin, 1987; Humphreys, 1960,
1968; Lin & Humphreys, 1977; Parker & Fleishman, 1959) also indirectly contributed to
the growing dynamic performance criteria literature by providing evidence of the
phenomenon (see Appendix A for a full summary of both temporal reliability and simplex
pattern studies).
Definitions of Dynamic Criteria
After the initial conceptualizations of dynamic performance, a series of critical
reviews based on the extant literature at the time provoked debates concerning definitions
of dynamic performance, the ubiquity of unstable criteria, proper methods to identify
changes in performance, alternative explanations, and underlying causes. Barrett,
Caldwell, and Alexander (1985) were the first to question what they termed “the received
doctrine of dynamic performance.” They consolidated the earlier literature in an attempt
to clarify and distinguish the various operationalizations of dynamic performance, as well
as to provide a critical reanalysis of the evidence for each. Referring to previous sources,
they identified three definitions of dynamic criteria: (a) changes in group average
performance over time (Cascio, 1982; Ghiselli, 1956; Hanges et al., 1990; McCormick &
Ilgen, 1980), (b) changes in the rank-ordering of scores on the criterion over time (Bass,
1962; Blum & Naylor, 1968; Deadrick & Madigan, 1990; Ghiselli, 1956; Ghiselli & Haire,
1960; Hanges et al., 1990; Korman, 1971; MacKinney, 1967; McCormick & Ilgen, 1980), and
(c) changes in predictive validity over time (Austin, Humphreys, & Hulin, 1989; Blum &
Naylor, 1968; Cascio, 1982; Ghiselli, 1956; Guion, 1965; Korman, 1971; MacKinney, 1967;
Prien, 1966; Smith, 1976; Steele-Johnson, Osburn, & Pieper, 2000). The following is a
synopsis of the key arguments made by researchers pertaining to the merits of each of the
aforementioned definitions of dynamic performance, the literature and methods used to
support each definition, and the conclusions drawn about the legitimacy of the dynamic
criteria phenomenon.
Changes in mean performance over time. In their earlier works, Ghiselli and Haire
(1960) and McCormick and Ilgen (1980) proposed that dynamic criteria be defined as
changes in average group performance over time. This definition of dynamic performance
is usually measured by grouping a sample into categories, such as age, taking the mean
performance of each group, and comparing the means longitudinally. Most studies that
utilize this approach are concerned with the concept of job tenure (often mislabeled job
experience), and how differences in tenure relate to job performance (e.g., Avolio,
Waldman, & McDaniel, 1990; Gordon & Fitzgibbons, 1982; Gordon & Johnson, 1982;
Hofmann, Jacobs, & Guerra, 1992; Jacobs, Hofmann, & Kriska, 1990; McDaniel, Schmidt, &
Hunter, 1988; McEvoy & Cascio, 1989; Medoff & Abraham, 1980, 1981; Schmidt, Hunter, &
Outerbridge, 1986; Schmidt, Hunter, Outerbridge, & Goff, 1988). This definition has been
criticized as being conceptually and operationally weak (Austin et al., 1989; Barrett et al.,
1985) because average performance may not reflect the individual performances
comprising it. Group-level performance could even change while individuals’
performance remains constant if the performance level of those leaving the organization
differed from the performance level of those entering (Boudreau & Berger, 1985).
Austin et al. (1989) continued the criticism by stating that while mean performance could
be used to capture systematic changes over repeated practice, the measurement of
relationships over time, ideally identified through simplex matrices, should be the principal
focus when studying dynamic criteria.
The change in the rank-ordering of scores on the criterion over time directly
addresses the issue of stability (Hanges et al., 1990). Changes in rank-order would imply,
as an extreme example, that high performers may eventually become low performers, and
vice versa (Ployhart & Hakel, 1998). This second definition is often measured through the
examination of correlations between criterion scores at multiple points in time (Barrett et
al., 1985; Deadrick & Madigan, 1990; Hanges et al., 1990). Such studies have been framed
as considering the test–retest reliability or the stability of performance ratings. If
performance is truly dynamic, the criterion correlations are expected to decrease as the
interval between time points increases, essentially forming a simplex-like pattern.
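As a hedged illustration of what a simplex-like pattern looks like, the following sketch builds an idealized correlation matrix in which the correlation between two measurement occasions shrinks as the number of intervening occasions grows. The function names, the first-order form r(i, j) = r^|i - j|, and the adjacent-occasion correlation of .80 are illustrative assumptions, not values drawn from any study cited here.

```python
def simplex_matrix(n_occasions, adjacent_r):
    """Idealized simplex: r(i, j) = adjacent_r ** |i - j| (an assumed first-order form)."""
    return [[adjacent_r ** abs(i - j) for j in range(n_occasions)]
            for i in range(n_occasions)]


def mean_r_by_lag(corr):
    """Average correlation at each lag (distance from the main diagonal)."""
    n = len(corr)
    means = []
    for lag in range(1, n):
        values = [corr[i][i + lag] for i in range(n - lag)]
        means.append(sum(values) / len(values))
    return means


# Hypothetical example: five occasions, adjacent-occasion correlation of 0.8.
corr = simplex_matrix(5, 0.8)
by_lag = mean_r_by_lag(corr)
# In a simplex-like matrix the mean correlation falls as the lag grows.
is_simplex_like = all(a > b for a, b in zip(by_lag, by_lag[1:]))
for row in corr:
    print([round(r, 2) for r in row])
print([round(r, 3) for r in by_lag], is_simplex_like)
```

With observed data, the same lag-wise averaging could be applied to an empirical Time Period by Time Period matrix to check whether correlations fall off with distance from the diagonal.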
Hulin, Henry, and Noon (1990) used meta-analytic techniques to investigate the
stability of performance measures across time by examining Time Period by Time Period
matrices of performance intercorrelations and found that all 23 validity sequences
examined in their study decreased over time. Abundant empirical evidence has supported the
definition of changing rank order of individual performance scores (Deadrick & Madigan,
1990; Hanges et al., 1990; Henry & Hulin, 1987; Hofmann, Jacobs, & Baratta, 1993;
Hofmann et al., 1992) principally through examinations of simplex matrices.
Changes in predictive validity over time. Central to the current study is the definition
that dynamic performance occurs when predictive validities change over time. If
predictive relationships are temporally variant, continued validity assessment may be
required. Research using this definition has focused on examinations of the criterion-
related validity of predictors such as intelligence and psychomotor ability for predicting
task performance over multiple time periods. While stability coefficients tend to decrease
over time across studies (Deadrick & Madigan, 1990; Hanges et al., 1990; Henry & Hulin,
1987; Hofmann et al., 1992, 1993), there is some debate as to the nature of changes in
predictive validities. Some argue that dynamic criteria universally lead to a degradation in
validity over time (Austin et al., 1989; Henry & Hulin, 1987, 1989; Hulin et al., 1990; Keil &
Cortina, 2001), whereas others suggest that the nature of change in predictive validities is
determined by the predictor in question or external factors that may influence
performance over time as evidenced in some studies where predictive validities either
remained stable or increased with time (Ackerman, 1987, 1988, 1989, 1992; Barrett et al.,
1985; Barrett & Alexander, 1989; Deadrick & Madigan, 1990; Hanges et al., 1990; Murphy,
1989).
Hulin et al. (1990) conducted a meta-analysis to determine if time was a source of
systematic variance in test validities by utilizing literature on temporal ability-performance
relationships that spanned organizational, educational, and developmental research. The
authors found that time accounted for the variance of predictive validities beyond variance
attributable to statistical artifacts. In general, predictive validities decreased monotonically
over time. Of all the validity sequences analyzed, 44 out of 54 showed negative slopes for
the regressions of predictive validity onto time.
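The slope test described above can be sketched as a weighted least squares regression of validity coefficients onto time. In this minimal sketch the weights are sample sizes, which is a common choice but an assumption here, and the validity sequence is invented purely for illustration; it is not data from Hulin et al. (1990).

```python
def wls_line(times, validities, weights):
    """Weighted least squares fit of validity = intercept + slope * time."""
    w_sum = sum(weights)
    t_bar = sum(w * t for w, t in zip(weights, times)) / w_sum
    r_bar = sum(w * r for w, r in zip(weights, validities)) / w_sum
    sxy = sum(w * (t - t_bar) * (r - r_bar)
              for w, t, r in zip(weights, times, validities))
    sxx = sum(w * (t - t_bar) ** 2 for w, t in zip(weights, times))
    slope = sxy / sxx
    intercept = r_bar - slope * t_bar
    return intercept, slope


# Hypothetical validity sequence: months since testing, observed r, sample size.
months = [1, 3, 6, 12, 24]
observed_r = [0.40, 0.36, 0.33, 0.27, 0.20]
sample_n = [120, 115, 110, 100, 90]

intercept, slope = wls_line(months, observed_r, sample_n)
# A negative slope means validity declines as the time lag increases.
print(round(intercept, 3), round(slope, 4), slope < 0)
```

Counting how many validity sequences yield a negative slope under such a fit mirrors the tally reported above, although the original analysis may have differed in its weighting and artifact corrections.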
Keil and Cortina (2001) expanded on Hulin et al. (1990) through the addition of
potential moderators to examine changes in predictive validities over time. Furthermore,
they tested the nature of the relationships using polynomial equations. Their findings
provide strong evidence that validities do deteriorate over time as observed across
predictors (i.e., cognitive ability, perceptual speed ability, and psychomotor ability), criteria
(i.e., consistent and inconsistent task performance), and time periods (i.e., short-term and
long-term performance). Patterns were also found that suggested ability-performance
relationships began to decay in the early stages of task performance for both consistent and
inconsistent tasks.
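One way to probe the shape of a validity trend, in the spirit of the polynomial tests described above, is to fit linear, quadratic, and cubic models in turn and compare how much variance each explains. The sketch below is an illustration under stated assumptions: the helper names and the data are hypothetical, and it uses ordinary least squares, which may differ from the estimation Keil and Cortina (2001) applied.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def poly_fit(t, y, degree):
    """OLS polynomial fit; returns coefficients (low to high order) and R^2."""
    powers = range(degree + 1)
    xtx = [[sum(ti ** (p + q) for ti in t) for q in powers] for p in powers]
    xty = [sum(yi * ti ** p for ti, yi in zip(t, y)) for p in powers]
    coefs = solve(xtx, xty)
    fitted = [sum(c * ti ** p for p, c in enumerate(coefs)) for ti in t]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Hypothetical validity coefficients at eight equally spaced time points.
time = [0, 1, 2, 3, 4, 5, 6, 7]
validity = [0.45, 0.38, 0.33, 0.30, 0.29, 0.27, 0.22, 0.15]

r_squared = {d: poly_fit(time, validity, d)[1] for d in (1, 2, 3)}
# Increments in R^2 from one degree to the next indicate curvilinear trend.
for d in (1, 2, 3):
    print(d, round(r_squared[d], 3))
```

A meaningful gain in explained variance when moving from the linear to the quadratic or cubic model would suggest that validity does not simply decline at a constant rate.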
Of particular interest are Keil and Cortina’s (2001) findings concerning curvilinear
effects. Both quadratic and cubic effects were found for all three abilities under all
moderating conditions. Keil and Cortina (2001) attributed the curvilinear relationships to
a “Eureka effect” in which individuals with high levels of ability maintain or steadily increase
their level of performance over time, then come up with an insight that results in a sudden
jump in performance. The Eureka effect can be captured by a bifurcation in the ability-
performance variables. Keil and Cortina (2001) offered two alternatives for how the
bifurcation could be utilized in research. Each bifurcation may cause a “different predictor
to wane in importance such that it can be used to predict while performance remains on a
given plateau” (Keil & Cortina, 2001, p. 689), or knowing when a bifurcation is likely to
occur may inform researchers of the length of time they have before a predictor diminishes
in utility.
Murphy’s (1989) Dynamic Model of Performance
While evidence was mounting in support of the dynamic criteria phenomenon,
researchers began to speculate about the theoretical causes for changes in performance over
time. The theoretical impetus for the current study is based on Murphy’s (1989) dynamic
model of performance. In response to the growing acceptance that cognitive ability-
performance relationships remained invariant over time (Schmidt et al., 1986), Murphy
(1989) offered a model of job performance that focused on two classes of predictors:
abilities and dispositional variables. Murphy’s (1989) definition of abilities included both
higher and lower order abilities such as general cognitive ability and perceptual speed.
Dispositional variables included individual differences in personality, interests, values, and
motivation. In the dynamic model of job performance, rank order changes in job
performance over time and declining predictive validities are the result of fluctuations in
activities requiring varying levels of either abilities or dispositional variables.
Building on Ackerman’s three stages of skill acquisition as it applied to the
workplace (see Appendix B for a review of Ackerman’s model), Murphy (1989) posited the
dynamic model of job performance as a progression between two distinct stages: the
transition stage and the maintenance stage. During the transition stage, employees are
faced with some manner of change. They may be new to a job or recently promoted, or an
organizational intervention may have fundamentally changed the job duties required of the
employee. In such cases of transition, the employee must rely heavily on
cognitive ability and sound judgment to learn the new duties, goals, and strategies for
execution. In the maintenance stage, major requirements for the job are well-learned and
no longer depend heavily on cognitive ability to be performed. At this point, dispositional
variables, such as personality and motivation, have a greater influence on job performance
than cognitive ability.
Deadrick and Madigan (1990) provided empirical evidence for Murphy’s (1989)
dynamic performance model in their attempt to distinguish between the predictive
influences of employee experience. Concerned that the standard definitions of dynamic
criteria did not adequately distinguish actual changes in job performance from
changes in the performance evaluation context, Deadrick and Madigan (1990) defined
criterion changes as attributes of either individual differences (i.e., performance
consistency), the organizational context (i.e., evaluation consistency), or changes in
measurement procedure (i.e., measurement reliability). To test the performance
consistency definition, Deadrick and Madigan (1990) collected periodic measures of both
objective (i.e., weekly output) and subjective (i.e., supervisory ratings of production
quantity) performance for sewing machine workers over a period of six months.
Distinctions were made between experienced and inexperienced employees.
Predictors also included cognitive and psychomotor ability. The results for performance
consistency strongly supported the simplex pattern for stability measures despite previous
experience, but failed to do so for the tests of predictive validity, where the validity of
cognitive ability actually began to increase after training and that of psychomotor ability
remained relatively stable. The conflicting results were interpreted as evidence of Murphy’s (1989) dynamic
model of job performance, as dispositional variables such as motivation were proposed to
account for the changing patterns in predictive validities.
Hanges et al. (1990) applied the interactionist perspective of psychology to
Murphy’s (1989) dynamic performance model as a means to account for aberrations to the
simplex pattern in stability measures. According to interactionist psychology, behavior is
not merely determined by either the person or situational variables but is a function of the
interaction between person and situation. Furthermore, a simplex pattern is expected
when the stability of performance over perceptually different situations is explored;
however, when the situations are similar, behavior should remain relatively stable over
time.
Murphy’s (1989) dynamic model of performance is clear regarding predictive
validities during the transition phase, but the maintenance stage can be confounded by
both stable and dynamic dispositional variables. Hanges et al. (1990) maintained that the
interactionist perspective can help clarify the effects situational variables have on
performance during the maintenance stage. For example, as situations become more stable
over time, such as those found in the maintenance phase, an individual’s performance in
that situation would become stable as well. Hanges et al. (1990) empirically evaluated the
interactionist perspective by examining student evaluations of university professors over
time (i.e., the person), the particular courses taught by the professors (i.e., the situation),
and the professors who taught the same course over time (i.e., the person-situation
interaction). Results showed that a simplex pattern was observed in the situation and
person analyses, but not in the person-situation interaction analysis, thus supporting the
utility of the interactionist perspective in predicting the conditions where a simplex pattern
may or may not appear.
Criticisms and Limitations of the Previous Dynamic Criteria Literature
While Barrett et al. (1985) did concede that, in some cases, predictive validities may
deteriorate over long time spans, the authors speculated that dynamic criteria are quite
plausibly the result of changes in the abilities and skills required for the job (i.e., the
changing-subjects model; Adams, 1957) or changes in the job itself (i.e., the changing-task model;
Woodrow, 1938a; 1938b; Fleishman, 1960; 1972). The changing-subjects model is based
on the hypothesis that abilities change over time even as the tasks remain relatively stable.
The changing-task model assumes that the structure of the task is the variable component
that undergoes change during skill acquisition (Alvares & Hulin, 1972, 1973; see Appendix
C for review of changing-task and changing-person models of performance). In the few
instances where they did find significant change over time, Barrett et al. (1985) felt that
dynamic criteria were more the result of methodological artifacts than systematic
variation. The authors pointed out that the studies used to support the dynamic criteria
phenomenon were so rife with methodological flaws that any fluctuations in predictor-
criteria validities were most likely the result of a number of study-design-related artifacts,
including: (a) temporal unreliability of the criterion, (b) contamination from unmatched
samples (i.e., criterion scores were based on individuals with differing levels of experience
and tenure), and (c) the lack of a standardized measure of performance. In light of these
findings, Barrett et al. (1985) claimed that the error variance caused by the unreliability of
the criterion measure probably accounted for a majority of fluctuations in validity
coefficients. Unfortunately, many of the studies that followed Barrett et al.’s (1985)
literature review continued to suffer from the design flaws noted by the authors.
In terms of limitations, studies that proposed the ubiquity of dynamic predictive
validities across all forms of ability did not distinguish between classes of individual
differences and, thus, failed to establish boundary conditions for examining the predictor-
criteria relationships over time. For example, in their meta-analysis to determine the
systematic variability of predictive validities as a function of time, Hulin et al. (1990)
gathered data from areas that spanned the research regarding the prediction of
performance (e.g. experimental studies, studies of academic performance, and growth and
development research) but did not provide an inclusion criterion to classify the type of
predictors used. Consequently, an entire host of individual differences ranging from
psychomotor skills to aerial orientation were lumped together. While the results of Hulin
et al.’s (1990) meta-analysis provided evidence that the majority of predictor-criterion
relationships systematically follow a decreasing temporal trend, valuable information may
have been lost through the act of indiscriminately clustering predictors.
Given the implications that changes in predictive validities over time have on human
resource practices such as selection, promotion, and interventions, the lack of research
dedicated to dynamic criteria under specific boundary conditions is surprising. If
predictive validities associated with construct-based selection measures change over time,
the specific predictability trends should be evaluated to improve decision-making
procedures and the utility of the selection device under consideration. By grouping
predictors under construct-based selection measures, both initial performance and
subsequent performance curves can be used to inform longitudinal policy decisions. Henry
and Hulin (1987) further articulated the point by stating that “the failures of researchers to
develop models that address long-term predictions and build into predictive equations
measures that will reflect expected changes in the abilities of the selected employees or
students is a source of serious concern” (Henry & Hulin, 1987, p. 461). Currently, very little
empirical support has been provided to determine the temporal validities of common
selection devices and their underlying constructs.
Another limitation found in the dynamic performance literature concerns the
operational definitions of the criteria. Many of the studies used to support the dynamic
criteria phenomenon utilized experimental designs conducted in laboratories where
analysis centered on a task performance criterion. For example, all criteria used in Keil and
Cortina’s (2001) study were characterized as either task performance or GPA, with only
one criterion indicative of job performance ratings (i.e., Deadrick & Madigan, 1990).
Substantively speaking, job performance differs from task performance in that job
performance is multidimensional and made up of many tasks, while task performance is
typically represented by a single facet of the job. Researchers have questioned the
generalizability of using tasks as a criterion, especially those measured in short time frames
(i.e., a matter of minutes), noting that they bear little resemblance to job performance
criteria measured in the environment of an applied setting (Barrett et al., 1989; Farrell &
McDaniel, 2001).
As a consequence, experiments used to examine skill acquisition in task
performance over time consist mostly of student samples or of individuals taken out of the
job context. Such studies have a tendency to isolate the individual from real and complex
high-stakes scenarios where adapting to, understanding, and successfully performing the
elements that comprise a job are imperative. In cases where actual workers were included
in an applied context, inquiries into dynamic criteria failed to design predictive studies
with an expressed point of entry into a new job, a training period, a new position, or after
an organizational intervention. Such studies, instead, capitalized on samples made up of
individuals with differing experience levels, and, possibly, in separate employee stages (i.e.
transitional or maintenance stages). To ensure shared equivalent histories, samples
should consist of a cohort that is in the same or comparable level of entry, training, or
promotion.
Finally, while increasing empirical evidence has been used to verify the changes in
predictive validities over time through the examinations of simplex matrices (Ackerman,
1987, 1988, 1992; Deadrick & Madigan, 1990; Henry & Hulin, 1987; Hofmann et al., 1992,
1993; Hulin, 1990), the use of simplex patterns to support dynamic criteria suffers from
many limitations. For instance, the simplex pattern provides little information about
intraindividual change (i.e. changes within an individual) over time and does not shed light
on the nature of the pattern changes. A growing body of research has begun to transition
from the use of autoregressive simplex patterns, which primarily allow for modeling the
effects of past performance scores on future performance scores, to investigations of
intraindividual change in latent trajectories (Deadrick, Bennett, & Russell, 1997; Hofmann
et al., 1992, 1993; Ployhart & Hakel, 1998; Stewart & Nandkeolyar, 2006; Sturman &
Trevor, 2001; Thoresen, Bradley, Bliese, & Thoresen, 2004; see Appendix D for a full
review of latent performance trajectories in examining dynamic criteria).
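The autoregressive simplex structure described above can be illustrated with a minimal sketch. It assumes a first-order autoregressive process with a single, constant stability coefficient (the function name `simplex_matrix` and the value 0.8 are hypothetical choices for illustration, not estimates from any study cited here). Under that assumption, the correlation between two measurement occasions equals the stability coefficient raised to the number of intervening intervals, so correlations shrink as the lag grows.

```python
import numpy as np

def simplex_matrix(n_occasions, rho):
    """Correlation matrix implied by a first-order autoregressive (simplex)
    process with a constant adjacent-occasion correlation `rho` (assumed)."""
    occasions = np.arange(n_occasions)
    # Absolute lag between every pair of measurement occasions
    lags = np.abs(np.subtract.outer(occasions, occasions))
    # Correlation decays geometrically with the lag
    return rho ** lags

# Hypothetical example: five occasions, adjacent-occasion stability of .80
R = simplex_matrix(5, 0.8)
print(np.round(R, 2))
```

Reading across the first row of the printed matrix shows the characteristic simplex signature: the largest correlations sit next to the diagonal and the values decline toward the corner. The matrix describes rank-order stability only, which is why it says nothing about the shape of any individual's trajectory.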
While not fully enveloped into the mainstream literature, the notion that job
performance varies over time for a given employee is becoming increasingly accepted in
the field of I-O psychology. Furthermore, the relative ubiquity of the simplex pattern in
almost all studies concerning ability-performance relationships over time, and the
examinations of latent performance trajectories, have contributed to a firm empirical
foundation for support of the dynamic criteria phenomenon. In light of these findings, it
seems necessary to reevaluate the practice of using a single indicator of performance in
validation studies, and address the aforementioned criticisms and limitations in an effort to
move toward a more accurate understanding of how selection devices fare over time. By
not examining dynamic criteria in relation to even the most common of selection methods,
I-O researchers may limit key conceptual understanding of the evolutionary sources of
variance in performance, and ostensibly deny increases in economic gains from proper
method selection.
CHAPTER 2
CURRENT STUDY
The current study draws heavily on Murphy’s (1989) dynamic model of
performance to address the limitations concerning boundary conditions, study design,
participants utilized, and operational definitions of criteria. Consistent with Murphy’s
distinction between ability and dispositional variables, two sets of analyses were
conducted. The first involved selection devices used to measure cognitive ability as
representative of the ability variables identified by Murphy. The second set of analyses
consisted of Big Five personality inventory dimensions as representative of dispositional
variables. In order to capture the progression from the transitional stage to the
maintenance stage, criterion-related predictive validity studies containing an initial
starting point of entry into a new job, a training period, a new position, or after an
organizational intervention were identified. As Murphy’s model is specifically associated
with conditions within a business environment, participants and the criteria of interest
were represented by actual workers in the field appraised through job performance
measures or participants in a real job training scenario. Stable work cohorts
at the same or a comparable level of entry, training, or promotion were used to
address Barrett et al.’s (1985) criticism of contamination from unmatched samples and
to ensure equivalent sample histories. Furthermore, meta-analytic techniques
were used to address Barrett et al.’s (1985) claim that dynamic criteria are the product of
temporal unreliability, range restriction, and insufficient power.
Purpose
The purpose of this study was to separately determine the criterion-related
validities of common selection devices, namely cognitive ability measures and personality
inventories, in relation to job performance over time through the use of meta-analytic
techniques. When predictive validities of the common selection devices were, indeed,
dynamic, further steps were taken to determine the nature of the performance trends in
terms of directional changes in magnitude and linearity. The following section is an
overview of the two selection devices (i.e. cognitive ability tests and personality
inventories) chosen for the current study, their relation to the job performance criterion as
determined by previous research, and an examination of how time has been explored as a
source of systematic variance in test validities for each predictor classification.
Cognitive Ability Measures
The one consistent finding concerning the dynamic nature of the predictor-criteria
relationship is that time-lagged correlations between ability measures and performance
have a tendency to deteriorate over increasing intervals (Henry & Hulin, 1987; Hulin et al.,
1990; Keil & Cortina, 2001). The majority of ability measures in the dynamic performance
literature are generally characterized by assessments of cognitive ability (Alvares & Hulin,
1972; Bass, 1962; Ghiselli & Haire, 1960; Fleishman & Hempel, 1954, 1955; Fleishman &
Rich, 1963; Humphreys, 1968; Lin & Humphreys, 1977; Parker & Fleishman, 1959),
psychomotor ability (Ghiselli & Haire, 1960; Fleishman, 1960; Fleishman & Hempel, 1954,
1955; Fleishman & Rich, 1963; Hinrichs, 1970; Parker & Fleishman, 1959), and sensory
perception (Ackerman, 1988, 1990; Ackerman & Kanfer, 1993; Ackerman, Kanfer, & Goff,
1995; Fleishman, 1960; Fleishman & Hempel, 1954, 1955; Fleishman & Rich, 1963;
Hinrichs, 1970; Parker & Fleishman, 1959; Powers, 1982) in relation to experimental task-
performance and educational assessments over time.
Due to the inherent differences between controlled task-based experiments and the
workplace, it is difficult to fully generalize task-proficiency as a criterion to job
performance. Unfortunately, there are surprisingly few studies that explore the criterion-
related validity of general mental ability (GMA or g) over subsequent measures of job
performance. The purpose of this portion of the study is to contribute to the dynamic
criteria literature by exploring the dynamic nature of individual GMA-job performance
validities over time by addressing the following questions: Are GMA-performance
validities, indeed, dynamic? If so, what is the nature and direction of the validity patterns
when plotted across time, and, finally, what implications do systematically changing
validity patterns have on validity generalization and utility issues? The following is an
overview of the current state of the literature regarding the use of GMA as a selection tool
and the predictive validities found in terms of job performance.
Interest in the relationship between cognitive ability and job performance has
predominantly been approached in I-O psychology through the use of Spearmanian
frameworks (Lang, Kersting, Hulsheger, & Lang, 2010). In 1904, Charles Spearman
proposed a two-factor theory of abilities that included general cognitive ability (g) and one
or more specific abilities (s). The conceptualization of GMA was used to explain the
positive manifold present across a set of ability tests. Specific abilities refer to unique test
properties that correspond to the variance in ability tests not attributed to a latent GMA
construct or error. When applied to certain factor analytic techniques, cognitive ability
tests reveal a multiple-factor solution, but second-order factor analyses based on the
correlation matrices of the first-order dimensions do commonly result in a single factor
(Carroll, 1993). As a result, GMA is characterized as a higher-order factor that accounts for
the variance in narrower first-order content ability factors.
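One common way to write the two-factor idea described above is as a simple decomposition of each test score. The notation below is an illustrative assumption rather than Spearman's own or the notation of any source cited here: $x_j$ is the standardized score on test $j$, $g$ is the general factor, $\lambda_j$ is that test's loading on $g$, $s_j$ is the test-specific ability component, and $e_j$ is measurement error, with $g$, $s_j$, and $e_j$ assumed mutually uncorrelated.

```latex
x_j = \lambda_j g + s_j + e_j ,
\qquad
\operatorname{corr}(x_j, x_k) = \lambda_j \lambda_k \quad (j \neq k)
```

Under these assumptions, any two tests correlate only through their shared dependence on $g$, so positive loadings on every test yield the positive manifold referred to in the text.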
Research findings have clearly established GMA as an important predictor of job
performance (Campbell, Gasser, & Oswald, 1996; Ree & Earles, 1992; Schmidt & Hunter,
1998). From a theoretical perspective, GMA is linked to general models of job performance
by directly influencing both declarative and procedural knowledge. According to
Campbell’s (1990) model of job performance, declarative and procedural knowledge are
determinants of job performance; thus, GMA influences the level of job performance
indirectly (e.g., Ackerman, 1987; Schmidt & Hunter, 1993, 1998; Schmidt et al., 1986). As
such, the acquisition of the knowledge and skills necessary to perform a job during training,
and the maintenance of that knowledge and those skills throughout an employee’s tenure, are highly
influenced by GMA (Jensen, 1998; Ree, Earles, & Carretta, 1998). Abundant empirical
evidence demonstrates that GMA predicts training and job performance across numerous
jobs and job families (Carretta, Perry, & Ree, 1996; Chan, 1996; Crawley, Pinder, & Herriot,
1990; Hunter & Hunter, 1984; Ree & Earles, 1992; Roth & Campion, 1992; Salgado, 1995;
Schmidt & Hunter, 1998; Vineberg & Taylor, 1972). For example, Hunter and Hunter
(1984) conducted a broad-based meta-analysis to assess the validity of GMA for both
training and job performance criteria. Their analysis included several hundred jobs across
numerous job families, as well as reanalysis of data from previous studies. The authors
estimated the true validity of GMA as .54 for training criteria and .45 for job performance,
with the predictive validity of GMA increasing as a function of job complexity.
Cognitive Ability and Dynamic Criteria
The substantial body of research conducted to examine the predictive validity of
GMA and job performance (Hunter & Hunter, 1984; Jensen, 1986; Ree & Earles, 1992; Ree
et al., 1994; Schmidt, 2002; Schmidt & Hunter, 1998) treated performance as a stable
criterion, and therefore collected data during a single period of time, used a cross-sectional
sample, or validated the measures through concurrent design, thus resulting in a lack of
evidence to support the notion of unstable predictive validities in GMA-job performance
relationships. In light of the limited resources in applied psychology, a number of studies
have used cognitively based entrance exams such as the Scholastic Aptitude Test (SAT; e.g.,
Butler & McCauley, 1987; Mael & Hirsch, 1993) and the Law School Admission Test (LSAT;
e.g., Hathaway, 1984; Powers, 1982) as predictors of Grade Point Average (GPA) over
subsequent semesters or years. Other researchers have relied on previous GPA or aptitude
composites as predictors of future GPA (Humphreys, 1960, 1968; Humphreys & Tabet,
1973; Lin & Humphreys, 1977; Powers, 1982; Winterbottom, Pitcher, & Miller, 1963).
Overall, results showed a general deterioration of predictive validities over time, but this
finding is not consistent across all educational studies (e.g., Powers, 1982; Winterbottom et
al., 1963). Barrett and Alexander (1989) attributed the mixed results in educational
studies and the “fleeting nature of the prediction of grades” to incomparable metrics for the
criteria. They argued that GPAs from different schools, across different courses, and
curricula did not comprise the same measurement scale.
Much of the dynamic criteria literature produced from experimental psychology
utilized task performance as the central criterion (Ackerman, 1986, 1988, 1992; Ackerman &
Kanfer, 1993; Ackerman et al., 1995; Ackerman & Woltz, 1994; Fleishman & Hempel, 1954,
1955; Fleishman & Rich, 1963; Keil & Cortina, 2001; Parker & Fleishman, 1959).
Recognizing the limitations associated with the use of task performance as a criterion, a
number of studies introduced criteria that directly represented the elements comprising
job performance (e.g. Farrell & McDaniel, 2001; Kolz, McFarland, & Silverman, 1998;
Schmidt et al., 1988). Unfortunately, these studies suffered from the use of cross-sectional
data, which, in the context of dynamic performance, provides no opportunity for examining
within-person changes in individual differences (Hulin et al., 1990) and relies on two
critical assumptions: that the mean level of the characteristic does not vary with time (i.e.,
cohort equivalence), and that characteristics of the hiring process remain stable over time
(Sturman, 2007). If the two assumptions are not met, specification error may distort the
results. Of the handful of studies that do examine changes in GMA-job performance relationships
using a longitudinal design (i.e., Bass, 1962; Deadrick & Madigan, 1990; Deadrick et al.,
1997; Ghiselli & Haire, 1960), mixed results have been found in regard to the directionality
of the predictive validities over time.
Personality Tests
Inquiries into the phenomenon of systematically decaying predictor-criteria
relationships primarily focus on individual differences in abilities as predictors of
performance (Austin et al., 1989; Henry & Hulin, 1987; Hulin et al., 1990), but little effort
has been made to determine if the predictive validities of dispositional variables, such as
personality, behave in a similar fashion. Henry and Hulin (1987) claimed that the principle
of decreasing predictive validities can be found in nearly every longitudinal study involving
any type of individual differences, including personality. Unfortunately, longitudinal
examinations of personality-performance relationships are rare, making it difficult to verify
Henry and Hulin’s (1987) claim. The paucity of information regarding the influence of time
on personality-performance relationships has left a vacuum in the dynamic criteria
literature that requires further exploration. The purpose of this portion of the study is to
fill in the gaps concerning the dynamic nature of individual personality trait-performance
validities over time by addressing the following questions: Are personality-performance
validities, in fact, dynamic? If so, what is the nature and direction of the validity patterns
when plotted across time, and, finally, what implications do systematically changing
validity patterns have on validity generalization and utility issues? The following is an
overview of the current state of the literature regarding the use of personality inventories
as a selection device and the predictive validities found in terms of job performance.
Prior to the 1990s, personality testing was generally considered an inferior method
for selecting employees. This view was justified by low validities in personality-job
performance relationships (Hogan, 2005; Schmitt, Gooding, Noe, & Kirsch, 1984) and the
lack of standardized frameworks to support and organize the dizzying array of available
personality measures (Barrick & Mount, 1991; Hurtz & Donovan, 2000; Ones, Mount,
Barrick, & Hunter, 1994). Renewed interest in personality inventories began as mounting
evidence of a five-dimension factor solution emerged across qualitatively different studies
(Cattell, 1946; Digman & Inouye, 1986; Fiske, 1949; Goldberg, 1981, 1990; John, 1990;
McCrae & Costa, 1985, 1987; Peabody & Goldberg, 1989; Saucier & Goldberg, 1996; Tupes
& Christal, 1961). The prominence of a five-factor model of personality, later dubbed the
“Big Five” by Goldberg (1981), resulted in the creation of multiple personality inventories
ranging from Trait Descriptive Adjectives (TDA; Goldberg, 1990, 1992) to questionnaires
(NEO Personality Inventory Revised, NEO PI-R, Costa & McCrae, 1992; NEO-FFI, Costa &
McCrae, 1989, 1992), and short-phrase assessments (Big Five Inventory, BFI; John &
Srivastava, 1999). The prototypical Big Five personality factors are commonly identified
as Extraversion, Agreeableness, Conscientiousness, Emotional Stability (also referred to by
its reverse pole, Neuroticism), and Openness to Experience. Each broad personality trait is
comprised of several narrow facets varying in number and substance depending on the
measure in question.
Conceptually, Extraversion (Factor I) implies an energetic disposition toward the
social and material world, and refers to the extent to which a person is talkative, lively,
assertive, excitable, and emotionally positive. Agreeableness (Factor II) contrasts a
prosocial and communal orientation with antagonism, and refers to the extent to which a
person is good-natured, helpful, trusting, and cooperative. Conscientiousness (Factor III)
describes socially prescribed impulse control that facilitates task and goal directed
behavior, such as thinking before acting, delaying gratification, and following rules.
Conscientiousness also refers to the extent to which a person is consistent, organized,
careful, self-disciplined, and responsible. Neuroticism (Factor IV) contrasts emotional
stability and even-temperedness with negative emotionality, such as feelings of
nervousness and anxiety. Finally, Openness to Experience (Factor V) describes the
breadth, depth, originality, and complexity of an individual’s mental and experiential life.
People high in Openness are commonly described as imaginative, independent, and having
a preference for variety (John & Srivastava, 1999).
The application of the five-factor model as a legitimate selection tool coincided with
notable meta-analytic findings from Barrick and Mount (1991) and Tett, Jackson, and
Rothstein (1991). Both studies identified Conscientiousness as one of the few viable Big
Five personality traits for predicting job performance. Conscientiousness has been shown
to provide consistent positive associations with job performance across a multitude of
occupations and job situations (Barrick & Mount, 1991; Barrick, Mount, & Judge, 2001; Hurtz
& Donovan, 2000; Salgado, 1997; Tett et al., 1991; Vinchur, Schippmann, Switzer, & Roth,
1998). Furthermore, Conscientiousness tests are reported to yield an 18 percent
increase in predictive validity beyond cognitive ability alone in predicting job
performance (Schmidt & Hunter, 1998).
Aside from Conscientiousness, the rest of the superordinate Big Five personality
dimensions have shown few generalizable predictive relationships with performance
across jobs, and in many cases validities approach zero (Barrick & Mount, 1991; Barrick et
al., 2001; Hurtz & Donovan, 2000; Salgado, 1997). However, there are specific occupations
and situations where personality traits, such as Extraversion and Openness, manifest as
meaningful predictors. Extraversion, for instance, does seem to have particular salience for
sales effectiveness (Barrick, Stewart, & Piotrowski, 2002; Vinchur et al., 1998). Likewise,
Openness has been linked to the ability to adapt to changing work roles and demands
(Stewart & Nandkeolyar, 2006). Judge, Thoresen, Pucik, and Welbourne (1999) reported a
statistically significant positive relationship between Openness and a manager’s ability to
cope with various organizational changes, including mergers, acquisitions, and downsizing.
Similarly, LePine, Colquitt, and Erez (2000) found that Openness helped participants adapt
to changing task demands in a computerized decision-making simulation.
Personality and Dynamic Criteria
While very little empirical data have been gathered regarding the temporal nature
of the personality-performance relationship, there are two competing perspectives that can
be used to hypothesize the pattern of directionality and linearity of the projected validity
coefficients. The first perspective involves the precedent of a universal simplex pattern set
by previous inquiries into dynamic criteria. Humphreys (1985) argued that the simplex
pattern of correlations can be found in any data pertaining to individual differences and
performance over time. If personality dimensions do follow the assumptions of a simplex
pattern, predictive validities would degrade over time in a manner consistent with results
reported for time-lagged ability-performance estimates. In their examination of GMA, the
Big Five personality dimensions, and career success, Judge, Higgins, Thoresen, and Barrick
(1999) reported that each Big Five trait produced decreasing validities when related to
career success across five time intervals. Burrus (2006) also found evidence of decreasing
predictive validities for Conscientiousness over 16 task trials given to students in a
laboratory study designed to examine dynamic performance. The study suffered, however, from key
limitations: simulated tasks did not reflect the complexity and multi-dimensionality of job
performance, sample size was not large enough for adequate power, and the trials only
took place over the course of a week.
The second perspective is based on claims that predictive validities for personality
dimensions actually increase over time as opposed to following a simplex-like pattern.
Such alternative views originate from Helmreich, Sawin, and Carsrud’s (1986) examination of
the strength of the personality-performance relationship across time within a relatively
consistent job context. According to Helmreich et al. (1986), cognitive ability was an
important determinant of early performance, but its influence eventually declined. The non-cognitive
measures (i.e. measures of achievement motivation and interpersonal orientation), on the
other hand, increased in predictive validity from a relatively low starting point. Helmreich
et al. (1986) attributed the switch in predictive magnitude from cognitive ability to
personality to what they described as the “honeymoon effect.” The honeymoon effect is
characterized as the time period early in a job when everything is new and exciting. During
this period the employee utilizes cognitive ability to absorb the organization’s culture,
values, work systems, and the necessary knowledge and skills to perform the job. Once the
novelty begins to wane, some employees become increasingly disenchanted. At this point,
personality becomes more salient as a predictor of job performance.
Murphy (1989) expanded on Helmreich et al.’s (1986) conceptualization of the
honeymoon effect in his model of dynamic performance. Progression from the transition to
the maintenance stage, in essence, represents an employee’s shifting reliance from GMA to
dispositional variables. In this case, the rank-order of performance measure scores is
argued to change with time, but movement from the transition to maintenance stage
results in increasing personality-performance validity coefficients as opposed to simplex-
like decreases.
While empirical examinations of Murphy’s (1989) model of dynamic performance
had focused solely on the relationships between GMA and job performance over time (Deadrick
& Madigan, 1990; Deadrick et al., 1997), Thoresen et al. (2004) were the first to draw on
Murphy’s (1989) distinction between transition and maintenance stages to examine Big
Five personality traits in the context of individual changes in latent performance
trajectories over time. Thoresen et al. (2004) reported that Openness and Agreeableness
were positively associated with mean performance and performance change over time
during the transitional stage but not in the maintenance stage. Conscientiousness and
Extraversion were positively associated with mean performance and performance change
over time in the maintenance stage but not in the transition stage. Unfortunately, Thoresen
et al. (2004) did not longitudinally examine a single sample as it gradually transitioned
from one phase to the other, but, instead, compared two samples, each characterized by the
transition or maintenance scenario, over four three-month periods. A single sample may
have revealed a more complete picture of the latent trajectories of the Big Five dimensions.
Thoresen et al.’s (2004) results did, however, provide empirical evidence that
certain predictive validities did, in fact, increase over time, at least within the designated
employment stage. Of particular interest is the finding that personality traits play separate
roles based on the situational characteristics found in either the transition or maintenance
stages. For example, Openness has been associated with an individual’s responsiveness to
changes in job demands (Stewart & Nandkeolyar, 2006), and may play a more significant
role in the transitional stage where an individual is faced with novel challenges. On the
other hand, Conscientiousness is argued to positively influence job performance for both
stages due to its generalizability across occupations and job situations (Barrick
& Mount, 1999; Barrick et al., 2001; Hurtz & Donovan, 2000; Mount & Barrick, 1995;
Salgado, 1997), but may produce positively sloping performance growth trajectories as
employees transition into the maintenance stage, as suggested by Helmreich et al. (1986)
and Murphy (1989). Zyphur, Bradley, Landis, and Thoresen (2008) also found
Conscientiousness to increase in predictive validity in their study examining the extent to
which cognitive ability and Conscientiousness predict initial GPA and changes in
performance over the course of college students’ careers. Consistent with Murphy’s
(1989) theoretical assertions, cognitive ability did predict initial performance, but
beyond the third semester, Conscientiousness became a better predictor of student
performance than cognitive ability.
The influence of time on the relationship between common selection devices
and job performance remains elusive in the I-O literature. A majority of validation
studies utilize concurrent designs over predictive methods and very rarely compare
estimates at separate time points. Studies that have longitudinally examined
construct specific predictive validities have produced mixed results as to the
directionality of coefficient patterns, or have been plagued by study artifacts. The
goal of the current study was to clarify the aforementioned issues by
meta-analytically examining the criterion-related validities of two common selection
instruments (i.e., GMA and FFM measures) in relation to performance over time. To
date, no study has provided a quantitative review of either the degree to which time
contributes to observed variance for distinct constructs within a real work context or
the systematic patterns of change derived therefrom.
Scientific progress in understanding the nature of the dynamic criteria
phenomenon is optimally based on the evaluation and extension of theoretical and
empirical findings from within-person data. Cross-sectional designs can rely on
untenable assumptions and are fundamentally limited for understanding
individual-level change processes. Time-bound predictive validity studies, however,
especially those containing multiple time points through a longitudinal design,
provide an optimal basis for describing change patterns. Unfortunately, longitudinal
examinations of changes in predictive validities are relatively scarce due to the time,
resources, and effort required. In the case of the current study, time-bound
criterion-related validity studies, consisting of either single or multiple time points,
were integrated to create generalized validity estimates across a progressive
timeframe through meta-analytic techniques.
The meta-analysis approach also allowed for the examination of moderators
that are difficult to examine in primary studies alone. In the case of the current
study, the principal moderator of interest was time. Time points from each primary
study were progressively plotted to establish the directional trends in changing
validities through the use of weighted least squares (WLS) regression. Finally,
polynomial terms were introduced into the regression equation in an effort to
model linear and curvilinear trends in the relationships between time and the
indicated predictor-performance coefficients.
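As a rough illustration of the trend analysis described above, the following Python sketch fits linear and quadratic WLS models to a set of time-bound validity coefficients. The data values, the use of sample size as the regression weight, and the rescaling of time from days to years are assumptions made for the example only; they are not taken from the current study.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wls_polynomial(times, validities, weights, degree):
    """Fit validity = b0 + b1*t + ... + bd*t**d by weighted least squares.

    Solves the weighted normal equations (X'WX) b = X'Wy and returns
    the coefficients [b0, b1, ..., bd].
    """
    p = degree + 1
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for t, r, w in zip(times, validities, weights):
        powers = [t ** j for j in range(p)]
        for i in range(p):
            xtwy[i] += w * powers[i] * r
            for j in range(p):
                xtwx[i][j] += w * powers[i] * powers[j]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical validity coefficients (not data from the current study).
# Time is expressed in years here purely for numerical convenience.
times_years = [0.25, 0.5, 1.0, 2.0, 3.0, 5.0]
validities = [0.45, 0.42, 0.38, 0.33, 0.30, 0.29]
sample_sizes = [120, 200, 150, 90, 300, 75]  # assumed weights

linear = wls_polynomial(times_years, validities, sample_sizes, degree=1)
quadratic = wls_polynomial(times_years, validities, sample_sizes, degree=2)
print("linear    [b0, b1]:    ", linear)
print("quadratic [b0, b1, b2]:", quadratic)
```

In this sketch, a negative linear coefficient would indicate declining validity over time, while a nonzero quadratic coefficient would suggest that the rate of change itself varies across the timeframe.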
CHAPTER 3
METHODS
In the current study, a separate meta-analysis was conducted for each of the
designated selection devices (i.e., GMA and Big Five personality inventories) in an effort to
establish construct-based boundary distinctions; combining the devices in a single study
would otherwise confound valuable insights into the nature of dynamic performance. The
meta-analyses were used to examine the time-bound relationships between the indicated
selection instrument as a predictor and a performance criterion specific to work-related
samples. Further analysis consisted of examinations of the data categorized by criterion
type (i.e., training and job performance).
Literature Search per Selection Device
Cognitive Ability Measures: An extensive literature search was conducted to identify
studies with explicitly time-bound predictive validity coefficients for GMA measures and
performance. First, meta-analyses conducted by Hunter and Hunter (1984), Levine,
Spector, Menon, Narayanan, and Cannon-Bowers (1996), Schmitt et al. (1984), and Schmidt
and Hunter (1998) were used to locate previously identified criterion-related validity
studies that utilized some form of job or training performance as a criterion and GMA as a
predictor. Second, published studies were identified using a computer-based literature
search in PsycInfo, Business Complete Resources, and ProQuest Thesis and Dissertations
using keywords such as cognitive ability, general mental ability, intelligence, g, Armed
Forces Qualification Test, Wonderlic, and the Generalized Abilities Test Battery with
performance, job performance, training performance, selection, promotion, and validation.
Search items included peer-reviewed articles, popular-press articles, books, edited book
chapters, and unpublished dissertations. Third, studies were identified using a manual
search in the following journals: Journal of Applied Psychology, Personnel Psychology,
Academy of Management Journal, Human Performance, Journal of Management, and
Organizational Behavior and Human Decision Processes. The literature search yielded an
initial total of 4,245 articles, reports and dissertations.
Personality Inventories: A comprehensive literature search was conducted to
identify studies with explicitly time-bound predictive validity coefficients between the
dates of January 1992 and September 2012. According to Hurtz and Donovan (2000),
previous meta-analyses that examined the role of Big Five personality dimensions in
relation to job performance (i.e., Barrick & Mount, 1991; Salgado, 1997; Tett et al., 1991)
mapped personality predictors not explicitly designed to measure Big Five dimensions
onto actual Big Five dimensions, which can potentially threaten construct validity and
lead to inaccurate conclusions. The year 1992 marks the beginning of the development of
empirically validated Big Five personality inventories (e.g., NEO Personality Inventory,
Costa & McCrae, 1992; Goldberg’s Big Five markers, Goldberg, 1992) for application in a
business context. First, the meta-analysis conducted by Hurtz and Donovan (2000) was used
as a starting point to identify relevant criterion-related validity studies. The authors
limited their article search to studies with established Five-Factor Model inventories as
predictors and performance as the criterion. Second, published studies were located using
a computer-based literature search in PsycInfo (1992 – 2012), Business Complete Resources
(1992 – 2012), and ProQuest Thesis and Dissertations using keywords such as personality,
five factor model, big five, conscientiousness, extraversion, emotional stability, neuroticism,
openness, and agreeableness, with performance, job performance, training performance,
selection, promotion, and validation. Search items included peer-reviewed articles, popular-
press articles, books, edited book chapters, and unpublished dissertations. Third, studies
were identified using a manual search in the following journals for the previously designated
period of time: Journal of Applied Psychology, Personnel Psychology, Academy of
Management Journal, Human Performance, Journal of Management, and Organizational
Behavior and Human Decision Processes. The literature search yielded an initial total of
1,519 articles, reports and dissertations.
Criteria for Inclusion
For a study to be included in the present meta-analyses, seven criteria had to be
met:
1. The study had to use actual workers as participants. Educational studies were
generally excluded in cases where experiments were conducted on students over the
duration of a college semester. However, students in educational settings that could be
regarded as vocational training (e.g., medical school, trade school, specialty training) were
considered.
2. The study had to include one of the two selection devices (i.e., Cognitive Ability
Measures and Personality Inventories) as a predictor of interest. Due to the relatively
stable nature of both general intelligence and personality, it was not necessary for the
researchers to gather either cognitive ability or personality inventory data during the
hiring or promotion process. Data pertaining to cognitive ability or personality could be
collected at any point and used as a predictor.
A number of stipulations for personality inventories were also required: The
personality measures used for each study had to either have been explicitly designed a
priori to measure one or all dimensions of the Five Factor Model (i.e., Extraversion,
Conscientiousness, Emotional Stability/Neuroticism, Agreeableness, and Openness to
Experience) or sizeable empirical evidence had to show that a measure could be
significantly reduced to load on the Big Five dimensions. Five established a priori
measures were identified in the studies collected for the present analysis: the NEO
Personality Inventory Revised (NEO-PI-R) and the five factor inventory versions (NEO-FFI;
Costa & McCrae, 1992), the International Personality Item Pool (IPIP; Goldberg, 1992),
the Personal Characteristics Inventory (PCI; Barrick & Mount, 1993), the Big Five Inventory
(BFI; John & Srivastava, 1999), and Saucier’s Mini Marker (Saucier, 1994).
3. The study had to include an explicit measure of job or training performance as a
criterion.
4. In terms of criterion-related validity, the study had to utilize a predictive design
anchored to an expressed starting point (i.e., entry into a new job, a training period, or
a new position, or the completion of an organizational intervention). GMA and Big Five
studies that utilized a concurrent design
were included if the time period between entry and the criterion measurement was clearly
designated and the sample shared equivalent histories.
5. Primary study samples had to consist of a cohort with equivalent histories (i.e.
the same or comparable level of entry, training, or promotion).
6. The study had to report the sample size for each correlation presented.
7. Finally, the time between a point of entry into a new job, training, a new position,
or after an organizational intervention and the criterion measurement had to be firmly
established. The timeframe had to be greater than a week. In cases where all other
inclusion criteria were met, attempts to contact the principal authors concerning the
timeframe of the study were made.
Coding Procedure
For each individual study, the correlations between the selection instrument and the
performance criterion were coded along with sample sizes and scale reliabilities when
available. Longitudinal studies that included coefficients for more than one time point
were divided into the number of time points present (e.g., entry point to year 1, entry point
to year 2, entry point to year 3) and treated as independent data points for the
analysis. Information for the time moderator was also coded and converted to the smallest
increment of measurement present in the analyses (i.e., days). All personality variables
were based on broad dimensions (e.g., Agreeableness, Extraversion) as opposed
to the narrow dimensions that comprise them. If a primary study
relied solely on a narrow dimension in relation to performance, or in cases where more
than one narrow dimension of the same principal construct was used, the correlations
were averaged and subsumed under the coinciding Big Five dimension.
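The coding steps above can be sketched in Python as follows. This is only an illustration of the logic: the study label, the correlations, and the conversion factors for weeks, months, and years are hypothetical assumptions, since the procedure specifies only that time was expressed in days.

```python
from statistics import mean

# Assumed conversion factors (hypothetical; not stated in the source).
DAYS_PER_UNIT = {"day": 1.0, "week": 7.0, "month": 30.44, "year": 365.25}


def to_days(amount, unit):
    """Convert an elapsed-time value to days."""
    return amount * DAYS_PER_UNIT[unit]


def code_study(study_id, time_points):
    """Split one longitudinal study into one record per time point.

    time_points is a list of (amount, unit, n, correlations) tuples, where
    correlations maps a Big Five dimension to the list of correlations
    reported for it (one broad scale, or several narrow dimensions).
    Narrow-dimension correlations are averaged into a single value.
    """
    records = []
    for amount, unit, n, correlations in time_points:
        for dimension, reported_rs in correlations.items():
            records.append({
                "study": study_id,
                "dimension": dimension,
                "days": to_days(amount, unit),
                "n": n,
                "r": mean(reported_rs),
            })
    return records


# Hypothetical two-wave study reporting two narrow dimensions at year 1.
example = code_study("hypothetical_study", [
    (1, "year", 100, {"Conscientiousness": [0.20, 0.30]}),
    (2, "year", 95, {"Conscientiousness": [0.25]}),
])
for record in example:
    print(record)
```

Each resulting record corresponds to one data point in the analysis, which mirrors the treatment of multiple time points from a single study as independent coefficients.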
Of the initial total, 30 articles were considered for inclusion in the GMA analysis and
30 for the Big Five analyses. The 30 GMA articles yielded a total of 49 validity coefficients.
For the Big Five analyses, a total of three articles were excluded due to substandard FFM
measures (i.e., measures that produced low correlations when compared to established Big Five
inventories). The 27 remaining studies yielded a total of 37 validity coefficients for
Extraversion, 36 for Agreeableness, 42 for Conscientiousness, 39 for Neuroticism, and 37
for Openness.
An outlier analysis was conducted using the Sample-Adjusted Meta-Analysis
Deviancy technique (SAMD; Arthur, Bennett, & Huffcutt, 2001). The SAMD identifies potential
outliers by comparing the value of each study coefficient to the sample-weighted mean
coefficient computed without that coefficient in the analysis. The difference is then adjusted
for the sample size in the study. Scree plots of the SAMD values and subjective
comparisons were used to isolate individual study coefficients. Given the nature of the
current study, outliers could possibly represent systematic changes in coefficients given
the amount of elapsed time and sample size. No outliers were identified for any of the GMA
or FFM dimensions save two coefficients for Extraversion (i.e., Lievens, Ones, & Dilchert,
2009: Time 1; Ployhart, Lim, & Chan, 2001: AC performance) and one for Agreeableness
(Rothstein, Paunonen, Rush, & King, 1994). With these cases removed, the Extraversion
and Agreeableness analyses were each left with a total of 35 predictive validities. Detailed lists
of the articles used for GMA and the FFM dimensions are provided in Tables 1 and 2.
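The leave-one-out logic of the SAMD can be sketched in Python as follows. The text specifies only that the difference is adjusted for the study's sample size; the particular adjustment used here (dividing by an approximate standard error of a correlation) is an assumption for illustration, and the coefficients are hypothetical.

```python
import math


def samd_values(rs, ns):
    """Return one sample-adjusted deviancy value per study coefficient.

    For each coefficient, the sample-weighted mean is recomputed without
    that coefficient, and the difference is scaled by an approximate
    standard error based on the study's own sample size (assumed form).
    """
    total_n = sum(ns)
    total_nr = sum(n * r for r, n in zip(rs, ns))
    values = []
    for r, n in zip(rs, ns):
        # Sample-weighted mean correlation excluding the current study.
        mean_without = (total_nr - n * r) / (total_n - n)
        # Assumed adjustment: approximate standard error of r at size n.
        standard_error = (1 - mean_without ** 2) / math.sqrt(n - 1)
        values.append((r - mean_without) / standard_error)
    return values


# Hypothetical coefficients; the last one is deliberately extreme.
correlations = [0.20, 0.20, 0.20, 0.80]
sample_sizes = [100, 100, 100, 100]
for r, value in zip(correlations, samd_values(correlations, sample_sizes)):
    print(f"r = {r:.2f}  SAMD = {value:.2f}")
```

Sorting the absolute values from largest to smallest and plotting them would give the kind of scree plot used to judge where potential outliers separate from the remaining coefficients.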
Data Analysis
Meta-analytic procedures based on a correlation model were conducted for each
selection device using Arthur et al.’s (2001) SAS PROC MEANS. Arthur et al.’s method
assumes a random-effects model, which is in line with the study’s assumption that
population effect sizes are variable. Sampling error was calculated in a manner consistent
with Hunter and Schmidt’s (2004) methods where sample-weighted average correlations,
sample-weighted variances, and sampling error variances were computed to identify and
remove the variance attributed to sampling error from the total variance across
coefficients. Furthermore, the current study utilizes meta-analytic techniques developed
by Raju, Burke, Normand, and Langlois (1991) to correct for attenuating artifacts (i.e.,
measurement error and range restriction) as a departure from traditional validity
generalization correlation methods originated by Schmidt and Hunter (1977). Raju et al.’s
(1991) procedure allows for estimating mean population-level correlations and
population-level variances when attenuating artifact information is only sporadically
presented in the primary studies. Traditional correlational methods rely on population
values to correct for measurement error and range restriction. The general lack of
available population values undermines researchers’ efforts to produce accurate results
under the assumptions of the traditional correlational method. Previous researchers have
attempted to overcome this limitation through the use of hypothetical artifact distributions
or sample-based artifact distributions. Unfortunately, hypothetical artifact distributions
can limit the accuracy of mean and variance estimates of validities if they do not closely
match true artifact distributions, and sample-based artifact distributions are subject to
sampling error within the attenuating artifacts, which may bias results (Raju et al., 1991).
Raju et al.’s (1991) procedure provides a more accurate method for estimating the mean
and variance of the population correlation by disregarding the assumption that population
correlations (ρ), predictor reliability, and criterion reliability are not correlated across
populations. Raju et al.’s (1991) procedure instead allows observed correlations and study
reliabilities to be correlated (i.e., oblique) across studies in the analysis. Arthur et al.’s (2001)
SAS PROC MEANS was adapted to meet Raju et al.’s (1991) assumptions.
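The sampling error step described above can be sketched in Python as follows. This is a minimal bare-bones illustration in the style of Hunter and Schmidt, not a reproduction of the SAS program: the sampling error variance formula based on the average sample size is an assumed common form, the input values are hypothetical, and the Raju et al. (1991) corrections for measurement error and range restriction are deliberately omitted.

```python
def bare_bones_meta_analysis(rs, ns):
    """Sample-weighted mean, observed variance, and residual variance.

    rs: observed validity coefficients, one per data point
    ns: corresponding sample sizes
    """
    k = len(rs)
    total_n = sum(ns)
    # Sample-weighted average correlation.
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample-weighted variance of the observed correlations.
    var_r = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Sampling error variance, using the average sample size (assumed form).
    mean_n = total_n / k
    var_e = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Variance remaining after sampling error is removed (floored at zero).
    var_residual = max(var_r - var_e, 0.0)
    return {
        "k": k,
        "total_n": total_n,
        "mean_r": mean_r,
        "var_r": var_r,
        "var_e": var_e,
        "var_residual": var_residual,
    }


# Hypothetical coefficients and sample sizes.
summary = bare_bones_meta_analysis([0.10, 0.30, 0.25], [100, 150, 250])
for name, value in summary.items():
    print(name, value)
```

A residual variance near zero would suggest that the coefficients differ mainly because of sampling error, whereas a sizeable residual leaves room for attenuating artifacts and moderators such as time.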
Moderator Detection
Multiple tests for homogeneity were conducted to detect possible moderators: the
75% Rule (Hunter & Schmidt, 2004), the Q-statistic (Hunter & Schmidt, 2004), and credibility
and confidence intervals (Hunter & Schmidt, 2004; Whitener, 1990). The 75% Rule states
that if artifactual variance accounts for 75% or more of the
observed variance, then the rest of the variance is considered a function of
uncorrectable artifacts. If less than 75% of the variance is accounted for, then a moderator
may be present. The Q-statistic tests the hypothesis that the observed variance is the
product of sampling error and attenuating artifacts. A significant chi-square value
indicates the presence of a potential moderator in the research domain. Both the 75% Rule
and the Q-statistic have the lowest occurrences of Type I error and the highest power rates
for meta-analyses consisting of 60 to 100 studies when compared to other moderator
detection techniques (Sagie & Koslowsky, 1993). The credibility intervals are computed
around the corrected mean correlation using the corrected population variance and describe
the variability of individual correlations in the population, as well as lower-bound estimates.
A wide credibility interval, or one that includes zero, is indicative of a moderating variable. Confidence
intervals provide an estimate of the variability of the corrected mean correlation due to
sampling error and are computed around the corrected population correlation using the
standard error of the mean correlation. The confidence interval provides a range of values
for the mean effect sizes and indicates whether the corrected effects differ from zero.
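The four homogeneity checks described above can be sketched in Python as follows. Several details are assumptions made for illustration rather than the procedures used in the current study: only sampling error is treated as artifactual variance, the Q-statistic takes one common Hunter and Schmidt style form, the standard error of the mean is computed from the observed variance, and the interval multipliers (1.28 and 1.96) are conventional defaults. The input values are hypothetical.

```python
import math


def moderator_diagnostics(mean_r, var_r, var_e, k, total_n,
                          z_cred=1.28, z_conf=1.96):
    """Summarize four homogeneity checks from meta-analytic summary values.

    mean_r: sample-weighted mean correlation
    var_r: sample-weighted observed variance of the correlations
    var_e: variance attributed to artifacts (sampling error only here)
    k: number of coefficients; total_n: total sample size
    """
    # 75% Rule: share of observed variance attributable to artifacts.
    percent_artifact = 100.0 * var_e / var_r if var_r > 0 else 100.0
    # Q-statistic (assumed form), referred to chi-square with k - 1 df.
    q = total_n * var_r / (1 - mean_r ** 2) ** 2
    # Credibility interval: spread of population correlations.
    sd_rho = math.sqrt(max(var_r - var_e, 0.0))
    credibility = (mean_r - z_cred * sd_rho, mean_r + z_cred * sd_rho)
    # Confidence interval: precision of the mean correlation (assumed SE).
    se_mean = math.sqrt(var_r / k)
    confidence = (mean_r - z_conf * se_mean, mean_r + z_conf * se_mean)
    return {
        "percent_artifact": percent_artifact,
        "moderator_suspected": percent_artifact < 75.0,
        "q": q,
        "q_df": k - 1,
        "credibility_interval": credibility,
        "confidence_interval": confidence,
    }


# Hypothetical summary values.
result = moderator_diagnostics(mean_r=0.20, var_r=0.010, var_e=0.005,
                               k=10, total_n=1000)
for name, value in result.items():
    print(name, value)
```

In this hypothetical case, artifacts account for only half of the observed variance, so the 75% Rule would flag a possible moderator; the Q value would then be compared against the chi-square distribution with k - 1 degrees of freedom.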
Moderator Estimation