Elizabeth Knutson & Autumn Mabie
“Performance appraisal is a topic that is of both theoretical interest and practical importance. As such, it is one of the most researched topics in industrial and organizational psychology. Several measurement issues are central to performance appraisal including: (a) how performance has been measured, (b) how to improve performance appraisal ratings, (c) what is meant by performance, and (d) how the quality of ratings has been defined. Each of these are discussed along with the shortcomings of the extant literature in helping to come to grips with these important issues. Next, some of the new challenges facing performance appraisal, given its historical focus on single individuals being evaluated, are highlighted. In particular, the appraisal problems inherent in the assessment of team performance and the complexities inherent in multisource feedback systems are covered. We conclude with a short discussion of the litigious issues that can arise as a result of poor performance management practices.”
Kline, T. J. B., & Sulsky, L. M. (2009)
Section 1: The Measurement of Work Performance
o Rating Formats
o Rater Training
Section 2: The Meaning of Work Performance
o Theoretical Parameters
o Bottom-Up & Top-Down Processing
o Dimensions of Performance
Section 3: Examining Rating Quality
o Rater Error and Rating Validity
Section 4: Performance Appraisal and the Social Context
o Purpose of a Social-Contextual Perspective
Section 5: Team Performance Appraisal
o How to Measure Team Performance
o Setting a Team Purpose & Team Processes
o Using Team Measures of Performance
Section 6: Multisource Issues in Performance Appraisal
Section 7: Litigation Issues in Team Performance
The Measurement of Work Performance
The meaning of effective ratings has expanded over the years, while rating format and training research has stayed narrow in its definition of effectiveness.
o Ex: “Effective” ratings might be ratings that the rater and ratee see as fair, or that motivate the employee.
Rating Formats
o Behaviourally Based Scales
o Trait-Based Scales
o Absolute & Comparative Judgment
Rater Training
o Rater Error Training (RET)
o Behavioural Observation Training (BOT)
o Frame-Of-Reference Training (FOR)
Behaviourally Based
o Require the rater to judge the frequency or quality of specific employee actions.
o Ex: Behavioural Observation Scale (BOS) and Behavioural-Anchored Rating Scale (BARS)
Trait Based
o Require the rater to judge the employee based on their traits such as leadership skills and creativity.
Research comparing the psychometric quality of these formats does not provide a conclusion on which is best because of:
1. A lack of theory offering predictions about how ratings should differ across scale formats.
2. Differences found in a given study may be due to measurement issues or other errors, making one scale appear stronger in that particular study.
Legally, which is best?
o Behaviourally based scales are viewed as superior in terms of legal defensibility because they are built from on-the-job behaviors that are visible and direct.
Absolute Judgment Rating Scale
o Scales such as the BARS and BOS require the rater to formulate an “absolute performance judgment.”
Comparative Judgment Rating Scale
o Relative comparisons among employees via forced distribution or ranking scales.
o Wagner and Goffin (1997)
o Concerns?
• Rater difficulty in justifying rating.
• Absolute judgments are necessary in some cases.
• Do not provide needed feedback to employees.
Three types of training approaches that have been used widely in past appraisal research:
o Rater Error Training (RET)
o Behavioral Observation Training (BOT)
o Frame-Of-Reference Training (FOR)
The goal is to educate raters on common rating errors such as leniency error, range restriction error, and halo error.
Has been shown to be effective in reducing rating errors; however, RET may reduce rating validity.
o Text Ex: If all employees are genuinely good employees and deserve high ratings, teaching raters to avoid common errors could have a negative effect on rating quality.
Assumes a normal distribution of employee performance.
The goal is to maximize raters’ ability to observe employee performance accurately by training raters to avoid processing errors at the point of observation.
Derived from cognitive processing models of appraisals with intentions to improve the overall quality of performance ratings.
Studies suggest that raters who are BOT trained are more accurate in ratings compared to others who have received little to no training.
Consistent with cognitive processing models and has been found to be the most accurate rater training method by Woehr and Huffcutt (1994); however, limitations to the study were noted due to laboratory methodology.
The goal is to ensure that raters develop correct impressions of employee performance on each performance dimension evaluated.
Calibrates raters on:
o Relevance of employee behaviors for specific performance dimensions;
o Levels of effectiveness of specific behaviors;
o And the rules for combining individual judgments into a summary evaluation for each dimension.
The Meaning of Work Performance
Theoretical Parameters
o History
o Campbell, McCloy, Oppler, and Sager (1993)
o Tubre, Arthur, and Bennett (2006)
Bottom-Up & Top-Down Processing
Dimensions of Performance
Without a clear definition of what is meant by performance, the validity of ratings becomes questionable.
Relevant parameters of the theory of performance:
o Relevant performance dimensions;
o Performance expectations linked to alternative performance levels;
o How context constraints should be weighed when evaluating performance;
o Number of performance levels for each dimension;
o And the extent to which performance should be based on absolute vs. comparative judgments.
Campbell, McCloy, Oppler, and Sager (1993)
o Examined the framework of performance dimensions and presented a taxonomy of dimensions assumed to underlie work in general.
Tubre, Arthur, and Bennett (2006)
o Developed an understanding of performance general enough to span all jobs and situations by examining findings from Campbell et al. (1993).
There are two methods to identifying performance dimensions:
Bottom-Up Process
o Based on critical incidents technique
Top-Down Process
o Based on competency modeling
An inductive approach, designed for developing performance theories from the bottom up.
In this approach, subject-matter experts (SMEs) record examples of performance at various levels of work effectiveness. These levels of performance are then sorted into dimensional categories.
Concerns?
o Relies on statistical considerations
• Ex: Averaging ratings across SMEs
o Does not tell the rater what the dimension weights should be, and important dimensions could be left out of the assessment depending on SMEs’ opinions and recording abilities.
Utilizes a top-down and deductive approach
Developed from a review of job analysis and content validation through the following steps:
1. Core competencies are identified
• General across a majority of positions
• Reflect organization's strategic goals
2. Competencies are defined behaviorally, with differentiating criteria depending on levels of expertise
• Although competencies remain the same for all positions, the behavioral expectations vary depending on the level of responsibility.
These approaches examine only some aspects of the job:
o Two sets of behaviors in job performance:
1. Behaviors defined by the formal job description;
2. And behaviors defined by the organization’s social context.
Organizational complexity and continual change affect the meaning of good performance.
Is it possible to adequately define performance dimensions?
Hybrid Approach?
o May be promising, but difficult when considering contextual performance and intricacies.
Examining Rating Quality
Primary indirect approaches adopted in performance appraisal research:
o Rater error measures
o Measures of rating accuracy
Popular in the 1970s
o The idea was that ratings free of errors (halo error, leniency error, and central tendency error) would be of higher quality than ratings containing one or more errors.
Later research found that the mere presence of these errors does not mean ratings suffer in psychometric integrity, because error measures do not predict rating accuracy.
The idea is that high accuracy = high validity, because accurate ratings must be valid (just as validity assumes reliability).
Cronbach (1955): accuracy score components.
o Examines the numerical distance between ratings produced by a rater and a set of true-score ratings provided by expert raters; the closer the rater’s score is to the true score, the more accurate the rating.
o Allows researchers to choose indices that are most relevant and breaks down the rating distance into four statistically independent components:
1. Elevation accuracy
2. Differential elevation accuracy
3. Stereotype accuracy
4. Differential accuracy
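As a hedged sketch of how the four components relate, the standard two-way (ratee × dimension) decomposition of rating-minus-true-score differences can be computed directly; note that exact operationalizations vary across studies, and this follows the common squared-deviation form:

```python
def cronbach_components(ratings, true_scores):
    """Decompose mean squared (rating - true score) distance into Cronbach's
    (1955) four components. Rows are ratees, columns are performance
    dimensions; the four values sum to the overall mean squared distance."""
    n, m = len(ratings), len(ratings[0])
    # cell-by-cell difference between the rater's score and the true score
    d = [[ratings[i][j] - true_scores[i][j] for j in range(m)]
         for i in range(n)]
    grand = sum(sum(row) for row in d) / (n * m)
    row_dev = [sum(d[i]) / m - grand for i in range(n)]          # per ratee
    col_dev = [sum(d[i][j] for i in range(n)) / n - grand
               for j in range(m)]                                # per dimension
    elevation = grand ** 2                          # 1. elevation
    diff_elev = sum(a * a for a in row_dev) / n     # 2. differential elevation
    stereotype = sum(b * b for b in col_dev) / m    # 3. stereotype accuracy
    diff_acc = sum((d[i][j] - grand - row_dev[i] - col_dev[j]) ** 2
                   for i in range(n)
                   for j in range(m)) / (n * m)     # 4. differential accuracy
    return elevation, diff_elev, stereotype, diff_acc
```

Elevation reflects the overall mean difference, differential elevation the per-ratee differences, stereotype accuracy the per-dimension differences, and differential accuracy the remaining ratee-by-dimension residual.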
Sulsky and Balzer (1988)
o Quality of the true score has to be established;
o Quality of the true scores will only be as accurate as the technology used to identify them;
o And it is difficult to obtain accurate true scores in field settings.
Society for Industrial and Organizational Psychology (2003)
o Argues that validity is found in the inferences we draw from the ratings.
An example of accuracy measure issues arising from the distributional properties of true scores, from page 165:
“Yet another issue that challenges the potential utility of accuracy measures is the distributional properties of the true scores. For instance, assume there are two sets of these true scores for a seven-point rating scale (with one ratee and three performance dimensions): Set A contains the ratings 4, 5, 4, whereas Set B contains the ratings 1, 3, 2. Also assume a study is designed whereby it is assumed that raters will inflate ratings under certain circumstances (e.g., the ratings will be used for reward purposes). If elevation accuracy is chosen as the accuracy criterion, it becomes evident that Set B allows for a greater range of elevation scores compared to Set A. Therefore, statistical power will be enhanced if Set B is used, and relatively attenuated if Set A is adopted… In sum, the failure to obtain an expected effect in a given study could be partly or wholly explainable based on the properties of the true scores used in the computation of rating accuracy.”
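The quoted point can be worked numerically: on a 7-point scale, the range of possible elevation scores (rating mean minus true-score mean) is much wider for Set B than for Set A, which is why statistical power differs. A minimal sketch, with scale bounds taken from the quote:

```python
# Two sets of true scores on a 7-point scale, from the quoted example.
SCALE_MIN, SCALE_MAX = 1, 7
set_a = [4, 5, 4]
set_b = [1, 3, 2]

def elevation_range(true_scores):
    """Range of possible elevation scores (rating mean minus true-score mean),
    given that every rating must fall within the scale bounds."""
    t_mean = sum(true_scores) / len(true_scores)
    return SCALE_MIN - t_mean, SCALE_MAX - t_mean

print(elevation_range(set_a))   # about (-3.33, 2.67): little upward room
print(elevation_range(set_b))   # (-1.0, 5.0): far more room for inflation
```

If raters are expected to inflate ratings, Set B leaves roughly twice the upward room for elevation scores to move, so an inflation effect is easier to detect with Set B.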
An example of the validity of inferences as a problem for accuracy measures, from page 165:
“Consider a situation in which there are two employees, and they each truly deserve the following ratings for three performance dimensions: Ratee A: 7, 6, 7 and Ratee B: 6, 5, 5. Assume Rater A produces ratings of 5, 5, 6, and 6, 7, 7 for Ratees A and B, respectively, and Rater B produces ratings of 3, 2, 2, and 2, 1, 1, for the two respective ratees. Clearly, Rater A is more accurate according to our conceptualization of rating accuracy. However, what if the appraisal decision is to select and promote the top performing ratee? Rater B would formulate the correct inference that Ratee A is superior, whereas Rater A would not. Therefore, paradoxically, Rater A is seemingly more accurate, yet it is Rater B who formulated a correct inference concerning who is the best person to promote.”
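The arithmetic in the quoted example can be checked directly: Rater A’s ratings are closer to the true scores on average, yet only Rater B’s ratings rank the truly superior ratee (Ratee A) first. A small sketch:

```python
# Page-165 example: distance-based accuracy vs. correctness of inference.
true_scores = {"A": [7, 6, 7], "B": [6, 5, 5]}   # deserved ratings
rater_a     = {"A": [5, 5, 6], "B": [6, 7, 7]}
rater_b     = {"A": [3, 2, 2], "B": [2, 1, 1]}

def mean_abs_distance(ratings):
    """Average absolute distance from the true scores over all cells."""
    diffs = [abs(r - t)
             for ratee in true_scores
             for r, t in zip(ratings[ratee], true_scores[ratee])]
    return sum(diffs) / len(diffs)

def promoted(ratings):
    """Ratee with the highest total rating: who this rater would promote."""
    return max(ratings, key=lambda ratee: sum(ratings[ratee]))

print(mean_abs_distance(rater_a))   # ~1.33: Rater A is more accurate
print(mean_abs_distance(rater_b))   # ~4.17: Rater B is less accurate
print(promoted(rater_a))            # B: wrong inference (Ratee A truly tops)
print(promoted(rater_b))            # A: correct inference
```

The paradox is that the rater with the larger distance from the true scores supports the correct promotion decision.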
Performance Appraisal and the Social Context
From the rater’s standpoint, the goal may reflect social-context concerns rather than psychometric issues.
The rater may consider a set of ratings valid if the employees agree that the rating process was fair and unbiased.
If the rater’s goal is something such as motivation, then the process and rating scale should be adjusted to produce results that motivate the employee.
Per Levy and Williams (2004), there are variables that regulate the effectiveness of a rating system such as:
o Organizational Culture
o Legal Climate
o Trust
o Rater Training
o Appraisal Documentation
From a social context standpoint, the performance appraisal system extends beyond the psychometric elements of rating systems previously researched (see behaviourally based and trait-based formats).
Team Performance Appraisal
Two key questions:
o How do we measure team performance?
• Remove the individual component of performance appraisal to determine the goals and outcomes of the team.
o How do we use the measures of team performance?
• Measure the team against the expectations set forth, then add the individual component back into the appraisal for individual ratings used for feedback and promotions.
Challenges:
o Cannot measure the team performance as one does individual performance.
o Teams are unique—single-project, long-standing, and repeated performance teams all have different components that make them successful or unsuccessful.
Process:
o Identify the role of the team and analyze the tasks of each team.
o Identify the purpose of the team by
• Finding the outcomes of the team in terms of quality and quantity;
• Gaining input from stakeholders to align the team’s goals with the organization’s goals; from experienced team members to determine what has or has not worked in the past; and from new team members for fresh ideas.
Idiosyncratic Assessments
o Researchers/Practitioners need to:
• Create easy-to-use inventories for those giving input to complete;
• Determine which inventory tasks are most useful and reliable;
• And use their position to gather inventory information, collate it, synthesize it, and report their findings.
Research Findings: Common Themes
o Team processes
• Measures abilities in team decision making, internal communication, giving/receiving feedback, leadership, and attitudes toward others and tasks.
o Outcomes
• Quality and quantity of goods or services provided.
At the individual level, performance often determines “salary increments, promotions or terminations, and training needs” (p. 166).
Teams do not receive these incentives as a group; therefore, the team measures must be broken down to individual components.
Research has found that tying the team’s performance to individual appraisals raises team morale and task commitment.
Hoffman and Rogelberg (1998): 7 Team Incentive Systems
o Profit sharing, goal-based incentive systems, discretionary bonus systems, skill incentive systems, member skill incentive systems, member goal incentive systems, and member merit incentive systems.
An optimal ratio of team-based to individual-based compensation needs to be established to allow for the transfer of team performance appraisals.
Identify variables that predict team performance to understand its relation to pay systems.
Develop and allow for positive feedback where team members are confident in both giving feedback and openly receiving feedback.
Multisource Issues in Performance Appraisal
Also called 360-degree feedback, multisource appraisal systems gather information from a variety of sources, commonly self-ratings, peer ratings, and subordinate ratings, and sometimes customer ratings.
These systems can lead to more reliable ratings, better performance information, and greater performance improvements.
Flint (1999) found that if employees viewed the rating as unfair, their performance did not improve; however, Farh, Cannella, and Bedeian (1991) found that if the feedback was used for developmental purposes only, employees were more likely to accept peer feedback.
Self-ratings:
o Are the most common, but correlate weakly with peer and subordinate ratings because of (a) differences in information about what is to be performed and how, (b) different schemas associated with employee performance, and (c) psychological defenses by the employee about their performance (Campbell and Lee, 1988).
Peer ratings:
o Are equally unreliable but are better at predicting performance than self-ratings and are more congruent with subordinate ratings; however, peers have only specific encounters, producing limited information.
Subordinate ratings:
o Hold higher value when used for developmental purposes, but are met with more skepticism because of various rater errors; however, subordinate ratings offer the best insight into supervisory behaviors.
Overall:
o Self-ratings are more distant from subordinate feedback than peer ratings are;
o All ratings are perceived more positively when used for developmental purposes;
o And social context issues play a key role in both the rating and perceptions of the feedback.
Good:
o Call attention to behaviors missed in traditional systems, assess consistency between groups, increase rater-ratee communication, increase employee involvement, and are less expensive than developing traditional systems.
Evil:
o Difficult to identify which performance elements should be rated; hard to combine the information for employee use; can be time-costly, ultimately making the organization less productive; maintaining confidentiality is a concern; employees are sensitive to the elements being evaluated; and it is a long-term developmental process to ensure the system’s measurements give accurate and predictive results.
Litigation Issues in Team Performance
Aside from developmental purposes, performance appraisals are also used to determine personnel decisions. Should an employee disagree, litigation issues arise.
Appeal
o Inexpensive: the ratee argues the accuracy of the measurement.
Formal Grievance
o Inexpensive: cost of employee time and administrative personnel.
Arbitration
o Expensive: a third party is used to mediate between the disputing parties; cost of a lawyer, arbitrator, expert witness, and accompanying expenses.
Lawsuit
o Most expensive: the employee could sue the organization over the decision, costing the organization not only immediate administrative costs but reputational costs as well.
In an Alberta Labour Relations Board ruling (Asbell, 2004), the Board agreed with the union’s contention that the performance appraisal was not an accurate reflection of the plaintiff’s work…Arbitrator Bowman (Armstrong, Carpenter, Kline, & Megennis, 1996) in a case in Manitoba stated,
There is no question that this article introduces what is commonly referred to as a “threshold clause” for promotion or hiring. This means that where there is more than one candidate for the position the candidates are not compared with each other, for the purpose of determining which may be the most skilled, or most qualified, but rather, that each is compared to an objective standard indicating the requirements for successfully carrying out the position. Of the persons who demonstrate capacity to meet the position’s requirements, with perhaps some limited training and generally some familiarization time, the most senior will be given the position over other qualified applicants. In other words, amongst those who can do the job, the most senior is entitled to receive it. (p. 136)
Documentation, Documentation, Documentation!
o Having documentation in employee files will greatly decrease legal problems for an organization.
Using Performance Appraisals Properly
o Providing training for raters and understanding the feedback from appraisals, on both the employer’s and employees’ side, will limit litigation issues for a company.
o Have an objective instead of subjective system in place.
o Make sure systems are up to date and, when a new system is implemented, make sure enough time has passed to validate the system on performance measures.