Annual performance appraisals tend to receive a lot of negative press. Critics grumble that they aren’t timely, don’t provide sufficiently detailed feedback to improve performance, and can ultimately reduce employee motivation. Some of the most damaging complaints come from employees who feel that their supervisor is unfair or biased. For example, supervisors may recall only an employee’s performance on the most recent project, show favoritism toward some employees over others, exhibit some form of discrimination, or focus on only one aspect of the employee’s job.

One of the most common challenges associated with performance appraisals in large organizations is a lack of consistency in ratings given by different supervisors. When supervisors use different standards to rate employees, they may end up giving different ratings for the same level of performance. For example, consider Barbara, who works for Marc. Marc assesses Barbara’s overall performance on a 1-5 scale as a 4, because Marc believes her work “exceeded expectations” on the rating scale. But had Barbara been working for Michelle, Michelle might have felt that Barbara merely “met expectations” and rated her as a 3. On the other hand, had she been evaluated by Bob, Barbara would have been rated as “exceptional,” or a 5 on the rating scale, which could have made her eligible for a higher bonus, a promotion, and training not available to employees with ratings of 3 or 4.

When different supervisors apply different standards to assess employee performance, how can organizations really know which employees are performing exceptionally well and which are performing poorly? Employee dissatisfaction can result from such inconsistencies and can negatively affect an organization’s ability to develop human capital and to reward and retain high-performing employees.


Higher-level managers are in a position to see the ratings that lower-level supervisors assign to employees and can identify inconsistencies. Thus, they can balance out the ratings of excessively lenient or tough supervisors so that consistent standards are applied and employees are assessed similarly regardless of who is actually rating them.

This review is often called a calibration process. After lower-level supervisors rate their subordinates, the ratings are further evaluated by a calibration committee composed of higher-level managers. The purpose of a calibration committee is to review employee ratings and adjust them as needed to “calibrate” the ratings. Surveys indicate that many organizations use calibration committees in their annual performance appraisal process to overcome supervisor rating biases.


We collaborated with a multinational organization to study its performance appraisal system and calibration process over a three-year period. The performance appraisal process started with supervisors determining initial employee ratings. Similar categories of employees were grouped into the same bonus pool, and a separate calibration committee reviewed the ratings for each group of employees. The calibration committees were generally composed of managers who were one level higher than the supervisors who assigned the initial ratings.

The committees came to a shared understanding of what constituted appropriate performance for each rating level and then reviewed the ratings for all employees in the pool. If the committees felt that specific ratings were too high or too low, they adjusted the initial ratings, resulting in a final, calibrated rating for each employee in the pool. Supervisors were then free to share the final ratings with employees.

In the published study (“The Role of Calibration Committees in Subjective Performance Evaluation Systems,” Management Science, April 2019), we documented several features of the calibration process. The committees adjusted about one of every four ratings (roughly 25%), reflecting instances in which they felt the ratings were too high or too low. Put another way, they accepted the supervisor’s assigned rating in about 75% of cases, recognizing that supervisors had the most direct knowledge of the employees’ performance. When a committee did adjust a rating, it moved the rating downward about 80% of the time, suggesting an overall tendency of supervisors to give lenient ratings.
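To see what these two figures imply together, the short calculation below combines them into shares of all ratings. It is only a back-of-the-envelope sketch: the inputs are the approximate figures quoted above (about one in four ratings adjusted, about 80% of adjustments downward), and the variable names are ours, not terms from the study.

```python
from fractions import Fraction

# Approximate figures quoted in the article.
adjusted_share = Fraction(1, 4)            # about one in four ratings was adjusted
downward_given_adjusted = Fraction(4, 5)   # about 80% of adjustments were downward

# Convert the conditional share into shares of ALL ratings.
lowered = adjusted_share * downward_given_adjusted
raised = adjusted_share * (1 - downward_given_adjusted)
unchanged = 1 - adjusted_share

for label, share in [("lowered", lowered), ("raised", raised), ("unchanged", unchanged)]:
    print(f"{label}: {float(share):.0%} of all ratings")
```

Under these rounded inputs, roughly one rating in five ends up lower than the supervisor's initial rating, while about one in twenty ends up higher.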

The committees made adjustments in a manner consistent with removing bias from the initial ratings and promoting greater consistency in ratings across supervisors. Specifically, supervisors who tended to issue higher-than-average ratings were more likely to have the ratings they assigned adjusted downward, while supervisors who tended to issue lower-than-average ratings were more likely to have the ratings they assigned adjusted upward. Overall, we found that employee evaluations were more consistent after the calibration committees reviewed and adjusted the employee performance ratings.
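One simple way to picture "higher-than-average" and "lower-than-average" raters is to compare each supervisor's mean rating with the mean for the whole pool. The sketch below does this with invented ratings on the 1-5 scale. The data, the supervisor names (borrowed from the earlier example), and the `leniency_by_supervisor` helper are all hypothetical illustrations; this is not the measure or the data used in the study.

```python
from statistics import mean

# Hypothetical initial ratings on a 1-5 scale, keyed by supervisor.
initial_ratings = {
    "Marc": [4, 4, 3, 4],
    "Michelle": [3, 3, 2, 3],
    "Bob": [5, 4, 5, 5],
}

def leniency_by_supervisor(ratings):
    """Return each supervisor's mean rating minus the pool-wide mean rating."""
    pool_mean = mean(r for rs in ratings.values() for r in rs)
    return {name: mean(rs) - pool_mean for name, rs in ratings.items()}

leniency = leniency_by_supervisor(initial_ratings)

# A positive gap suggests a lenient rater; a negative gap suggests a tough one.
for name, gap in sorted(leniency.items(), key=lambda kv: kv[1], reverse=True):
    label = "lenient" if gap > 0 else "tough" if gap < 0 else "average"
    print(f"{name}: {gap:+.2f} ({label})")
```

In this made-up pool, a committee following the pattern described above would be more likely to lower some of Bob's ratings and raise some of Michelle's. A gap like this does not prove bias on its own, since one supervisor's team may simply perform better, which is why the committees also reviewed the individual ratings.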

We also examined whether supervisors responded to the committees’ adjustments, that is, whether they subsequently gave a higher rating to an employee whose previous rating had been adjusted upward and a lower rating to an employee whose previous rating had been adjusted downward. This was generally the case, suggesting that supervisors learned from the calibration process over time.


We also surveyed the organization’s employees about the performance appraisal process. In general, higher-performing employees tended to think the system was fair, didn’t think favoritism was an issue, and were satisfied with the system. On the other hand, lower-performing employees thought the system was unfair, believed favoritism was present, and were dissatisfied with the system.

The organization interpreted this finding as evidence that the system was working as intended because a main objective of the system was to retain and reward the highest-performing employees. This pattern provided evidence that the highest-performing workers were indeed the ones reaping the system’s benefits and were most likely to be satisfied, stay with the organization, and continue to perform well.

Despite these benefits, the calibration process wasn’t perfect. Because the committees mostly lowered ratings they felt were too high, rather than raising ratings they felt were too low, the final ratings tended to be more compressed around average performance levels, and employees were less differentiated from one another. This compression can make it more challenging to identify high-performing employees for promotion and reward.
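The compression effect can be illustrated with a small numerical sketch. The ratings below are invented for illustration and are not data from the study; the example assumes a pool of eight employees in which the committee lowers the two highest ratings and leaves the rest alone, then compares the spread of ratings before and after using the population standard deviation.

```python
from statistics import pstdev

# Hypothetical ratings on a 1-5 scale for one pool of eight employees.
initial = [2, 3, 3, 4, 4, 4, 5, 5]
# Assumed calibration outcome: the two 5s are lowered to 4, nothing is raised.
final = [2, 3, 3, 4, 4, 4, 4, 4]

spread_before = pstdev(initial)
spread_after = pstdev(final)

print(f"spread before calibration: {spread_before:.2f}")
print(f"spread after calibration:  {spread_after:.2f}")
print("ratings are more compressed:", spread_after < spread_before)
```

With the top ratings pulled toward the middle, the employees initially rated 5 become indistinguishable from those rated 4, which is the differentiation problem described above.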

Overall, our study found that calibration committees can effectively remove bias from supervisors’ initial subjective appraisals and make employee ratings more consistent across different supervisors. The result was that employees, especially the higher-performing ones, were generally satisfied with the appraisal system and believed it was fair, which can positively influence future motivation and performance. In addition, the calibration process appeared to be a better way to reward knowledge workers dispersed throughout the world, whose performance is notoriously difficult to measure objectively.

About the Authors