Goniometric Reliability in a Clinical Setting
Elbow and Knee Measurements
JULES M. ROTHSTEIN, PETER J. MILLER, and RICHARD F. ROETTGER
Reliability of goniometric measurements has been examined only under standardized conditions and usually with healthy subjects. The purpose of this study was to assess goniometric reliability in a clinical setting. The reliability of goniometric measurements of passive elbow and knee positions was assessed using patients as subjects. The effect of using the means of repeated measurements and the interdevice reliability of three common goniometers were also examined. Results showed that intratester reliability for flexion and extension of the knee and the elbow joints was high (r = .91 to .99). Intertester reliability was also high (r = .88 to .97) for these measurements except for measurements of knee extension (r = .63 to .70). Although previous investigators have suggested that using the means of multiple measurements improves reliability, our data indicate that this procedure never improves the correlation coefficient more than .12. The reliability was similar for all three devices. The results of this study indicate that for the knee and elbow joints, goniometric measurements performed in a clinical setting can be highly reliable. The method described in this study provides a simple protocol that can be used clinically to investigate goniometric reliability.
Key Words: Elbow joint, Goniometric measurement, Knee joint, Physical therapy.
Although goniometry is a primary measurement tool in physical therapy for initial assessment and for charting patient progress, the reliability of using goniometry in a clinical setting has not been examined. In a clinical setting, measurement technique will often vary between therapists, partially because of their training and preferences and partially because of adaptations, such as positioning, which are necessary with different patients. Therefore, reliability studies that have used standardized protocols, limited numbers of therapists, and healthy subjects do not necessarily reflect the reliability of clinical goniometry.
Previous reports on goniometric reliability have been made by Hellebrandt and associates, Boone et al, and Low.1-3 Boone et al and Low each examined the reliability of goniometry when it was applied to healthy subjects.2, 3 In the study by Boone et al, the therapists used a prescribed method of measurement.2
Hellebrandt and associates used patients as subjects but examined only intertester reliability, by comparing a therapist's obtained measurements with those of an acknowledged expert.1 Although the three studies give some insight into the potential reliability of goniometry under somewhat idealized conditions, they do not reflect the goniometric reliability one can expect in clinical practice. Even Low's suggestion that taking the means of multiple measurements of the same joint improves reliability has been questioned,2 and Hellebrandt and her associates' suggestion that large goniometers are better than small goniometers has not been tested.
Dr. Rothstein is Instructor, Program in Physical Therapy, Washington University School of Medicine, St. Louis, MO 63110 (USA). He is also a consultant physical therapist at the Irene Walter Johnson Institute for Rehabilitation, Washington University Medical Center, St. Louis, MO 63110.
Mr. Miller is Staff Physical Therapist, Professional Physical Therapy, Inc, Creve Coeur, MO 63141 and Research Assistant, Program in Physical Therapy. At the time of this study, he was Staff Physical Therapist, Jewish Hospital of St. Louis, Washington University Medical Center.
Mr. Roettger is Instructor, Program in Physical Therapy, Washington University School of Medicine and Director of Physical Therapy, Jewish Hospital of St. Louis.
This article was submitted October 13, 1982; was with the authors for revision eight weeks; and was accepted April 29, 1983.
Volume 63 / Number 10, October 1983 1611
Fig. 1. The three goniometers used in the testing protocol (left to right): large metal, large plastic, and small plastic.
The purpose of this study was to develop a protocol that could be used to examine goniometric reliability in a physical therapy department and to use that protocol to determine the following:
1. Intratester reliability for the measurement of passive flexion and extension of the knee and elbow joints with each of three commonly used goniometers;
2. Intertester reliability for the measurement of passive flexion and extension of the knee and elbow joints with each of three commonly used goniometers;
3. Possibility of improved intertester reliability by using the means of repeated measurements made on the same joint with the same goniometer; and
4. Interchangeability of the three commonly used goniometers.
METHODS
Subjects were 24 patients referred for physical therapy at the Jewish Hospital of St. Louis, Washington University Medical Center. Goniometric measurements were made by 12 physical therapists who had graduated from five different schools and who had from one to four years of clinical experience (average of 2.8 years). Supervisors in the department monitored the testing and did not take measurements. The criterion for selecting a patient was that goniometric measurement of the patient's knee or elbow joints would have been a part of the regular course of examination. The measurements for the study were made during the patients' daily treatments. Twelve patients had their elbows measured and 12 patients had their knees measured. Depending on the pattern of involvement, the left or right extremity was measured.
The three goniometers (Fig. 1) were the following: 1) a large metal goniometer with 12-in movable arms, marked in 2-degree increments*; 2) a large plastic goniometer with 10-in movable arms, marked in 1-degree increments†; and 3) a small plastic goniometer with 6-in movable arms, marked in 5-degree increments.‡
The scales on all three goniometers were covered so that the therapist using the device could not see the numbers and, therefore, could not be influenced by the results of the measurements.
The test procedure was as follows (Fig. 2):
1. A therapist first identified a patient who was suitable for the study.
2. The therapist then sought out a supervisor to monitor the test.
3. The therapist then measured the passive range of motion of the knee or elbow joint for flexion and extension six times. Each goniometer was used twice, but the same goniometer was not used consecutively. The therapist made the measurement by using her own technique. The supervisor gave no instructions on technique but did note the patient's position.
4. When the therapist had aligned the arms of the goniometer to her satisfaction, she gave the goniometer to the supervisor who uncovered the scale to read and record the measurement obtained.
5. After all six measurements had been made, the supervisor found the second therapist who had been paired randomly (using a program written for a Texas Instruments SR-59 calculator**) with the first therapist.
6. The second therapist took six measurements following the same procedure as the first therapist.
7. The supervisor read and recorded the measurements as previously described.
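The ordering and pairing rules in the procedure above can be sketched in a few lines of Python. This is an illustrative sketch only: the original pairing program ran on a calculator and is not described in detail, so the function names, the therapist labels, and the rule that the second therapist is simply drawn at random from the remaining therapists are all assumptions.

```python
import random

DEVICES = ["metal", "large plastic", "small plastic"]

def measurement_order(rng):
    """Return six device uses: each device twice, never the same one twice in a row."""
    order = DEVICES * 2
    while True:
        rng.shuffle(order)
        # Accept the shuffle only if no goniometer is used consecutively.
        if all(a != b for a, b in zip(order, order[1:])):
            return list(order)

def pick_second_therapist(first, therapists, rng):
    """Randomly choose a different therapist to repeat the six measurements (assumed rule)."""
    return rng.choice([t for t in therapists if t != first])

rng = random.Random(1983)  # fixed seed only so the sketch is repeatable
therapists = [f"T{i}" for i in range(1, 13)]  # 12 hypothetical therapist labels
order = measurement_order(rng)
second = pick_second_therapist("T1", therapists, rng)
```

A department adapting the protocol to a single device would not need `measurement_order`; only the random choice of the second therapist would remain.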
* PC 5053, JA Preston Corp, 60 Page Rd, Clifton, NJ 07012.
† No. 576330 International Standard Goniometer, Orthopedic Equipment Co Inc, Bourbon, IN 46504.
‡ Hanger Limb Co, St. Louis, MO 63110.
** Texas Instruments, Lubbock, TX 79408.
1612 PHYSICAL THERAPY
RESEARCH
Fig. 2. The flow chart used to conduct the study.
Two statistical methods were used to calculate reliability coefficients. One was the classical method using the Pearson product-moment correlation coefficient (r), which is an assessment of association, or covariance. The second method was suggested by Bartko,4 who demonstrated that product-moment correlation coefficients can be high even with large discrepancies between paired scores. This high correlation results because r reflects association (covariance) between two sets of numbers, not necessarily the agreement of the two sets. This second method consisted of calculating intraclass correlation coefficients (ICCs).4 Agreement between multiple measures is reflected by the ICC, and the value of the correlation is the percentage of common variance among repeated measures.4-6 If goniometric measurements are to be used interchangeably between examiners, and if these measurements are to be done serially, then agreement among repeated measures, as tested by the ICC, may be more appropriate than assessment of association or covariance.
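The distinction between association and agreement can be illustrated with a small Python sketch. The data are invented, and because the text does not state which form of the ICC was computed, the sketch assumes a one-way random-effects ICC for single measurements; the function names are hypothetical.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_oneway(rows):
    """One-way random-effects ICC for single measurements (assumed form).

    rows: one list per joint, holding the k repeated readings of that joint.
    """
    n, k = len(rows), len(rows[0])
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((v - m) ** 2 for row, m in zip(rows, row_means) for v in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented readings (degrees): tester B reads every joint 10 degrees higher than tester A.
tester_a = [0.0, 5.0, 10.0, 20.0, 30.0]
tester_b = [a + 10.0 for a in tester_a]

r = pearson_r(tester_a, tester_b)
icc = icc_oneway([[a, b] for a, b in zip(tester_a, tester_b)])
```

With these invented readings, r equals 1 because the two testers rank and space the joints identically, whereas the ICC is about .71 because the constant 10-degree discrepancy counts as disagreement.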
To assess intratester reliability, a therapist's first measurement with a given goniometer on a given joint was correlated with the second measurement made on the same joint by the same therapist using the same goniometer. Because all the patients were measured by two different therapists, each measurement with each type of goniometer yielded 24 pairs of scores. Similar pairings were used for the calculation of the ICC.
To assess intertester reliability, three different product-moment correlations and ICCs were calculated for each goniometer. A therapist's first measurement on a joint with a given goniometer was compared with the first measurement made by the second therapist on the same joint with the same goniometer. Second measurements and the means of both measurements were similarly compared. Because two measurements, one by each therapist, were compared for each of the 12 joints, 12 paired measurements were used in each of these analyses.
To determine whether intertester reliability could be improved by comparing the means of repeated measurements taken by the same therapists, three coefficients were compared. One set of coefficients (ICC and r) was calculated based on the first measurements made by the therapists. A second set of coefficients was calculated based on the second measurements made by the therapists. The third set was based on the means of the measurements made by the therapists on a given joint.
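The pairings described in the preceding paragraphs can be made concrete with a short Python sketch. The record layout and the readings are hypothetical; the sketch only shows why 12 joints yield 24 intratester pairs but 12 intertester pairs for each of the three comparisons (first measurements, second measurements, and means).

```python
def intratester_pairs(records):
    """One (first, second) pair per therapist per joint: two pairs for every joint."""
    return [tuple(rec[t]) for rec in records for t in ("therapist_1", "therapist_2")]

def intertester_pairs(records):
    """One pair per joint for each comparison: first readings, second readings, means."""
    return {
        "first": [(rec["therapist_1"][0], rec["therapist_2"][0]) for rec in records],
        "second": [(rec["therapist_1"][1], rec["therapist_2"][1]) for rec in records],
        "mean": [
            (sum(rec["therapist_1"]) / 2, sum(rec["therapist_2"]) / 2) for rec in records
        ],
    }

# Hypothetical readings (degrees) for 12 joints measured with one goniometer.
records = [
    {"therapist_1": (30 + i, 31 + i), "therapist_2": (32 + i, 30 + i)}
    for i in range(12)
]

intra = intratester_pairs(records)
inter = intertester_pairs(records)
```

Each list of pairs would then be passed to whichever reliability coefficient is being calculated.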
To examine whether the devices could be used interchangeably (reliability among devices), the mean value each therapist obtained for a position using a given device was correlated with the mean value the same therapist obtained with the other two devices (N = 24 from the two sets of paired measurements).
TABLE 1 Intratester Reliability (N = 24)
Joint Position     Device           rᵃ     ICCᵇ
Elbow extension    metal            .95    .86
                   large plastic    .99    .96
                   small plastic    .99    .99
Elbow flexion      metal            .95    .94
                   large plastic    .98    .97
                   small plastic    .97    .96
Knee extension     metal            .96    .96
                   large plastic    .96    .91
                   small plastic    .91    .97
Knee flexion       metal            .97    .97
                   large plastic    .99    .99
                   small plastic    .99    .99
In addition to the three correlation coefficients, an ICC also was calculated in this manner. The ICC allowed simultaneous comparison of all three devices.
RESULTS
Results show that intratester reliability for passive flexion and extension of the knee and elbow joints was high (Tab. 1). Intertester reliability was also high for these measurements with the exception of knee extension (Tab. 2). Table 2 also shows that taking the mean of multiple measures does not improve the correlation coefficient for intertester reliability more than .12. Interdevice reliability was found to be high using intraclass correlation coefficients for all positions (Tab. 3), and Pearson product-moment values were also high for all positions.
DISCUSSION
Regardless of the type of device used, a high degree of intratester reliability for both joints and for all four movements tested was found. This agrees with previous studies conducted outside a clinical setting.1-3
Intertester reliability was high for elbow flexion, elbow extension, and knee flexion. The relatively poor intertester reliability for the measurement of knee extension (r ranged from .57 to .80) existed for all three devices. We thought that this poor reliability could have resulted from the therapists' use of different patient positions for measuring knee movements. Different positions could have led to a variable effect from the two-joint hamstring muscle limiting knee extension. To test this hypothesis, a post hoc analysis was carried out. From records obtained at the time of the study, we determined that seven pairs of measurements were made by therapists who used the same position. Reliability coefficients and intraclass correlation coefficients were then calculated for these seven
a r = Pearson product-moment correlation. b ICC = Intraclass correlation coefficient.
TABLE 2 Intertester Reliability (N = 12)
                                    1st Measurement    2nd Measurement    Meanᵃ
Joint Position     Device           rᵇ     ICCᶜ        r      ICC         r      ICC
Elbow extension    metal            .92    .92         .94    .95         .95    .96
                   large plastic    .94    .94         .96    .94         .95    .94
                   small plastic    .93    .92         .93    .93         .94    .93
Elbow flexion      metal            .95    .85         .89    .87         .94    .89
                   large plastic    .97    .97         .96    .94         .97    .96
                   small plastic    .91    .87         .96    .95         .95    .90
Knee extension     metal            .57    .59         .79    .80         .70    .71
                   large plastic    .62    .61         .63    .63         .63    .64
                   small plastic    .68    .70         .60    .61         .66    .68
Knee flexion       metal            .83    .84         .92    .92         .88    .99
                   large plastic    .90    .92         .92    .93         .91    .92
                   small plastic    .89    .89         .92    .92         .91    .91
TABLE 3 Interdevice Reliability (N = 24)
Joint Position     Device           Metal    Large Plasticᵃ    Small Plasticᵃ    ICCᵇ
Elbow extension    Metal            -        .96               .97               .96
                   Large plastic             -                 .95
                   Small plastic                               -
Elbow flexion      Metal            -        .92               .92               .92
                   Large plastic             -                 .98
                   Small plastic                               -
Knee extension     Metal            -        .92               .96               .92
                   Large plastic             -                 .98
                   Small plastic                               -
Knee flexion       Metal            -        .99               .99               .99
                   Large plastic             -                 .99
                   Small plastic                               -
paired measurements and for the five paired measurements where different positions were used.
Examination of Table 4 shows that the reliability coefficients were markedly higher when both testers used the same position. When these coefficients for intertester reliability of knee extension were compared with the intratester coefficients, however, intertester reliability was still relatively poor. Using the same position improved intertester reliability, but never to a point that was comparable to intratester reliability. For most of the measures, 15 to 20 percent of the variance can be accounted for by using the same person to measure knee extension. This finding suggests that for knee extension measurements, the same therapist should take all the measurements for the same patient's joint. The data also suggest that therapists should record the patient's position at the time measurements of knee extension are made.
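One possible reading of the variance figure above is sketched below. It assumes, following the Methods, that an ICC can be read as the proportion of common variance, and it compares the knee extension ICCs as printed in Table 1 (same tester) with those in Table 4 (two testers using the same position). Whether the authors derived their percentages in exactly this way is an assumption.

```python
# Knee extension ICCs as printed in Table 1 (same tester) and Table 4 (two testers, same position).
intratester_icc = {"metal": 0.96, "large plastic": 0.91, "small plastic": 0.97}
intertester_same_position_icc = {"metal": 0.84, "large plastic": 0.74, "small plastic": 0.79}

# Difference in common variance between the two situations, in percentage points.
gain = {
    device: round((intratester_icc[device] - intertester_same_position_icc[device]) * 100)
    for device in intratester_icc
}
```

Under this reading the differences are 12, 17, and 18 percentage points for the metal, large plastic, and small plastic goniometers, respectively.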
Of the four movements examined in this study, only intertester reliability for knee extension was poor. Because this finding was true even when positioning was accounted for, other difficulties in obtaining good intertester reliability for this movement must exist. The knee extension arc is limited, and any error might, therefore, be magnified. Determining the anatomical landmarks may be difficult in patients with pathological changes in the knee. Knee extension itself may be highly labile and, therefore, hard to quantify. At least two of these hypotheses seem unlikely. Elbow extension is also measured in a limited arc, and intertester reliability for elbow extension was excellent. If movements into knee extension are highly labile and change with each successive passive movement, poor intratester reliability would have been observed, but this was not the case. If problems identifying bony landmarks for placing the goniometer caused the poor reliability, placement errors could occur more easily for measurements of the knee than of the elbow because of the long lever arms associated with the knee, the shank, and the thigh.
We cannot say what caused poor intertester reliability for knee extension. The results strongly suggested, however, that repeated measurements of knee extension be made by the same therapist. This is in sharp contrast with the three other movements examined: elbow flexion, elbow extension, and knee flexion. For these movements, only minor improvements in reliability can be gained by having the same therapist take all measurements.
a = Mean of first and second measurements. b r = Pearson product-moment correlation. c ICC = Intraclass correlation coefficient.
a r = Pearson product-moment correlation. b ICC = Intraclass correlation coefficient.
The results also indicated that no greater intertester reliability is obtained when therapists use the means of repeated measurements on the same joint. This finding contrasts with the results of Low3 but agrees with those of Boone et al.2 Because the present study was done in a clinical setting, the results indicated that reliability and efficiency can be served by taking a single measurement.
Hellebrandt and associates strongly suggested that reliability was improved when all therapists in a department used the same goniometer.1 They also stated that the best device was a large universal goniometer. Our results did not support either assertion. By examining the correlation matrix (Tab. 3), one can see that all the measurements obtained with the three goniometers systematically related to each other. Of even greater importance was the fact that no ICC value was lower than .919 (Tab. 3). These results demonstrated that for measurements of knee and elbow movements, all three goniometers could be used interchangeably in the physical therapy department we examined.
The results on these two single axis joints cannot necessarily be generalized to other joints, which may have more degrees of freedom and may be more difficult to measure because they are either proximal (hip and shoulder) or small (fingers and ankle). The clinical reliability of goniometric assessment of other joints will require further research.
This study was conducted in one physical therapy department. Although the methods used by the physical therapists to take goniometric measurements in the study were not unique, the results cannot necessarily be generalized to all departments. The protocol used in this study has been demonstrated to be feasible in a busy department and can, therefore, be used by other clinicians who want to assess goniometric reliability in their setting. In this study, multiple devices and the effects of multiple measures were examined. Clinicians interested in establishing goniometric reliability in their own setting may modify the protocol by examining only single devices with single measurements. In our experience, such a study would take no more than five minutes to conduct with each patient.
Reliability is inherent to the quality of a measurement; therefore, this testing protocol can also serve as a tool to provide and document quality assurance.
This study demonstrated that a high degree of reliability can be expected when a therapist measures the same knee and elbow joints. This reliability was almost equally high when other therapists measured elbow movements and knee flexion. For measurements of knee extension, reliability was high only when the same therapist took all serial measurements. This study also indicated that expensive devices were no more reliable than some inexpensive devices for measuring the knee and elbow joints and that reliability and efficiency can be served by taking single measurements.
TABLE 4 Reliability Coefficients for Intertester Measures of Knee Extension When the Two Testers Used the Same and Different Positionsᵃ
Position             Type of Goniometer    rᵇ     ICCᶜ
Same (n = 7)         metal                 .75    .84
                     large plastic         .71    .74
                     small plastic         .67    .79
Different (n = 5)    metal                 .10    .20
                     large plastic         .64    .69
                     small plastic         .31    .36
CONCLUSION
Goniometric measurements made in a clinical setting on the elbow and knee can be highly reliable when taken by the same therapist. Measurements taken by different therapists can also be reliable for elbow flexion and extension and for knee flexion. Knee extension measurements taken by multiple therapists demonstrate poor reliability. Reliability can be obtained with single measurements made by any one of three commonly used goniometers. Goniometry can be assessed in a clinic using the protocol in this study, and goniometric assessment can be used for charting patient progress and quality assurance.
Acknowledgments. The authors thank Dr. Franz Steinberg for his cooperation, Eugene Michaels and Dr. Steven J. Rose for their critical thoughts on this manuscript and for their advice on the design of this study, and the staff therapists at the Jewish Hospital of St. Louis whose efforts made this study possible.
REFERENCES
1. Hellebrandt FA, Duvall EN, Moore ML: The measurement of joint motion: Part III - Reliability of goniometry. Phys Ther Rev 29:302-307, 1949
2. Boone DC, Azen SP, Lin CM, et al: Reliability of goniometric measurements. Phys Ther 58:1355-1360, 1978
3. Low JL: The reliability of joint measurement. Physiotherapy 62:227-229, 1976
4. Bartko JJ, Carpenter WT: On the methods and theory of reliability. J Nerv Ment Dis 163:307-317, 1976
5. Ebel RL: Estimation of the reliability of ratings. Psychometrika 16:407-424, 1951
6. Winer BJ: Statistical Principles in Experimental Design, ed 2. New York, NY, McGraw-Hill Inc, 1971, pp 283-292
a These coefficients are based on the first measure each tester made. Coefficients using the means of the two measurements made by each tester tended to be only slightly higher in most cases and slightly lower in a few cases.
b r = Pearson product-moment correlation. c ICC = Intraclass correlation coefficient.