Pilot Study: Measurement Characteristics of a Web-based Multi-site Evaluation System for Operative Skills
Derstine, Pamela L.
A modified version of a well-studied operative performance rating form, developed by the RRC for Urology, was made available in a web-based evaluation system linked to the ACGME Case Log system. While use of the system by ACGME-accredited urology residency programs is not required, the system may be well suited to providing data in a future accreditation system in which aggregated performance scores are provided to RRCs annually or semi-annually to permit close monitoring of program quality and early detection of potential problems. Study goals were to: determine whether performance scores derived from this system distinguished resident performance by training level and/or by resident operative experience; determine whether performance scores derived from this system detected significant differences among programs in resident performance on all cases and in resident performance on cases of varying difficulty; and, because rater use of the evaluation system is not controlled, determine whether there is a significant rater effect on performance scores. Retrospective data for all urology residents using the system between July 2001 and June 2008 were provided in de-identified format and analyzed using an applied mixed models approach to minimize potential effects of missing data. Summed scores were modeled as a function of fixed effects, random effects and error. Extensive descriptive analyses of the dataset were carried out to elucidate program usage of the evaluation system. Usage of the system by programs, and by residents and evaluators within programs, was highly variable. The mixed models analysis showed that resident performance improved with training level, except that there was little change between PGY2 and PGY3 resident scores. Further studies are needed to determine whether this finding holds when all programs are required to use the system at regular intervals for all residents. 
The current method of continuous data collection by the system allowed examination of the effect of resident operative experience irrespective of training level. Three components of operative experience, namely the amount of time in training, the number of cases performed, and the complexity of cases (defined as the number of CPT codes per case), were found to have a significant effect in distinguishing resident performance scores. While the effect of experience is consistent with findings on the relationship of operative volume to clinical outcomes, the effect of experience combined with case complexity remains unclear. Limitations include the narrow definition of case complexity used for this model and the irregular use of the system by residents and evaluators for specified case types. To determine whether the evaluation system could distinguish performance scores among programs, two models were used. The first model explored differences among programs in scores aggregated across all case types. When the referent, or expected performance score, was set at the median program case score, two programs had scores significantly different from the median score. The second model explored differences among programs for specific case types. The referent, or threshold, for each case type was the program having the median score for that case type. While it was not possible to analyze all 29 case types for all programs because of large gaps in the data, 23 case types could be analyzed, each for a subset of programs. This model identified a total of 19 case-specific program differences across 10 case types. For this study, the rater effect was defined as the elapsed time between case performance and case evaluation. Regardless of the model used to analyze the data, there was no significant rater effect on performance scores. 
This finding is consistent with the recently reported finding that, while experienced and inexperienced raters of trainee performance in the clinical workplace use different approaches in their decision making, their rating scores do not differ significantly. In contrast, it has been shown that the rating scores of trained and untrained raters differ significantly when assessments are carried out under controlled conditions, such as OSCEs or bench assessments. Further studies of rater effects comparing controlled and uncontrolled assessment conditions are needed, since the ultimate goal of resident performance assessment is to determine when a resident is competent to enter practice without direct supervision. In summary, the operative performance scores derived from the system used in this study distinguished resident performance by training level and by experience. In addition, differences among programs could be distinguished both for scores aggregated across all case types and for scores aggregated for specific case types. There was no significant rater effect. These results suggest that this evaluation system would be suitable for use in a future residency program accreditation system in which aggregated performance scores are regularly provided to RRCs to facilitate closer monitoring of program quality and timely identification of potential problems.
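For readers unfamiliar with the approach, the statement that summed scores were modeled as a function of fixed effects, random effects and error corresponds to the general linear mixed model, sketched below in its standard textbook form. The abstract does not say which terms were treated as fixed and which as random, so the symbols here are illustrative assumptions and not the study's actual specification.

```latex
% General linear mixed model (standard form; all symbols are illustrative assumptions)
% y       : vector of summed operative performance scores
% X, beta : design matrix and coefficients for the fixed effects
% Z, u    : design matrix and coefficients for the random effects
% epsilon : residual error
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
  \qquad
  \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{G}),
  \qquad
  \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{R})
\]
```

Because such a model is fit by likelihood to whatever observations are available, it does not require a complete, balanced set of evaluations for every resident, which is consistent with the stated aim of minimizing the effects of missing data.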
Subject: operative skills assessment
graduate medical education
ACGME case log system