A Comparison of the Angoff and Item Mapping Standard Setting Methods for a Certification Examination
Schnabel, Sarah D
In high-stakes licensure and certification testing, sound standard setting is essential. The Angoff method requires subject matter experts (panelists) to estimate the likelihood that a minimally competent examinee would answer an examination item (question) correctly. Implementations of the Angoff method vary across studies, but panelists are often asked to express this likelihood on a scale of 0-100 (Angoff, 1971). Despite its widespread use, the Angoff method has been criticized as too cognitively complex (Skorupski, 2012). Some researchers argue that individuals cannot reliably apply a definition of minimal competence when estimating performance on an examination item (Glass, 1978; Brandon, 2004). Others, however, have demonstrated that the Angoff method produces reliable results and reasonable standards (Norcini & Shea, 1992; Plake & Impara, 2001). The item mapping method, a variation of the bookmark method, has not been widely used in practice, but in theory it affords some advantages over the Angoff method (Wang, 2003; 2009). The item mapping method makes use of a graphical representation of items and encourages group discussion. Using the item mapping method, panelists answer “yes” or “no” to the question “would a minimally competent examinee answer this item correctly?” This research compares the Angoff and item mapping standard setting procedures. Two panels of subject matter experts each set standards on a recertification examination using both methods. Both panels arrived at higher standards with the Angoff method than with the item mapping method. The methods were compared with respect to procedural, internal, and external validity. The evidence for procedural validity was strong for both methods.
Neither method was significantly more internally consistent, but the Angoff method resulted in higher intra-panelist consistency (stronger correlations between ratings and empirical difficulty). External validity is established in part through evaluation of the reasonableness of the cut score. The panelists did not agree on which method produced a more reasonable cut score, but they preferred the process of the Angoff method.
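As a rough illustration of the quantities discussed above, the following minimal sketch shows one common way an Angoff cut score is aggregated (the sum over items of the mean panelist rating) and how a panelist's ratings might be correlated with empirical item difficulty. It is not the procedure used in this study: the function names, the 0-100 ratings, and the proportion-correct values are all hypothetical and chosen only to make the arithmetic visible.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return a raw cut score from Angoff ratings.

    ratings[p][i] is panelist p's estimate (0-100) of the chance that a
    minimally competent examinee answers item i correctly. This sketch
    assumes the common aggregation: average each item across panelists,
    then sum the item means (converted to proportions).
    """
    n_items = len(ratings[0])
    item_means = [mean(r[i] for r in ratings) for i in range(n_items)]
    return sum(item_means) / 100


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: 3 panelists rating 4 items on a 0-100 scale.
ratings = [
    [60, 80, 40, 90],
    [70, 70, 50, 80],
    [50, 90, 30, 100],
]
# Hypothetical empirical difficulty (proportion of examinees answering correctly).
p_values = [0.65, 0.85, 0.45, 0.90]

cut_score = angoff_cut_score(ratings)  # expected raw score out of 4 items
consistency = [pearson(r, p_values) for r in ratings]  # one value per panelist
```

Under these assumptions, a higher correlation for a panelist indicates that their ratings track observed item difficulty more closely, which is the sense of intra-panelist consistency described above.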