[Rasch] Comparing rater leniencies across independent resident programs

Kenji Yamazaki kyamazaki at acgme.org
Wed Jan 11 07:39:37 EST 2012

Dear colleagues:

I have a follow-up question to my previous posting. I have become interested in comparing rater severity/leniency across resident programs. In particular, I would like to show that the average rater severity/leniency measure differs from program to program.

I wrote a FACETS specification for a model that includes evaluator, resident, and item as independent variables (facets).

            (DV =) evaluator, resident, item

As instructed previously, I group-anchored on the resident facet and specified the item facet in the usual way, as one would for a single connected dataset.

I have now obtained evaluator measures along with their standard errors and fit statistics. (Several elements are reported as unmeasurable or maximum, but I would like to set those issues aside for now.) To show that evaluator severity/leniency varies with the resident program to which the evaluators belong, I plan to average the evaluator measures within each resident program and then compare these averages across the 12 programs. My question is how to compare average rater severity/leniency at the resident program level. I have come up with the following two options, but I am not sure how to carry out either one correctly.

1.      Average the evaluator severity/leniency measures within each resident program.  How can the standard errors of the measures be incorporated into these comparisons?  If t-test-like comparisons are used, should the significance level be made much more stringent, given the large number of pairwise comparisons among the 12 programs?

2.      Add a resident program facet to the model and obtain program measures as main effects.

(DV =) program, evaluator, resident, item

However, I am not sure whether or not I need to group-anchor on the evaluator facet as well as the resident facet.
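
To make option 1 concrete, here is a minimal Python sketch of one way it might be done; it is not a FACETS feature or a recommended procedure. It assumes the evaluator measures and standard errors have been exported from FACETS by hand. The function names (program_mean, compare_programs) and all numbers are made up for illustration. The standard error of each program mean reflects measurement error only (evaluators treated as fixed, not as a sample), and the Bonferroni adjustment is just one of several possible corrections for multiple comparisons.

```python
import math
from itertools import combinations

def program_mean(measures, ses):
    """Simple mean of evaluator measures and its SE from measurement error alone."""
    n = len(measures)
    mean = sum(measures) / n
    se = math.sqrt(sum(s * s for s in ses)) / n
    return mean, se

def compare_programs(programs, alpha=0.05):
    """Pairwise z-tests between program means with a Bonferroni-adjusted threshold."""
    summary = {name: program_mean(m, s) for name, (m, s) in programs.items()}
    pairs = list(combinations(sorted(summary), 2))
    threshold = alpha / len(pairs)  # Bonferroni: divide alpha by the number of pairs
    results = []
    for a, b in pairs:
        (ma, sa), (mb, sb) = summary[a], summary[b]
        z = (ma - mb) / math.sqrt(sa ** 2 + sb ** 2)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append((a, b, z, p, p < threshold))
    return summary, threshold, results

# Made-up logit measures and standard errors, for illustration only
programs = {
    "A": ([0.50, 0.30, 0.70], [0.20, 0.25, 0.20]),
    "B": ([-0.40, -0.20, -0.60], [0.20, 0.20, 0.25]),
    "C": ([0.10, -0.10, 0.00], [0.30, 0.30, 0.30]),
}
summary, threshold, results = compare_programs(programs)
for a, b, z, p, flagged in results:
    print(f"{a} vs {b}: z = {z:.2f}, p = {p:.4g}, below threshold: {flagged}")
```

With 12 programs there are 66 pairwise comparisons, so under this scheme the per-comparison threshold would be 0.05 / 66, roughly 0.00076.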

Thank you very much for your help.


From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Mike Linacre
Sent: Wednesday, December 21, 2011 1:08 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] Calibrating ratee ability and rater leniency/severity with data not connected

Thank you for your questions, Kenji Yamazaki.

Your experimental design includes a 20-item instrument that was used everywhere (good). But there are 12 separate resident programs. Each has its own attending physicians (raters) and residents (ratees). The main purpose is to compare resident performance levels.

In this design, there are 12 disjoint subsets of ratings corresponding to the 12 resident programs. To make the measures of the 12 programs comparable, we must make some assumptions:

If we want to compare ratee abilities across programs, then we assume that the 12 samples of raters have the same mean leniency. In Facets, group-anchor the 12 groups of raters (attending physicians) at the same value, usually zero. Non-center the ratee facet.

Since the same 20 items are used everywhere, no additional action is required to make the item difficulties comparable everywhere.

Alternatively, if we want to compare rater severities/leniencies across programs, then we assume that the 12 samples of ratees have the same mean ability.
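
The reason an assumption is needed can be illustrated with a small Python sketch. It is a simplification, not Facets output: it uses only the dichotomous log-odds (ability minus severity minus difficulty), omits the rating-scale thresholds, and all names and numbers are hypothetical. Within one disjoint program, adding the same constant to every ratee and every rater leaves the modeled log-odds unchanged, so the data alone cannot locate the program. Group-anchoring the ratees at zero fixes that constant and moves the program's overall level into the rater measures.

```python
def log_odds(ability, severity, difficulty):
    """Simplified many-facet Rasch log-odds: ratee - rater - item."""
    return ability - severity - difficulty

def group_anchor(values, target=0.0):
    """Shift values so their mean equals target; return shifted values and the shift."""
    shift = target - sum(values) / len(values)
    return [v + shift for v in values], shift

# Hypothetical measures (logits) for one disjoint program
residents = [1.2, 0.8, 1.6]
evaluators = [0.3, -0.1]
item = 0.0

# Anchor the resident group mean at zero ...
anchored_residents, shift = group_anchor(residents)
# ... and apply the same shift to the evaluators so the fit to the data is unchanged
shifted_evaluators = [e + shift for e in evaluators]

before = [log_odds(b, c, item) for b in residents for c in evaluators]
after = [log_odds(b, c, item) for b in anchored_residents for c in shifted_evaluators]

# Every modeled log-odds is identical before and after the shift
print(all(abs(x - y) < 1e-9 for x, y in zip(before, after)))
```

In this sketch the evaluator measures absorb the whole shift, so any real difference in mean ratee ability between programs would show up as an apparent difference in rater severity. That is why the equal-mean-ability assumption matters for this comparison.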


Mike L.

Mike Linacre
Editor, Rasch Measurement Transactions
rmt at rasch.org www.rasch.org/rmt/ Latest RMT: 25:3 Winter 2011