[Rasch] Calibrating ratee ability and rater leniency/severity with data not connected
irothnie at usyd.edu.au
Sat Jan 7 20:49:16 EST 2012
I have a similar problem, and group anchoring doesn't seem to have
resolved the subset issue, so I am wondering whether there is something else
I can do, or how I should interpret the analysis.
I have applicants taking an oral exam consisting of nine seven-minute
questions. The questions are highly structured from the examiner's point of
view, and examiners are trained.
Because a large number of applicants are examined over a
number of weeks, different subsets of questions are used in 'circuits'.
The subsets of questions are partially crossed, i.e., one question is in
all circuits and many are in more than one. (There are about 50
questions across 13 circuits.) Applicants are completely
nested within circuits, but examiners are partially crossed with circuits
(and questions) because many examiners examine on more than one day.
I ran the Facets model with examiner, applicant, question (item), and
question circuit as facets. Applicants and circuits fell into as many as 21
subsets. I group-anchored the circuits because we do assume they are of
equal mean difficulty (to be fair), and we certainly don't assume
applicants to be of equal ability (they are ranked on their scores).
I still have 11 or so subsets, and obviously I can't group-anchor the
applicants. Is there anything else I can do, or do we simply need
more crossing of items across circuits? When I run the analysis
without the 'circuit' facet there is no subset problem, but I'm not sure
how useful that analysis is in determining the 'fairness' of the experience.
Any help much appreciated.
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Mike Linacre
Sent: Wednesday, 21 December 2011 6:08 PM
To: rasch at acer.edu.au
Subject: Re: [Rasch] Calibrating ratee ability and rater leniency/severity with data not connected
Thank you for your questions, Kenji Yamazaki.
Your experimental design includes a 20-item instrument that was used
everywhere (good). But there are 12 separate resident programs. Each has
its own attending physicians (raters) and residents (ratees). The main
purpose is to compare resident performance levels.
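A rough way to see why this design splits into separate subsets even though the same 20 items are used everywhere: in an additive model (ratee ability minus rater severity minus item difficulty), the data fix the measures only up to shifts that leave every modelled logit unchanged. Within one program, all ratees and all raters can be shifted by the same amount without changing any logit, and the shared items do not prevent that. The sketch below is a hypothetical Python helper, not how Facets computes its subset report. It assumes a main-effects model with no interactions, builds a design matrix with one column per element, and counts the independent shifts from the rank deficiency. A connected design with F facets has F - 1 such shifts; the sketch treats each one beyond that as another disjoint subset.

```python
import numpy as np


def design_matrix(observations, signs):
    """One row per observation, one column per (facet, element) pair.

    Assumed model: logit = sum over facets f of signs[f] * measure[f][element],
    e.g. signs = [+1, -1, -1] for ability - severity - difficulty.
    """
    columns = {}
    for obs in observations:
        for f, element in enumerate(obs):
            columns.setdefault((f, element), len(columns))
    X = np.zeros((len(observations), len(columns)))
    for row, obs in enumerate(observations):
        for f, element in enumerate(obs):
            X[row, columns[(f, element)]] = signs[f]
    return X


def count_subsets(observations, signs):
    """Hypothetical connectivity check based on rank deficiency.

    Columns minus rank = number of independent shifts of the measures
    that leave every modelled logit unchanged. A connected design with
    F facets has F - 1 of them (one local origin per facet, less one);
    each extra one is counted as another disjoint subset.
    """
    X = design_matrix(observations, signs)
    deficiency = X.shape[1] - np.linalg.matrix_rank(X)
    return deficiency - (len(signs) - 1) + 1


if __name__ == "__main__":
    # Invented toy data: 2 programs, each with its own 2 raters and
    # 2 ratees, all rating the same 3 items.
    obs = []
    for p in (1, 2):
        for ratee in (f"ratee{p}a", f"ratee{p}b"):
            for rater in (f"rater{p}a", f"rater{p}b"):
                for item in ("item1", "item2", "item3"):
                    obs.append((ratee, rater, item))
    print(count_subsets(obs, [1, -1, -1]))  # 2: shared items do not link the programs
    # One rater from program 1 also rates one ratee from program 2.
    obs.append(("ratee2a", "rater1a", "item1"))
    print(count_subsets(obs, [1, -1, -1]))  # 1
```

The sign vector does not change the rank; it is there only so the matrix matches the assumed ability - severity - difficulty orientation.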
In this design, there are 12 disjoint subsets of ratings corresponding
to the 12 resident programs. To make the measures of the 12 programs
comparable, we must make some assumptions:
If we want to compare ratee abilities across programs, then we assume
that the 12 samples of raters have the same mean leniency. In Facets,
group-anchor the 12 groups of raters (attending physicians) at the same
value, usually zero. Non-center the ratee facet.
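The effect of group-anchoring each program's raters at zero can be illustrated with a small hypothetical Python sketch. It assumes each program has already been calibrated separately, so each program's measures sit on their own arbitrary origin (the numbers below are invented). Shifting a program's raters so their mean severity is 0, and shifting that program's non-centered ratees by the same amount, leaves every ability-minus-severity difference, and hence every modelled logit, unchanged; only the origin moves. Facets imposes the group-anchor constraint during estimation rather than as an after-the-fact shift, so this illustrates the assumption, not the program's algorithm.

```python
from statistics import mean


def group_anchor_raters(programs):
    """Re-express each program's measures with rater severities centred at 0.

    programs: {program: {"raters": {id: severity}, "ratees": {id: ability}}},
    logits from separate (hypothetical) calibrations, each with its own origin.
    Assumed model: logit = ability - severity - item difficulty.
    """
    anchored = {}
    for program, measures in programs.items():
        shift = mean(measures["raters"].values())
        anchored[program] = {
            # Rater severities now average 0 within every program.
            "raters": {k: v - shift for k, v in measures["raters"].items()},
            # Same shift for ratees keeps ability - severity unchanged
            # (ratees are non-centred; they float with their raters).
            "ratees": {k: v - shift for k, v in measures["ratees"].items()},
        }
    return anchored


if __name__ == "__main__":
    # Invented local measures for two programs.
    local = {
        "A": {"raters": {"J1": 1.0, "J2": 0.0}, "ratees": {"R1": 2.0}},
        "B": {"raters": {"K1": -1.0, "K2": -2.0}, "ratees": {"S1": 0.0}},
    }
    result = group_anchor_raters(local)
    print(result["A"]["ratees"], result["B"]["ratees"])  # {'R1': 1.5} {'S1': 1.5}
```

Here R1 and S1 end up at the same ability, 1.5 logits, because each is 1.5 logits above the mean severity of its own program's raters. Item difficulties are not touched, since the same items are used in every program.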
Since the same 20 items are used everywhere, no additional action is
required to make the item difficulties comparable everywhere.
Alternatively, if we want to compare rater severities/leniencies across
programs, then we assume that the 12 samples of ratees have the same
Editor, Rasch Measurement Transactions
rmt at rasch.org www.rasch.org/rmt/ Latest RMT: 25:3 Winter 2011