[Rasch] Question on measure development
slu at ccsr.uchicago.edu
Thu Apr 14 10:16:43 AEST 2016
Fellow Raschies,

Back in the mid- to late 1990s, in consultation with Ben Wright, we developed a method for developing and scoring survey measures. We now survey all teachers, and students in grades 6 through 12, in the Chicago Public Schools every year. We have been using this method for about 20 years, and it has become routine. Recently we have been questioned about some of the details, which made me wonder whether it's routine elsewhere, and whether it is done the same way.
Here's the method:
1) Take a random sample of 2000 elementary school and 2000 high school
students or teachers.
2) Analyze this sample in Bigsteps or Winsteps.
3) Remove misfitting people (generally infit mean square and/or outfit
mean square > 2.0), and repeat the analysis. This should improve the
item parameter estimates.
4) Remove misfitting items in the conventional way.
5) Save the item and step difficulties from the run with the optimal
item and person subset.
6) Score all the people, using the item and step parameters from 5) as
anchors.
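Winsteps reports these fit statistics directly; purely as an illustration of what the step-3 cutoff operates on, here is a minimal sketch of person infit and outfit mean squares for dichotomous Rasch data, using simulated responses with known measures (numpy; all numbers hypothetical, not our actual data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: dichotomous Rasch model, known parameters.
n_persons, n_items = 500, 20
theta = rng.normal(0, 1, n_persons)       # person measures
beta = np.linspace(-2, 2, n_items)        # item difficulties

# Rasch probability of endorsing each item, and simulated 0/1 responses
p = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
x = (rng.random((n_persons, n_items)) < p).astype(float)

# Residuals and their model variances
w = p * (1 - p)                           # variance of each response
z2 = (x - p) ** 2 / w                     # squared standardized residuals

outfit = z2.mean(axis=1)                            # unweighted mean square
infit = ((x - p) ** 2).sum(axis=1) / w.sum(axis=1)  # information-weighted

# Step 3 of the procedure: drop persons with mean square > 2.0
cutoff = 2.0
keep = (infit <= cutoff) & (outfit <= cutoff)
print(f"Retained {keep.sum()} of {n_persons} persons")
```

Since the simulated responses actually follow the model, both mean squares hover around 1.0 and only a few persons exceed 2.0; with real survey data the flagged fraction is what the cutoff question is about.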
This is all very ordinary, but I'm wondering about step 3. Do people
regularly remove misfitting people to get optimal item parameters, and
then score all the people using them as anchors? And if they do, what
do they use for the cutoff? Is 2.0 reasonable?
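For concreteness, scoring with anchored parameters amounts to estimating each person's measure with the item difficulties held fixed (Winsteps handles this through its anchor files, e.g. IAFILE=, with its own estimation details). A minimal sketch for one person, assuming dichotomous items and hypothetical anchored difficulties:

```python
import numpy as np

# Hypothetical anchored item difficulties saved from a calibration run,
# and one person's 0/1 responses to those items.
beta = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
x = np.array([1, 1, 1, 0, 0], dtype=float)

# Maximum-likelihood person measure via Newton-Raphson, beta held fixed.
theta = 0.0
for _ in range(50):
    p = 1 / (1 + np.exp(-(theta - beta)))
    grad = (x - p).sum()          # d log-likelihood / d theta
    info = (p * (1 - p)).sum()    # Fisher information
    step = grad / info
    theta += step
    if abs(step) < 1e-8:
        break
print(round(theta, 3))
```

At convergence the expected score sum(p) equals the observed raw score, which is why, with anchored difficulties, every person with the same raw score on the same items gets the same measure.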
Sometimes, especially when we are working with smaller data sets (e.g.,
only Pre-K teachers), removing misfitting people with the 2.0 cutoff
removes too many people, especially those who respond in the top and
bottom categories. This depletes the top and bottom categories, spreads
the category thresholds out to an extreme degree, and reduces the
reliability. In these cases we sometimes raise the cutoff to something
like 3.0, and it works much better. Have other people had this kind of
experience, and, if so, how do you deal with it?
Thank you for your input.
UChicago Consortium on School Research