[Rasch] a scaling issue?
Robert.Hess at ASU.edu
Fri Aug 31 06:32:24 EST 2012
There are several issues that need to be addressed concerning your
questions. First, not all 1PL models are Rasch models. So that issue would
need to be addressed. As an illustration, in Arizona we have a statewide
testing program that incorporates a standards-based assessment (called AIMS)
within a norm-referenced test. The two tests are analyzed separately. In the
past we employed a testing company that insisted upon using a proprietary 3PL
analysis, which I tried to replicate using Bilog. While I was successful to
some extent (a .90 correlation), I was not able to precisely obtain the same
item order that the contractor obtained. When I
re-analyzed the same item set using a Rasch model, I found that anywhere from
15% to as high as 28% of the items did not "fit", although they did
"fit" under the proprietary 3PL and Bilog analyses. Similarly, when we asked
the same company to employ a 1-parameter analysis they employed their own
proprietary model rather than one of the "nonproprietary" Rasch analyses
programs (Winsteps or Rumm, for example).
Keep in mind that the 3-parameter model employs a varying discrimination
parameter for items as well as a varying "pseudo-guessing" parameter, while
the Rasch model assumes a constant item discrimination parameter and a zero
"pseudo-guessing" parameter.
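To make that concrete, here is a minimal sketch (in Python, with illustrative parameter values of my own choosing, not values from any real test) of the two item response functions. The Rasch curve is the special case of the 3PL with discrimination fixed at 1 and pseudo-guessing fixed at 0.

```python
import math

def p_3pl(theta, a, b, c):
    # 3PL: P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    # a = discrimination, b = difficulty, c = pseudo-guessing lower asymptote
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_rasch(theta, b):
    # Rasch: the 3PL with a fixed at 1 and c fixed at 0
    return p_3pl(theta, a=1.0, b=b, c=0.0)

# A person located exactly at the item's difficulty answers with
# probability .5 under Rasch, but above .5 under a 3PL with a
# nonzero pseudo-guessing parameter.
print(p_rasch(0.0, b=0.0))              # 0.5
print(p_3pl(0.0, a=1.0, b=0.0, c=0.2))  # approximately 0.6
```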
I know of two different national testing companies that employ the Rasch
model and one testing company that employs a proprietary 3-parameter model,
so how is it that in Michigan a statewide "standardized testing" program
results in scores derived from two different models? Are the districts
allowed to choose what standardized testing instrument they employ?
On reflection, the additional two parameters of the 3-parameter model would
more than likely produce the gap that you describe for the low-achieving
schools because of the influence of both the varying discrimination
parameter and the pseudo-guessing parameter - both of these will have
a strong push-down effect on low-scoring students. Why? Because there is no
clear ordering of items by difficulty, due to the crossing effect
from the varying discrimination values and the varying influence of the
"pseudo-guessing" (mathematically, magically determined) effect.
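The crossing effect can be seen numerically. A sketch with made-up parameters (guessing omitted for clarity): two items whose discriminations differ enough that their curves cross, so which item is "harder" depends on where on the ability scale you look - an ordering ambiguity the Rasch model's constant discrimination rules out.

```python
import math

def icc(theta, a, b):
    # 2PL-style item characteristic curve: 1 / (1 + exp(-a * (theta - b)))
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Item 1: low discrimination. Item 2: high discrimination, harder at the center.
a1, b1 = 0.5, 0.0
a2, b2 = 2.0, 0.5

low, high = -2.0, 2.0
# For a low-ability student, item 2 looks much harder than item 1 ...
print(icc(low, a1, b1) > icc(low, a2, b2))    # True
# ... but for a high-ability student the order reverses.
print(icc(high, a2, b2) > icc(high, a1, b1))  # True
```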
It is also very important to note that under a 3-parameter model almost all
items will "fit"; the same cannot be said for the 1-parameter model. I would
also like to note that a school or schools that employ a testing program
that utilizes a Rasch analysis are really attempting to understand how
students perform relative to the tasks they are tested on, whereas, more
often than not, schools that employ a 3-parameter analytic model are
attempting to compare student performance relative to other students'
performance. In the Rasch model the assumption is that items must fit the
model; in the 3-parameter model the approach is to find the model that best
fits the items. This too may have quite a bit of influence on your
low-achieving schools.
One final note to consider: in the 3-parameter model items are (typically)
allowed to "float" while person ability is centered; in the Rasch model
items are centered and persons are allowed to "float". While this may not
sound important, try to find out whether your 3-parameter model does indeed
reflect the former characteristic. I have found that this can certainly
influence the push-down effect on low-scoring students, thus increasing the
"gap" effect you describe.
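As a small illustration of the centering point (made-up estimates, not output from any real program): the two identification choices shift the whole scale by a constant. Person-minus-item differences are untouched, but different numbers end up being reported as scale locations.

```python
# Hypothetical estimated item difficulties and person abilities on one scale.
difficulties = [-1.2, -0.3, 0.4, 1.5]
abilities = [-0.8, 0.1, 0.9]

# Rasch convention: center the items (mean difficulty = 0); persons float.
shift_items = sum(difficulties) / len(difficulties)
rasch_d = [d - shift_items for d in difficulties]
rasch_th = [t - shift_items for t in abilities]

# Typical 3PL convention: center the persons (mean ability = 0); items float.
shift_persons = sum(abilities) / len(abilities)
pl3_d = [d - shift_persons for d in difficulties]
pl3_th = [t - shift_persons for t in abilities]

# The person-minus-item difference that drives response probabilities
# is identical under both conventions; only the reported scale moves.
print(rasch_th[0] - rasch_d[0])
print(pl3_th[0] - pl3_d[0])
```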
I don't know if any of this will help, I hope so.
Prof. Emeritus of Educational Measurement & Evaluation
Arizona State University
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Treder, David
Sent: Wednesday, August 29, 2012 6:36 PM
To: rasch at acer.edu.au
Subject: [Rasch] a scaling issue?
Not sure that this is the right place to bring this up, but thought it might
be a good place to start.
As a requirement of the federal NCLB Waiver, states were required to address
the "achievement gap." As their answer, Michigan decided to identify schools
with the largest "gap." The gap is based on the State's standardized tests:
this is computed by subtracting the mean score of the bottom 30% of students
from the mean score of the top 30% of students (for each subject-area test).
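For what it's worth, the index as I read it (following the "top minus bottom" wording later in the message) can be sketched with hypothetical scale scores:

```python
# Hypothetical scale scores for one school on one subject-area test.
scores = sorted([442, 418, 497, 455, 430, 470, 488, 403, 461, 476])
n = len(scores)
k = max(1, int(0.30 * n))  # size of the top and bottom 30% groups

bottom_mean = sum(scores[:k]) / k   # mean of the bottom 30%
top_mean = sum(scores[-k:]) / k     # mean of the top 30%
gap = top_mean - bottom_mean
print(gap)  # 70.0 for these made-up scores
```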
The issue, and my question:
1) Some of the tests are scaled using 1-parameter IRT, while others are
scaled using 3-parameter IRT.
2) Based on tests scaled with 1 P/IRT, the schools with "high achievement"
(highest mean scale score) are far more likely to have a large gap
(correlation between achievement and gap, around .70); for the tests scaled
with 3 P/IRT, there's an opposite relationship - low achievement schools are
far more likely to have a large gap (correlations around -.70)
Would anyone care to weigh in on why tests scaled with 1 P/IRT would
identify diametrically different schools, on the "top minus bottom" index,
than tests scaled with 3 P/IRT?
(or, point me to a place in the literature that talks about the differences
in the way scores will be "spread out" in 1 P/IRT, vs 3 P/IRT).