Hello Fellow Raschies, I'm doing some work analyzing data from a
standardized test that will remain nameless here. In 2008, the
authorities changed from scoring and equating using the Rasch model to
using the 3-PL model. The justification was that in 2008 (compared to
2007) there were sizable changes in the standard deviation of scale
scores at certain grades. This resulted in the mean scale score
declining by a small amount, but since the distribution spread out more,
more students reached the proficiency level: an obvious contradiction
(according to the testing authorities).

The Authorities concluded: "We needed to equate the tests at grades
where reliability changed, using methodology that took changes in
reliability into account ("3-parameter model" instead of "Rasch
1-parameter model")"

This sounds like a bunch of hooey to me. My take:
1) Changes in the variability (and, as a result, the reliability) of test
scale scores are a natural phenomenon and are not related to the equating method.
Maintaining score distributions (if you really want to do this) is taken
care of in scaling, not in equating.
2) The apparent contradiction (mean scale score not moving in the same
direction as percentage meeting standards) is due to dichotomizing the
distribution into two groups, "meets standards" and "doesn't meet
standards", which is bound to produce loss of information and introduce
3) The switch from Rasch to 3-PL has nothing to do with the practicality or
superiority of either method; it rests only on the predilections of The
Authorities.
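
To make point 2 concrete, here is a quick sketch with made-up numbers (not
the actual test data). It assumes scale scores are roughly normal and that
the proficiency cut sits above the mean; the means, SDs, cut score, and the
helper name pct_meeting_standard are all hypothetical, chosen only for
illustration.

```python
from statistics import NormalDist


def pct_meeting_standard(mean, sd, cut):
    """Percent of a normal score distribution at or above the cut score."""
    return 100.0 * (1.0 - NormalDist(mu=mean, sigma=sd).cdf(cut))


# Hypothetical values: the cut score is fixed and lies above both means.
CUT = 700.0

# "2007": slightly higher mean, tighter spread.
pct_2007 = pct_meeting_standard(mean=650.0, sd=30.0, cut=CUT)

# "2008": mean drops by 2 points, but the spread widens.
pct_2008 = pct_meeting_standard(mean=648.0, sd=40.0, cut=CUT)

print(f"2007: mean 650, SD 30 -> {pct_2007:.1f}% meet standard")
print(f"2008: mean 648, SD 40 -> {pct_2008:.1f}% meet standard")
```

Under these assumptions the mean goes down while the percentage meeting the
standard goes from roughly 5% to roughly 10%, with no change in equating
method at all. The effect would run the other way if the cut were below the
mean.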

Am I wrong? Is 3-PL actually better for equating when score
distributions change?

