[Rasch] Negative pt-bis and fit of 1.0? How can this be?

Stone, Gregory Gregory.Stone at UToledo.Edu
Wed Mar 7 05:54:06 EST 2012

Sending this to you and not the listserv.  I have come to the conclusion that, apart from serving as one consideration in assessing dimensionality, fit is a useless statistic for judging whether an item is performing reasonably.  I have not deliberately gathered empirical evidence on this, but time after time it has appeared to me that decisions made with point-biserials are more reasonable than any made with fit, regardless of whether we accept the largely arbitrary fit ranges popularized in books like Bond and Fox or the more rational fit values calculated by folks like Smith.  This appears to hold for small datasets, where many items show fit concerns, and for large datasets, where everything, naturally, fits perfectly.  Thoughts?


On Mar 6, 2012, at 1:46 PM, Mark Moulton wrote:


Belay my previous message (coffee hadn't kicked in yet).

You're right.  The variances, p * (1 - p), will be smaller at the extremes of p (near 1 or 0), which should make the misfits bigger.
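
To make the variance argument concrete, here is a minimal Python sketch of the squared standardized residual for a single dichotomous response. The function names are my own, not from any Rasch package. The unexpected response does get a large value when p is extreme, but it is also rare; weighting the two outcomes by their model probabilities gives an expected value of 1 at any p, provided the data actually follow the model.

```python
def z_squared(x, p):
    """Squared standardized residual of response x (0 or 1) with model probability p."""
    return (x - p) ** 2 / (p * (1 - p))


def expected_z_squared(p):
    """Model expectation of z_squared: success with probability p, failure otherwise."""
    return p * z_squared(1, p) + (1 - p) * z_squared(0, p)


# Illustrative probabilities only; 0.33 echoes the average p-value in the question.
for p in (0.5, 0.33, 0.1, 0.01):
    print(f"p={p:<5} variance={p * (1 - p):.4f} "
          f"z2(correct)={z_squared(1, p):.2f} "
          f"z2(wrong)={z_squared(0, p):.2f} "
          f"E[z2]={expected_z_squared(p):.2f}")
```

So a small variance inflates an individual surprising residual, but it does not by itself push the mean-square away from 1.0.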

Could those cell probabilities p[ni] actually be close to 0.50?  This would happen if the person measures tended to collapse to the center of the scale, an artifact of randomness.
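
A rough way to see what collapsed measures would do is the toy simulation below. It is not a Rasch estimation: I simply assume every person has the same measure, so each cell's expectation equals the item's observed p-value, and I assume the responses are independent random draws with no person signal. The sizes, the seed, and the 0.33 success rate are arbitrary choices for illustration.

```python
import random

random.seed(0)
N_PERSONS, N_ITEMS, P_CORRECT = 200, 20, 0.33  # assumed, illustrative values

# Responses carry no person signal: every cell is an independent Bernoulli draw.
data = [[1 if random.random() < P_CORRECT else 0 for _ in range(N_ITEMS)]
        for _ in range(N_PERSONS)]


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


outfits, ptbis = [], []
for i in range(N_ITEMS):
    col = [row[i] for row in data]
    rest = [sum(row) - row[i] for row in data]  # total score excluding item i
    p = sum(col) / N_PERSONS  # collapsed measures: one expectation for everyone
    outfits.append(sum((x - p) ** 2 / (p * (1 - p)) for x in col) / N_PERSONS)
    ptbis.append(pearson(col, rest))

print("outfit mean-square range:", min(outfits), max(outfits))
print("point-biserial range:", min(ptbis), max(ptbis))
print("items with negative point-biserial:", sum(r < 0 for r in ptbis))
```

Under these assumptions the mean-squares sit at 1.0 by construction, while the point-biserials scatter around zero, so one would expect a good share of them to come out negative by chance.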


On Tue, Mar 6, 2012 at 7:58 AM, Stuart Luppescu <slu at ccsr.uchicago.edu> wrote:
Hello, I'm analyzing items suggested for a course final exam. The
problem is that the students the items are tested on have not taken the
course yet. This results in very poor performance. The average person
measure is -0.78, and the average item p-value is 0.33 (for 4-choice
multiple choice items).

What is confusing me is that all the items have mean-square fit
statistics (infit and outfit) near 1.0, while many of them have negative
point-biserial correlations. According to my understanding, the fit
statistics are calculated from an aggregation of the squared
standardized residuals, which are calculated from the raw residual
divided by the score variance. In this case, the expectation of an
individual response would be low, so raw residuals would be large. And
the score variance at the extremes is lower than in the middle, so you
divide a large raw residual by a small score variance and you get a very
large standardized residual, right? So, how come the fits are close to
expectation? Especially since the pt-bis are low or negative? I don't
get it. Can someone explain this to me?
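
For reference, the calculation described above can be sketched in Python as follows. This uses the usual textbook definitions (outfit as the mean of the squared standardized residuals, infit as the sum of squared residuals over the sum of variances) for dichotomous data; it is not the code of any particular program, and the function names and example values are hypothetical. Note that the standardized residual divides the raw residual by the standard deviation, so its square divides the squared residual by the variance.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of success for person measure theta on item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (outfit, infit) mean-squares for one dichotomous item."""
    z2_sum = 0.0   # sum of squared standardized residuals
    r2_sum = 0.0   # sum of squared raw residuals
    var_sum = 0.0  # sum of model variances p * (1 - p)
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        var = p * (1 - p)
        r2 = (x - p) ** 2
        z2_sum += r2 / var
        r2_sum += r2
        var_sum += var
    return z2_sum / len(responses), r2_sum / var_sum


# Hypothetical example: four persons at 0 logits on an item with p = 0.25,
# one of whom answers correctly (the proportion the model expects).
outfit, infit = item_fit([1, 0, 0, 0], [0.0] * 4, math.log(3))
print(outfit, infit)
```

In this example the one correct answer contributes a squared standardized residual of about 3, but the three expected wrong answers contribute about 1/3 each, so the average is still about 1.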

Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
University of Chicago -=- CCSR
Father of 才文 and 智奈美 -=-    Kernel 3.2.1-gentoo-r2
What we have is nice, but we need something very
 different.    -- Robert Gentleman
 Statistical Computing 2003, Reisensburg (June

Rasch mailing list
Rasch at acer.edu.au
Unsubscribe: https://mailinglist.acer.edu.au/mailman/options/rasch/markm%40eddata.com
