[Rasch] Negative pt-bis and fit of 1.0? How can this be?

Andrew Kyngdon akyngdon at lexile.com
Wed Mar 7 10:27:41 EST 2012

Greg and others,

Steve Humphry and I recently extended the Outfit and Person Coefficient
statistics (Wright & Masters, 1982) to pairwise choice data analysed with
the Bradley - Terry - Luce (BTL) model. The investigation was not a
psychometric one, but one into the utility of incremental gains and losses
under conditions of risk and uncertainty.
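For readers less familiar with the BTL model, here is a minimal sketch of how an Outfit-style statistic carries over to pairwise choices. It illustrates the general idea only and is not the procedure used in the study. It assumes the BTL utilities are already estimated (here they are fixed, hypothetical values) and that each choice is scored x = 1 if option i was chosen over option j. The function names `btl_prob` and `pairwise_outfit` are hypothetical.

```python
import math

def btl_prob(u_i: float, u_j: float) -> float:
    """BTL probability that option i is chosen over option j (utilities in logits)."""
    return 1.0 / (1.0 + math.exp(-(u_i - u_j)))

def pairwise_outfit(utilities: dict, choices: list) -> float:
    """Outfit-style mean square for pairwise choice data.

    utilities: option label -> BTL utility (hypothetical, assumed already estimated)
    choices:   (i, j, x) triples, x = 1 if i was chosen over j, else 0
    """
    total = 0.0
    for i, j, x in choices:
        p = btl_prob(utilities[i], utilities[j])
        variance = p * (1.0 - p)           # binomial variance of a single choice
        total += (x - p) ** 2 / variance   # squared standardized residual
    return total / len(choices)

# Hypothetical utilities implying the order A > B > C
u = {"A": 1.0, "B": 0.0, "C": -1.0}
transitive = [("A", "B", 1), ("B", "C", 1), ("A", "C", 1)]
intransitive = [("A", "B", 1), ("B", "C", 1), ("A", "C", 0)]  # C chosen over A: a cycle
print(round(pairwise_outfit(u, transitive), 3))
print(round(pairwise_outfit(u, intransitive), 3))
```

The cyclic set of choices produces a mean square well above 1, while the fully transitive set falls below 1. This is the sense in which such statistics can be read as indicators of transitivity. In a real analysis the utilities would be estimated from the same choice data.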

In this context, fit statistics indicate the degree of transitivity of
pairwise choices, given the BTL utilities for sets of simple gambles. We
found that the statistics were consistent with violation of first order
stochastic dominance induced by event splitting operations in two sets of
5-branch simple gambles. For a set of binary gambles (2-branch gambles),
the fit statistics were consistent with first order stochastic dominance, as
pairwise choice behaviour was highly transitive. To compare the degree of
transitivity between sets of gambles, we used the person coefficient, which
indicated that there was twice as much noise in the 5-branch sets of
gambles compared to the binary gambles. This was consistent with event
splitting significantly interfering with choice behaviour under risk.
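As a reminder of what first order stochastic dominance means operationally, here is a minimal sketch of a dominance check for simple gambles. Gamble A dominates gamble B if, at every threshold t, A is at least as likely as B to pay more than t, and strictly more likely at some t. The gambles below are hypothetical and are not the ones used in the study. The check only establishes whether dominance holds, so that choosing the dominated gamble can be counted as a violation. It does not model event splitting itself.

```python
from fractions import Fraction as F

def decumulative(gamble, t):
    """P(outcome > t) for a gamble given as (outcome, probability) branches."""
    return sum((p for x, p in gamble if x > t), F(0))

def dominates(a, b):
    """True if gamble a first-order stochastically dominates gamble b:
    P_a(X > t) >= P_b(X > t) at every t, strictly at some t.
    Checking at each outcome value suffices, since both decumulative
    functions are step functions that only change at outcome values."""
    thresholds = sorted({x for x, _ in a} | {x for x, _ in b})
    diffs = [decumulative(a, t) - decumulative(b, t) for t in thresholds]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

# Hypothetical 3-branch gambles: g_plus dominates g_minus, although g_minus
# adds a tempting high branch (90) at the cost of a larger chance of the low outcome.
g_plus = [(100, F(80, 100)), (20, F(10, 100)), (10, F(10, 100))]
g_minus = [(100, F(75, 100)), (90, F(5, 100)), (10, F(20, 100))]
print(dominates(g_plus, g_minus), dominates(g_minus, g_plus))
```

Exact `Fraction` probabilities are used so that ties between the two decumulative functions are detected without floating-point noise.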

In utility theory, there is much in the way of established descriptive
theory, such as expected utility theory (Bernoulli, 1738; von Neumann &
Morgenstern, 1944), prospect theory and its cumulative extension (Kahneman &
Tversky, 1979; Tversky & Kahneman, 1992) and configural weighting utility
(Birnbaum, 2008). All these non-stochastic theories predicted first order stochastic
dominance would hold for the binary gambles, so the results Steve and I
obtained with the stochastic BTL model were consistent with these theories.
Configural weighting, however, predicted violations of dominance with the
5-branch gambles, but the pattern of violation we observed was not
consistent with configural weighting, as the BTL utility spreads did not
behave as predicted. It would seem that violations of dominance are caused
by choice errors, not an underlying psychology of configural weighting,
which is consistent with the “bounded rationality” hypothesis advocated by
the behavioural finance economist Levy (2008).

So, what fit statistics are able to tell you depends entirely upon how much
you already know about the relevant psychological system. Completely
relying upon them as evidence of the behaviour of a single, relevant
psychological quantity is far too much to ask of them.



*From:* rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] *On
Behalf Of* Stone, Gregory
*Sent:* Wednesday, 7 March 2012 5:54 AM
*To:* <rasch at acer.edu.au>
*Subject:* Re: [Rasch] Negative pt-bis and fit of 1.0? How can this be?

Sending this to you and not the listserv.  I have come to the conclusion
that, apart from serving as one consideration for dimensionality, fit is a
useless statistic when it comes to assessing the reasonableness of item
performance.  While I have not deliberately gathered empirical evidence for
this, it has appeared to me time after time that decisions made with point
biserials are consistently more reasonable than any made with fit,
regardless of whether we accept the largely arbitrary fit ranges
popularized in books like Bond and Fox, or the more rational fit criteria
calculated by folks like Smith. This appears to hold true both for small
datasets, where many items show fit concerns, and for large datasets, where
everything, naturally, fits perfectly. Thoughts?


On Mar 6, 2012, at 1:46 PM, Mark Moulton wrote:


Belay my previous message (coffee hadn't kicked in yet).

You're right.  The variances, p * (1 - p), will be smaller at the extremes
of p (near 1 or 0), which should make the misfits bigger.
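A short worked check of this point follows. It uses the usual dichotomous definitions, with notation assumed here: x_ni is the observed score and p_ni is the model probability. The squared standardized residual does grow at extreme p, but only for the unexpected response, which is correspondingly rare. Weighting each outcome by its model probability gives an expected value of 1 for any p. For example, with p = 0.1 an unexpected success has z² = 9 and an expected failure has z² = 1/9, so the expectation is 0.1 × 9 + 0.9 × 1/9 = 1.

```latex
z_{ni} = \frac{x_{ni} - p_{ni}}{\sqrt{p_{ni}(1 - p_{ni})}}, \qquad
z_{ni}^2 =
\begin{cases}
  \dfrac{1 - p_{ni}}{p_{ni}} & \text{if } x_{ni} = 1,\\[6pt]
  \dfrac{p_{ni}}{1 - p_{ni}} & \text{if } x_{ni} = 0,
\end{cases}

\mathbb{E}\!\left[z_{ni}^2\right]
  = p_{ni}\cdot\frac{1 - p_{ni}}{p_{ni}} + (1 - p_{ni})\cdot\frac{p_{ni}}{1 - p_{ni}}
  = (1 - p_{ni}) + p_{ni} = 1 .
```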

Could those cell probabilities p[ni] actually be close to 0.50?  This would
happen if the person measures tended to collapse to the center of the
scale, an artifact of randomness.
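To make this concrete, here is a minimal sketch using the dichotomous Rasch probability. When person measures bunch up near an item's difficulty, the cell probabilities sit near 0.50 and the model variances sit near their maximum of 0.25. All measures and the item difficulty below are hypothetical, and `rasch_p` is an illustrative name.

```python
import math

def rasch_p(b: float, d: float) -> float:
    """Dichotomous Rasch model: probability of success for person measure b on item difficulty d (logits)."""
    return 1.0 / (1.0 + math.exp(-(b - d)))

item_difficulty = 0.0                    # hypothetical item at the center of the scale
spread = [-3.0, -1.5, 0.0, 1.5, 3.0]     # hypothetical, well-separated person measures
collapsed = [-0.2, -0.1, 0.0, 0.1, 0.2]  # hypothetical measures bunched near the item

for label, measures in [("spread", spread), ("collapsed", collapsed)]:
    probs = [rasch_p(b, item_difficulty) for b in measures]
    variances = [p * (1 - p) for p in probs]  # model variance of each response
    print(label, [round(p, 2) for p in probs], [round(w, 3) for w in variances])
```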


On Tue, Mar 6, 2012 at 7:58 AM, Stuart Luppescu <slu at ccsr.uchicago.edu> wrote:

Hello, I'm analyzing items suggested for a course final exam. The
problem is that the students the items are tested on have not taken the
course yet. This results in very poor performance. The average person
measure is -0.78, and the average item p-value is 0.33 (for 4-choice
multiple choice items).

What is confusing me is that all the items have mean-square fit
statistics (infit and outfit) near 1.0, while many of them have negative
point-biserial correlations. According to my understanding, the fit
statistics are calculated from an aggregation of the squared
standardized residuals, which are calculated from the raw residual
divided by the square root of the score variance. In this case, the
expectation of an individual response would be low, so raw residuals
would be large. And the score variance at the extremes is lower than in
the middle, so you divide a large raw residual by a small standard
deviation and you get a very large standardized residual, right? So, how
come the fits are close to expectation? Especially since the pt-bis are
low or negative? I don't get it. Can someone explain this to me?
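The computation described here can be sketched as follows. It uses the usual dichotomous forms: the standardized residual is the raw residual divided by the square root of the model variance. Outfit is the unweighted mean of the squared standardized residuals, and infit is the sum of squared raw residuals divided by the sum of the variances. The item and its expected probabilities are hypothetical. In a real analysis the expectations come from the estimated person and item measures.

```python
def fit_mean_squares(responses, expected):
    """Outfit and infit mean squares for one dichotomous item.

    responses: observed scores x_n (0 or 1), one per person
    expected:  model probabilities p_n for the same persons (strictly between 0 and 1)
    """
    variances = [p * (1 - p) for p in expected]                  # model variance W_n
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, expected)]
    # Outfit: unweighted mean of squared standardized residuals z^2 = (x - p)^2 / W
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted mean square, sum of (x - p)^2 over sum of W
    infit = sum(sq_resid) / sum(variances)
    return outfit, infit

# Hypothetical item with expectation 0.25 for all 8 persons; 2 of 8 answer correctly
responses = [1, 1, 0, 0, 0, 0, 0, 0]
expected = [0.25] * 8
outfit, infit = fit_mean_squares(responses, expected)
print(round(outfit, 3), round(infit, 3))
```

In this toy case each success has z² = 0.5625 / 0.1875 = 3 and each failure has z² = 0.0625 / 0.1875 = 1/3. Outfit is therefore (2 × 3 + 6 × 1/3) / 8 = 1.0, and infit is 1.5 / 1.5 = 1.0, up to floating-point rounding. The few large residuals from unexpected successes are exactly offset by the many small residuals from expected failures.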

Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
University of Chicago -=- CCSR
Father of 才文 and 智奈美 -=-    Kernel 3.2.1-gentoo-r2
What we have is nice, but we need something very
 different.    -- Robert Gentleman
 Statistical Computing 2003, Reisensburg (June

Rasch mailing list
Rasch at acer.edu.au
