[Rasch] Fan: > Is replicating Fan's research worthwhile? I'm voting no for now...
Trevor Bond
trevor.bond at jcu.edu.au
Sat Nov 3 14:49:51 EST 2007
I don't know that any other critique of Fan's Rasch work exists; so
here's our two bits' worth
Bond & Fox, 2nd ed. (2007), pp. 266-267
Another pervasive view is that traditional statistical approaches in the
social sciences provide all the techniques sufficient for understanding
data quantitatively. In other words, the Rasch model is nothing special
and anything outside the scope of traditional statistics produces little
or no extra for all the extra work involved: special software, Rasch
workshops, and books such as this one. In a large-scale empirical
comparison of IRT (including Rasch) and CTT item and person statistics in
mandated achievement testing, Fan (1998) concluded that the
intercorrelations between person indicators and between item indicators
across Rasch, 2PL, 3PL IRT and CTT models were so high as not to warrant
the extra effort of latent trait modeling.
    Because the IRT Rasch model (one parameter IRT model) assumes fixed
    item discrimination and no guessing for all items, the model only
    provides estimates for item parameter of difficulty. Because item
    difficulty parameter estimates of the Rasch model were almost
    perfectly related to CTT-based item difficulty indexes (both original
    and normalized), it appears that the one-parameter model provides
    almost the same information as CTT with regard to item difficulty but
    at the cost of considerable model complexity. Unless Rasch model
    estimates could show superior performance in terms of invariance
    across different samples over that of CTT item difficulty indexes, the
    results here would suggest that the Rasch model might not offer any
    empirical advantage over the much simpler CTT framework. (Fan, 1998,
    p. 371)
Novices to Rasch measurement might ask, "How could that possibly be the
case?" The explanation is really quite simple but goes to the very heart
of the distinction canvassed in this volume between Rasch measurement on
the one hand and general IRT- and CTT-based analyses on the other. Fan
revealed, "As the tabled results indicate, for the IRT Rasch model (i.e.,
the one parameter IRT model), the relationship between CTT- and IRT-based
item difficulty estimates is almost perfect" (p. 371). Of course, for both
CTT and the Rasch model, N (number correct) is the sufficient statistic
for the estimation of both item difficulty and person ability. However,
for the Rasch model there is a crucial caveat: To the extent that the data
fit the Rasch model's specifications for measurement, then N is the
sufficient statistic. In light of the attention paid to the issues raised
about Rasch model fit and unidimensionality in chapter 11, it is not so
easy then to glide over the telling result of Fan's analyses: "Even with
the powerful statistical test, only one or two items are identified as
misfitting the two- and three-parameter IRT model. The results indicate
that the data fit the two- and three-parameter IRT models exceptionally
well" (Fan, 1998, p. 368). Or, should that be, the 2PL and 3PL models that
were developed accounted for these data very well? Fan went on to report,
"The fit of the data for the one-parameter model, however, is obviously
very questionable, with about 30 percent of the items identified as
misfitting the IRT model for either test" (Fan, 1998, p. 368). Then,
according to our approach to the fit caveat, only about 70% of the items
might be used to produce a Rasch measurement scale in which N correct
would be the sufficient statistic. Fan continued, "Because there is
obvious misfit between the data and the one parameter IRT model, and
because the consequences of such misfit are not entirely clear (Hambleton
et al., 1991), the results related to the one-parameter IRT model
presented in later sections should be viewed with extreme caution" (Fan,
1998, p. 368). From our perspective, "[V]iewed with extreme caution" would
be better written as "dismissed as irrelevant to evaluating the value of
the Rasch model."
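[Illustration added to the quoted passage, not part of Bond & Fox.] The
sufficiency caveat above can be checked numerically. The sketch below is a
minimal example under stated assumptions: three made-up items, invented
difficulty (b) and discrimination (a) values, and hypothetical helper names.
It computes the probability of one response pattern given its raw score at
several ability levels. When all discriminations are equal (Rasch), that
conditional probability does not move with ability, which is what it means
for N correct to be sufficient. With unequal discriminations (2PL) it does
move, so the raw score no longer carries all the information.

```python
import math
from itertools import product


def p_correct(theta, b, a=1.0):
    """Logistic item model: a = 1 for every item is Rasch, unequal a is 2PL."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_given_score(pattern, theta, items):
    """P(response pattern | raw score, theta) for a list of (b, a) items."""

    def prob(resp):
        # Joint probability of a full response vector at this ability level.
        out = 1.0
        for x, (b, a) in zip(resp, items):
            p = p_correct(theta, b, a)
            out *= p if x else 1.0 - p
        return out

    score = sum(pattern)
    # All response vectors that share the same number correct.
    same_score = [r for r in product((0, 1), repeat=len(items))
                  if sum(r) == score]
    return prob(pattern) / sum(prob(r) for r in same_score)


# Hypothetical item parameters as (difficulty b, discrimination a).
rasch_items = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
twopl_items = [(-1.0, 0.5), (0.0, 1.0), (1.0, 2.0)]

pattern = (1, 1, 0)          # two correct, hardest item missed
thetas = (-2.0, 0.0, 2.0)    # low, middle, and high ability

rasch_probs = [pattern_given_score(pattern, t, rasch_items) for t in thetas]
twopl_probs = [pattern_given_score(pattern, t, twopl_items) for t in thetas]

print("Rasch:", [round(p, 4) for p in rasch_probs])
print("2PL:  ", [round(p, 4) for p in twopl_probs])
```

The Rasch row should show the same value at every ability level, while the
2PL row should vary. This only illustrates the algebra of sufficiency; it
says nothing about Fan's actual data or fit statistics.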
Given that the second and third item parameters (slope and guessing) are
introduced into the 2PL and 3PL IRT models expressly for the purpose of
reducing the variance not accounted for by the item difficulty parameter
alone, we reasonably could expect (but do not always get) better fit of
the 2PL and 3PL IRT models to the data. Let us not be equivocal about how
proponents of the Rasch model, rather than the authority cited by Fan,
above, regard the role of fit statistics in quality control of the
measurement process: "Rasch models are the only laws of quantification
that define objective measurement, determine what is measurable, decide
which data are useful, and exposes which data are not" (Wright, 1999,
p. 80). In other words, by this view, the results showing failure to fit
the Rasch model should not merely be viewed with extreme caution, they
should be dismissed out-of-hand for failing to meet the minimal standards
required for measurement.
Readers might wish to judge for themselves the extent to which Fan
actually treated the Rasch results with extreme caution--but at the very
minimum, unless the data for the 30% of misfitting items are removed from
the data analysis adopting the Rasch model, the resultant Rasch versus IRT
versus CTT comparisons remain misleading, invidious, or both.
And would you like an invariant interval level measurement
--
Trevor G BOND Ph D
Professor and Head of Dept
Educational Psychology, Counselling & Learning Needs
D2-2F-01A EPCL Dept.
Hong Kong Institute of Education
10 Lo Ping Rd, Tai Po
New Territories HONG KONG
Voice: (852) 2948 8473
Fax: (852) 2948 7983