[Rasch] quantifying departure from "invariance of item difficulties" or from "unidimensionality"
Steven L. Kramer
skramer1958 at verizon.net
Sun Jul 6 08:08:28 EST 2008
Dear Rasch folks,
I have long been trying to wrap my mind around issues related to unidimensionality, and the potential use/abuse of Rasch models when unidimensionality is violated. I have several questions.
Is "UNIDIMENSIONALITY" identical to "the invariance of item difficulties--that is, item difficulties that are the same for all similar groups of respondents"?
Has anyone developed a method of quantifying the degree to which a group of items departs from the assumption that item difficulties are invariant across groups--perhaps by comparing the difficulties for two distinct groups? (The example I have in mind is checking whether a math test is invariant when comparing students taught using typical U.S. teaching methods vs. students using one of the "reform" curricula like Core Plus.)
--Is this method perhaps related to analyses of DIF?
--Is there any meaning to a generic "degree of departure from 'unidimensionality/invariant item difficulties'", or does the concept necessarily need to be defined for two pre-determined subgroups?
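To make the two-group comparison concrete, here is a minimal sketch of the kind of index I have in mind. It is only an illustration under loud assumptions: item difficulties are approximated by centered logits of the proportion correct within each group (a crude stand-in, not a proper Rasch calibration), there are no missing responses, and the function names `centered_logit_difficulties` and `rms_displacement` are my own hypothetical labels. The per-item differences play the role of a DIF contrast, and their root mean square is one candidate summary of "degree of departure".

```python
import math

def centered_logit_difficulties(responses):
    """Crude item difficulties for one group.

    responses: list of persons, each a list of 0/1 item scores (no missing data).
    Returns one value per item, centered so the group's mean difficulty is 0.
    ASSUMPTION: this is a rough approximation, not a Rasch estimation routine.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    logits = []
    for i in range(n_items):
        correct = sum(person[i] for person in responses)
        # Add 0.5 to the count so items that everyone passes or fails stay finite.
        p = (correct + 0.5) / (n_persons + 1.0)
        logits.append(math.log((1.0 - p) / p))  # harder item -> larger value
    mean = sum(logits) / n_items
    return [d - mean for d in logits]

def rms_displacement(group_a, group_b):
    """Per-item difficulty differences (A minus B) and their root mean square."""
    d_a = centered_logit_difficulties(group_a)
    d_b = centered_logit_difficulties(group_b)
    diffs = [a - b for a, b in zip(d_a, d_b)]
    rms = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return diffs, rms
```

Centering within each group removes any overall ability difference between the groups, so what remains is only the change in relative item difficulty. A value near zero would suggest the item hierarchy is shared; how large a value is tolerable for a given application is exactly what I am asking.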
I would make the following claims about various applications of Rasch Modeling, based on my subjective impressions. Are the claims correct?
1. One common use of Rasch modeling is to translate a group of item responses into a Rasch score, which better approximates an interval scale than does "raw number correct"--even though raw number correct is a sufficient statistic for a Rasch score. This simplest application of Rasch is relatively tolerant to violations of "invariance of item difficulty" or (I think equivalently) of "unidimensionality". The Rasch score will be no worse than any other summary score one could devise, and perhaps better.
2. Other common uses of Rasch modeling range from somewhat intolerant to very intolerant of violations of the "invariance" assumption:
-Equating test forms and/or computing Rasch scores with missing items. As I understand it, these are both examples of imputing scores for missing item responses--an application completely dependent on "invariant item difficulties". In theory, these tasks should not work at all if unidimensionality is violated. Nevertheless, my subjective impression is "doing them is bad, but perhaps you can get away with it".
-Using a Rasch score for Formative Assessment in applications by groups like Northwest Evaluation Association and MetaMetrics. This is just a direct application of invariant item difficulties: these tests say, "If you scored a particular ability "THETA" then these are the items you have solidly; those are the items you can do sometimes and should be working on now; those other items are ones for which you have hardly a clue and which are perhaps not in your Zone of Proximal Development." As item difficulties become variant across individuals, this entire application becomes meaningless.
-Adaptive testing. Here I think real harm may be done to groups dissimilar from the equating sample. Imagine Group A for whom the order of item difficulty is basically reversed from what it is for Group B. The test is equated to Group B and the adaptive test starts students on an "easy" item. But that "easy" item is relatively difficult for Group A, so Group A students mostly miss it. The test gives them an "easier" item--but it isn't easier for Group A. They miss it again, and so on--so that they are assigned a low score. Group A students never get to address items that were "easy" for them--because the test was equated to Group B, and these items are mistakenly categorized as "too difficult" and thus never offered.
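The adaptive-testing scenario can be played out in a toy sketch. Everything here is an assumption for illustration: responses are deterministic (correct exactly when ability is at least the item's true difficulty for that student's group), the routing rule simply steps to the nearest unseen "harder" item after a success and the nearest unseen "easier" item after a miss, and `run_adaptive_test` is a hypothetical name. It is not meant to represent how any real adaptive test selects items.

```python
def run_adaptive_test(ability, true_difficulties, start_index, max_items):
    """Toy adaptive test over items indexed by CALIBRATED difficulty (0 = 'easiest').

    true_difficulties[i] is the difficulty item i really has for this student's
    group, which may differ from the calibration order.
    Returns the list of (item index, answered correctly?) in order administered.
    """
    administered = []
    seen = set()
    index = start_index
    while index is not None and len(administered) < max_items:
        correct = ability >= true_difficulties[index]  # deterministic response
        administered.append((index, correct))
        seen.add(index)
        # Move "harder" after a success, "easier" after a miss, skipping seen items.
        step = 1 if correct else -1
        nxt = index + step
        while 0 <= nxt < len(true_difficulties) and nxt in seen:
            nxt += step
        # Stop when the bank has no unseen item left in that direction.
        index = nxt if 0 <= nxt < len(true_difficulties) else None
    return administered

# Item bank calibrated on Group B; for Group A the difficulty order is reversed.
difficulty_for_b = [-2.0, -1.0, 0.0, 1.0, 2.0]
difficulty_for_a = [2.0, 1.0, 0.0, -1.0, -2.0]

# Two students with the same ability, both starting on the middle item.
group_b_log = run_adaptive_test(-0.5, difficulty_for_b, start_index=2, max_items=5)
group_a_log = run_adaptive_test(-0.5, difficulty_for_a, start_index=2, max_items=5)
```

Under these assumptions the Group B student sees all five items and passes the two that are genuinely easy, while the Group A student misses three items in a row, runs out of "easier" items, and is never offered items 3 and 4, which that student would have answered correctly.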
Bringing all these questions together: is there a way to quantify whether a set of items is "sufficiently unidimensional" for each of the following applications:
-computing a Rasch score, no missing items
-computing a Rasch score with missing data; or equating two test forms
-using Rasch for Formative Assessment
-developing an adaptive test
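For the first two applications in this list, a small sketch may show why I think their tolerance differs. It assumes item difficulties are already known, uses the dichotomous Rasch model P(correct) = 1 / (1 + exp(-(theta - b))), and finds the maximum-likelihood ability by bisection on the equation "expected score equals raw score". The function names are hypothetical and the search range of -20 to 20 logits is an arbitrary assumption.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ability_from_raw_score(raw_score, difficulties, tol=1e-10):
    """ML ability estimate from the raw score on the items actually answered.

    difficulties: calibrated difficulties of the answered items only.
    ASSUMPTION: difficulties are known and lie well inside [-20, 20] logits.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("zero and perfect scores have no finite ML estimate")
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(rasch_p(mid, b) for b in difficulties)
        if expected < raw_score:
            lo = mid   # ability must be higher to reach the observed score
        else:
            hi = mid
    return (lo + hi) / 2.0

# Same raw score of 1 out of 2, but on different subsets of the item bank.
theta_easy_items = ability_from_raw_score(1, [-2.0, -1.0])
theta_hard_items = ability_from_raw_score(1, [1.0, 2.0])
```

With complete data every student answers the same items, so the estimate depends on the responses only through the raw score. With missing data or different forms, the same raw score maps to different abilities depending on the calibrated difficulties of the items answered, which is where the estimate leans entirely on those difficulties being invariant.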
(Perhaps this has been discussed in the literature and I need to seek references. Or perhaps someone on the listserv has developed a measure. Or perhaps this is a new area of investigation.)
Math Science Partnership of Greater Philadelphia