[Rasch] quantifying departure from "invariance of item difficulties" or from "unidimensionality"

Agustin Tristan ici_kalt at yahoo.com
Sun Jul 6 08:51:42 EST 2008

Hello Steve,
I don't know whether what you ask can be solved with the test design line (TDL). The TDL is based on an idea by Wright and Stone; our improvement is to define a design from the very beginning, with the commitment to distribute the items uniformly from easy to hard (from -1.5 to +1.5 logits). All items are supposed to fit the Rasch model. If everything is OK, this design should hold for our focus group; if the item distribution doesn't work well for a specific group, you can measure that group's distance from the focus group. You can also check the fit to the TDL and other properties. Misfit of the data to the TDL may indicate several things:
- difference in competencies of the specific student group
- multidimensional measurement or violation of unidimensionality
- lack of objectivity of the set of items 
- other problems to be analyzed in the future...
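A minimal sketch of one way to check a group's fit to a design line follows. It assumes (this is not taken from the ERIC document) that the planned line is simply evenly spaced target difficulties from -1.5 to +1.5 logits, and that the group's calibrated difficulties are expressed on the same scale as the focus group's. The names `design_line` and `fit_to_design` are hypothetical. The sketch sorts the observed difficulties, regresses them on the planned targets, and reports the slope, intercept and root-mean-square deviation.

```python
import math


def design_line(n_items, low=-1.5, high=1.5):
    """Planned item difficulties (logits), evenly spaced from low to high.

    Assumption: the design line is taken to be a straight line of
    difficulty against item rank between the two end points.
    """
    if n_items < 2:
        raise ValueError("need at least two items")
    step = (high - low) / (n_items - 1)
    return [low + i * step for i in range(n_items)]


def fit_to_design(observed, low=-1.5, high=1.5):
    """Compare calibrated difficulties with the planned design line.

    Returns (slope, intercept, rmsd) for observed-on-planned after sorting
    the observed difficulties from easiest to hardest.
    """
    planned = design_line(len(observed), low, high)
    obs = sorted(observed)
    n = len(obs)
    mean_p = sum(planned) / n
    mean_o = sum(obs) / n
    # Ordinary least-squares regression of observed on planned difficulties.
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(planned, obs))
    sxx = sum((p - mean_p) ** 2 for p in planned)
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    # Root-mean-square distance of each sorted item from its planned target.
    rmsd = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, planned)) / n)
    return slope, intercept, rmsd


# Hypothetical group for whom every item is 0.5 logit harder than designed.
shifted = [b + 0.5 for b in design_line(7)]
print(fit_to_design(shifted))  # -> (1.0, 0.5, 0.5)
```

In this sketch, a slope near 1 with a non-zero intercept is a uniform shift, the kind of pattern a difference in competence of the group might produce. A low slope or a large deviation points instead to items that are out of their planned order for that group.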
The TDL also has applications to standard error calculation, reliability and other topics that follow from the design; i.e., you can know your Cronbach's alpha or the SE from the design itself (before the test is administered!).
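Knowing the SE before administration is possible because, under the Rasch model, the model standard error at ability θ depends only on the item difficulties: SE(θ) = 1/sqrt(Σ P_i(1 - P_i)). The sketch below assumes the evenly spaced design described above, plus a hypothetical spread of person abilities for the reliability figure. The reliability it computes is the Rasch-style ratio of true variance to true-plus-error variance, an analogue of Cronbach's alpha rather than alpha itself. All function names are illustrative.

```python
import math


def design_line(n_items, low=-1.5, high=1.5):
    """Evenly spaced planned difficulties (assumed form of the design)."""
    step = (high - low) / (n_items - 1)
    return [low + i * step for i in range(n_items)]


def p_correct(theta, b):
    """Rasch probability of success for ability theta on item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def design_se(theta, difficulties):
    """Model standard error at theta: 1 / sqrt(test information)."""
    info = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)


def design_reliability(abilities, difficulties):
    """Expected reliability = true variance / (true variance + mean error variance).

    `abilities` is an assumed sample of person measures for the target group.
    """
    n = len(abilities)
    mean = sum(abilities) / n
    true_var = sum((t - mean) ** 2 for t in abilities) / n
    err_var = sum(design_se(t, difficulties) ** 2 for t in abilities) / n
    return true_var / (true_var + err_var)


items = design_line(30)  # hypothetical 30-item design
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(design_se(theta, items), 3))
assumed_group = [-1.0 + 0.1 * k for k in range(21)]  # hypothetical: persons spread over [-1, 1]
print(round(design_reliability(assumed_group, items), 3))
```

Nothing in this calculation uses response data, which is why these figures are available from the design alone; how close they come to the values observed after administration depends on the items actually fitting their planned difficulties.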
Building tests on the test design line makes equating unnecessary, since all tests are equivalent from the beginning. Nevertheless, equating has a different benefit with regard to the characteristics of the population: it gives you evidence of the quality of your test and of what we consider the validity of the scale.
As I said, I don't know whether the test design line is the complete solution to what you are asking, but it has proved very useful. The model has been used in a meta-analysis (not yet published) showing that tests such as NAEP, IAEP, PISA and others from Latin American countries fit the TDL closely. We have shown design deficiencies in some tests, we could compare focus and non-focus groups (able and less able students), and we can identify the measurement properties of the same test when applied to 9th- and 12th-grade students. We have recently used the TDL to analyze scales for health questionnaires. Perhaps the TDL just needs a chance to become known elsewhere!

You may download the complete TDL document from the ERIC database:
search for ERIC #: ED501232
(the meta-analysis is not included in the document; it will be finished this year).
Hope this helps

Ave. Cordillera Occidental  No. 635
Colonia  Lomas 4ª San Luis Potosí, San Luis Potosí.
C.P.  78216   MEXICO
 (52) (444) 8 25 50 76
 (52) (444) 8 25 50 77
 (52) (444) 8 25 50 78
web page (in Spanish AND ENGLISH): http://www.ieesa-kalt.com

--- On Sat, 7/5/08, Steven L. Kramer <skramer1958 at verizon.net> wrote:

From: Steven L. Kramer <skramer1958 at verizon.net>
Subject: [Rasch] quantifying departure from "invariance of item difficulties" or from "unidimensionality"
To: Rasch at acer.edu.au
Cc: "Gaun, Stephanie" <SGaun at arcadia.edu>, "Wolff, Ned" <Wolffe at arcadia.edu>, "Pomeroy, Deborah" <Pomeroyd at arcadia.edu>, "Michael Posner" <michael.posner at villanova.edu>
Date: Saturday, July 5, 2008, 5:08 PM

Dear Rasch folks,
I have long been trying to wrap my mind around issues related to uni-dimensionality, and the potential use/abuse of Rasch models when uni-dimensionality is violated.  I have several questions.
Is "UNIDIMENSIONALITY" identical to "the invariance of item difficulties--that is, item difficulties that are the same for all similar groups of respondents"?
Has anyone developed a method of quantifying the degree to which a group of items departs from the assumption that item difficulties are invariant across groups--perhaps by comparing the difficulties for two distinct groups?  (The example I have in mind is checking whether a math test is invariant when comparing students taught using typical U.S. teaching methods vs. students using one of the "reform" curricula like Core Plus.)
  --Is this method perhaps related to analyses of DIF?
  --Is there any meaning to a generic "degree of departure from 'unidimensionality/invariant item difficulties'"  or does the concept of necessity need to be defined for two pre-determined subgroups?
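The comparison proposed above, calibrating the items separately in two groups and comparing the difficulties, can be sketched as follows. This is a minimal illustration under simple assumptions: each group's difficulties and standard errors come from separate Rasch runs, each run is centered at mean 0, and error in that centering is ignored. The per-item t is the usual separate-calibration DIF contrast; the RMS difference and the correlation are two possible whole-test summaries, not established cut-offs. The function name and the example numbers are hypothetical.

```python
import math


def compare_calibrations(d1, se1, d2, se2):
    """Compare item difficulties calibrated separately in two groups.

    Each calibration is centered at mean 0 first, because the Rasch origin
    is arbitrary in each separate run. Returns (t, rms, r):
      t   - per-item (d1 - d2) / sqrt(se1^2 + se2^2); approximate, since
            error in the centering is ignored
      rms - root-mean-square difference in logits across all items
      r   - Pearson correlation of the two centered sets of difficulties
    """
    m1 = sum(d1) / len(d1)
    m2 = sum(d2) / len(d2)
    c1 = [d - m1 for d in d1]
    c2 = [d - m2 for d in d2]
    t = [(a - b) / math.sqrt(s ** 2 + u ** 2) for a, b, s, u in zip(c1, c2, se1, se2)]
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)) / len(c1))
    # Both lists are already centered, so these are deviation cross-products.
    sxy = sum(a * b for a, b in zip(c1, c2))
    sxx = sum(a * a for a in c1)
    syy = sum(b * b for b in c2)
    r = sxy / math.sqrt(sxx * syy)
    return t, rms, r


# Hypothetical calibrations: traditionally taught vs. reform-curriculum students.
trad = [-1.2, -0.4, 0.1, 0.6, 0.9]
reform = [-0.6, -0.9, 0.6, 0.0, 0.9]
se = [0.15] * 5
t, rms, r = compare_calibrations(trad, se, reform, se)
flagged = [i for i, v in enumerate(t) if abs(v) > 2.0]  # conventional |t| > 2 screen
print(flagged, round(rms, 2), round(r, 2))
```

The per-item t answers the DIF-style question for two pre-determined subgroups; the RMS difference is one candidate for a generic "degree of departure", though it too is only defined relative to the pair of groups being compared.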
I would make the following claims about various applications of Rasch Modeling, based on my subjective impressions. Are the claims correct?
1. One common use of Rasch modeling is to translate a group of item responses into a Rasch score, which better approximates an interval scale than does "raw number correct"--even though raw number correct is a sufficient statistic for a Rasch score.  This simplest application of Rasch is relatively tolerant to violations of "invariance of item difficulty" or (I think equivalently) of "unidimensionality".  The Rasch score will be no worse than any other summary score one could devise, and perhaps better.
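The point that the raw score is sufficient, yet the Rasch measure is a non-linear transform of it, can be seen in a short maximum-likelihood sketch. It assumes the item difficulties are already calibrated; the function name and the 10-item test are hypothetical, and damped Newton-Raphson is one common way to solve the likelihood equation, not the only one.

```python
import math


def measure_for_raw_score(raw, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood Rasch measure (logits) for a raw score.

    Solves raw = sum_i P_i(theta) by Newton-Raphson, where
    P_i(theta) = 1 / (1 + exp(-(theta - b_i))). Only the raw score and the
    item difficulties enter, which is what "raw score is sufficient" means.
    """
    n = len(difficulties)
    if not 0 < raw < n:
        raise ValueError("zero and perfect scores have no finite ML estimate")
    # Start from the log-odds of success, shifted to the mean item difficulty.
    theta = sum(difficulties) / n + math.log(raw / (n - raw))
    for _ in range(max_iter):
        ps = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        expected = sum(ps)
        info = sum(p * (1.0 - p) for p in ps)
        step = (raw - expected) / info
        step = max(-1.0, min(1.0, step))  # damp large jumps for stability
        theta += step
        if abs(step) < tol:
            break
    return theta


items = [-1.5 + 3.0 * i / 9 for i in range(10)]  # hypothetical 10-item test
for raw in range(1, 10):
    print(raw, round(measure_for_raw_score(raw, items), 2))
```

The printed table shows the logit gaps between adjacent raw scores widening toward both extremes, which is the sense in which the measure is closer to an interval scale than the raw count. Because the function needs only the difficulties of the items a person actually answered, scoring with missing items or on another form amounts to passing a different subset of `difficulties`, and that is exactly the step that leans on invariant item difficulties.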
2. Other common uses of Rasch modeling range from somewhat intolerant to very intolerant of violations of the "invariance" assumption:
  -Equating test forms and/or computing Rasch scores with missing items. As I understand it, these are both examples of imputing scores with missing item responses, an application completely dependent on "invariant item difficulties". In theory, these tasks should not work at all if unidimensionality is violated. Nevertheless, my subjective impression is "doing them is bad, but perhaps you can get away with it".
  -Using a Rasch score for Formative Assessment, in applications by groups like Northwest Evaluation Association and MetaMetrics. This is a direct application of invariant item difficulties: these tests say, "If you scored a particular ability 'THETA', then these are the items you have solidly; those are the items you can do sometimes and should be working on now; those other items are ones for which you have hardly a clue and which are perhaps not in your Zone of Proximal Development." As item difficulties vary across individuals, this entire application becomes meaningless.
  -Adaptive testing. Here I think real harm may be done to groups dissimilar from the equating sample. Imagine a Group A for whom the order of item difficulty is basically reversed from what it is for Group B. The test is equated to Group B, and the adaptive test starts students on an "easy" item. But that "easy" item is relatively difficult for Group A, so Group A students mostly miss it. The test gives them an "easier" item, but it isn't easier for Group A. They miss it again, and so on, so that they are assigned a low score. Group A students never get to address the items that are "easy" for them: because the test was equated to Group B, these items are mistakenly categorized as "too difficult" and thus never offered.
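The formative-assessment report described in the list above reduces to the Rasch success probability P = 1/(1 + exp(-(θ - b))) evaluated for each item. A minimal sketch, assuming hypothetical cut-offs of 0.75 for "solid" and 0.25 for "not yet" and a made-up three-item bank; actual reporting systems set their own rules.

```python
import math


def p_success(theta, b):
    """Rasch probability that a person at theta succeeds on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def formative_bands(theta, items, low=0.25, high=0.75):
    """Sort items into 'solid', 'working on' and 'not yet' for one person.

    `items` maps item names to calibrated difficulties. The cut-offs
    low/high are hypothetical choices for illustration.
    """
    bands = {"solid": [], "working on": [], "not yet": []}
    for name, b in sorted(items.items(), key=lambda kv: kv[1]):
        p = p_success(theta, b)
        if p >= high:
            bands["solid"].append(name)
        elif p <= low:
            bands["not yet"].append(name)
        else:
            bands["working on"].append(name)
    return bands


bank = {"fractions": -2.0, "ratios": 0.0, "linear equations": 2.0}  # hypothetical
print(formative_bands(0.0, bank))
# -> {'solid': ['fractions'], 'working on': ['ratios'], 'not yet': ['linear equations']}
```

Every band in this report is read off one set of calibrated difficulties, so if a group's true difficulties differ, the same θ places items in the wrong bands.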
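The adaptive-testing scenario in the list above can be reproduced with a deliberately crude simulation. Everything here is a hypothetical simplification, not how any operational adaptive test works: deterministic responses, a nearest-difficulty selection rule, and a halving step size instead of maximum-likelihood scoring. The bank is calibrated on Group B, and Group A's true difficulties run in the reverse order.

```python
def run_cat(true_theta, true_difficulty, calibrated, start=-1.0, n_items=5, step=1.0):
    """Toy adaptive test: returns (final estimate, indices of items administered).

    Hypothetical simplifications: the next item is the unused one whose
    *calibrated* difficulty is closest to the current estimate (ties go to
    the easier item); the response is correct exactly when true_theta exceeds
    the item's *true* difficulty for this person's group; the estimate moves
    up or down by a step that halves after every item.
    """
    est = start
    used = []
    for _ in range(n_items):
        i = min(
            (j for j in range(len(calibrated)) if j not in used),
            key=lambda j: (abs(calibrated[j] - est), calibrated[j]),
        )
        used.append(i)
        correct = true_theta > true_difficulty[i]
        est += step if correct else -step
        step /= 2.0
    return est, used


# Bank calibrated on Group B; Group A finds the items in reverse order.
group_b = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
group_a = [-b for b in group_b]
print(run_cat(0.3, group_b, group_b))  # -> (0.3125, [2, 4, 5, 3, 6])
print(run_cat(0.3, group_a, group_b))  # -> (-2.8125, [2, 0, 1, 3, 4])
```

Both examinees have the same true ability, 0.3 logits. The Group B examinee ends near it, while the Group A examinee is driven down to about -2.8 and is never offered items 5 to 8, which are calibrated as hard but are the easiest ones for Group A.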
Bringing all these questions together:  is there a way to quantify whether a set of items is "sufficiently unidimensional" for each of the following applications:
-computing a Rasch score, no missing items
-computing a Rasch score with missing data; or equating two test forms
-using Rasch for Formative Assessment
-developing an adaptive test
(Perhaps this has been discussed in the literature and I need to seek references.  Or perhaps someone on the list-serve has developed a measure.  Or perhaps this is a new area of investigation.)
Steve Kramer
Math Science Partnership of Greater Philadelphia