[Rasch] Fan: > Is replicating Fan's research worthwhile? I'm voting no for now...
Lang, William Steve
WSLang at tempest.coedu.usf.edu
Sat Nov 3 09:08:55 EST 2007
Maybe Fan's research lacks some of the concepts that I consider most important in Rasch:
(I don't have the Fan original, so I am using the numbered description by Jim Sick below.)
1.) I would have to look at why I put the items together to start with and why some did not fit. Does that inform the construct or does it inform the item construction? I expect to write items to the construct, not explore for imaginary factors that might mean something. What was the original plan for the scale that was used for Fan's items? If you don't know your construct, you may not like the Rasch model or what it tells you.
2.) What does the DPF explain? [Quote from Embretson, "Other research, conducted on the power of person-fit statistics under 1PL, 2PL, or 3PL models does not strike us as very encouraging either (references)."] Some of the most important observations in my uses of Rasch are the very interesting clues, validated by investigation, that come from person keyforms or similar person analysis patterns. If you don't care about person fit, you may not like the Rasch philosophy. (Personally, I wish Richard would "re-publish" IPARM. A very cool program.)
3.) Does the instrument expect "challenges" such as missing data, bias detection, or rater errors? If so, the detection of misfitting items is only one of several reasons to use a particular model. Sometimes editing items or adding a few new ones is pretty minor compared to "rater training" or simply making the test work "across 3 languages". If you have a perfect testing situation, you may not need the Rasch model. On the other hand, if you really understand your construct and you are interested in robust instruments, that's another issue. How about the "split item" function in RUMM 2020? Can you do that with classical analysis?
4.) Were all the items the same item type? Again, many applications that I use in folio or performance applications require different item types or different levels of inference from the judges. Maybe Fan's items are a pretty simple example. If you are so consumed by the item types that you forget that the measures are conjoint assessments of the underlying construct, and you don't realize that there's more than one way to skin the cat (is that "multi-joint"?), then you certainly don't want the Rasch model. Different "levels of inference" are most likely deadly to "discrimination" (my seat-of-the-pants intuition). Do they make sense with Rasch? What does FACETS say (Thank you, Mike!)? Sometimes items make construct sense to me across item types. I like that...and I can't imagine the gyrations a 2PL or 3PL would do with it...unless you're a fan (pun intended) of string theory.
5.) Why the hell would you rename something unless you didn't know what you were trying to measure in the first place? (See number 1 above).
I suspect that this is a provocative enough email that exposes my ignorance, but it is one example of why the Rasch model might suggest as much a philosophy as a mathematical model. Fan may have a completely different view of "measurement" than I do.
University of South Florida
From: rasch-bounces at acer.edu.au on behalf of Jim Sick
Sent: Fri 11/2/2007 4:19 AM
To: Rasch List
Subject: Re: [Rasch] Fan
To get back to Anthony's original question:
> Is replicating Fan's research worthwhile?
As I recall, Fan found that 30% of the items did not fit the Rasch
model, which she interpreted as a weakness of the model in accounting
for the variance in the data. I think it would be quite interesting if
someone replicated this study and:
1. Systematically deleted misfitting items until a set with acceptable
fit was derived
2. Compared the Rasch measures constructed from the reduced item set
with raw, 2-PL, and 3-PL scores of the original data set. And, most
importantly,
3. Conducted a thorough qualitative assessment of the deleted and
retained items in order to ...
4. Discuss the implications for how we name or re-name this variable,
and/or revise our theoretical description of the "new" construct.
Of course, the score comparisons then become fruits and oranges, but
that's the point. And would the new measures be appreciably different?
That is an interesting empirical question.
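For anyone who wants to try step 1, here is a minimal Python sketch of a single flagging pass. It is not Fan's procedure or anything from a particular Rasch program. It assumes dichotomous (0/1) items, and it assumes the person measures and item difficulties (in logits) have already been estimated elsewhere. The function names, the toy data, and the 1.3 outfit mean-square cutoff are all illustrative assumptions. In a real replication the measures would be re-estimated after every deletion before checking fit again.

```python
import math

def rasch_prob(theta, b):
    """Rasch probability of success for person measure theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_mnsq(responses, thetas, b):
    """Outfit mean-square for one item: the mean squared standardized residual."""
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

def flag_misfits(data, thetas, bs, upper=1.3):
    """Return indices of items whose outfit exceeds `upper` (an assumed cutoff).

    data[n][i] is the 0/1 response of person n to item i.
    """
    flagged = []
    for i, b in enumerate(bs):
        column = [row[i] for row in data]
        if outfit_mnsq(column, thetas, b) > upper:
            flagged.append(i)
    return flagged

# Toy data (made up): one low and one high person, two equally difficult items.
thetas = [-2.0, 2.0]
bs = [0.0, 0.0]
data = [[0, 1],   # low person fails item 0 but passes item 1
        [1, 0]]   # high person passes item 0 but fails item 1
print(flag_misfits(data, thetas, bs))
```

In the toy data the second item behaves backwards (the low person succeeds and the high person fails), so it is the only one flagged. Deciding what such an item says about the construct is the qualitative work in steps 3 and 4.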
J. F. Oberlin University