[Rasch] Misfitting Individuals
gstone at UTNet.UToledo.Edu
Mon Apr 30 10:25:54 EST 2007
The one caveat in this description is the "rules of thumb". As noted, they are as variable as the citations they are found among. Having used MNSQ since I began, I was recently made amply aware of the problems associated with them. Rules of thumb are OK for samples of a few hundred, but become less than useful as the sample size increases. Indeed, the MNSQ values tend towards 1.0.
The example I spoke of was a major statewide testing program I reviewed. Tens of thousands took the examination, yet the vendor chose the relatively arbitrary path of the "rule of thumb" when determining whether items misfit. Of course, none did. The larger the sample, the more tightly the MNSQ values cluster around their expected value of 1.0. In short, test 10,000 people and all the items look great. Use 10,000 items to score a group of people and all the people look perfect as well. Since both the MNSQ and Z-STD are sample dependent, as Mike Linacre notes in the instructions with Winsteps, we need to use them wisely. Conversely, as the sample size increases, items tend to demonstrate misfit (>2.0) more often when Z-STD is used.
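The opposite behaviour of MNSQ and Z-STD can be illustrated with a minimal sketch. It assumes the Wilson-Hilferty cube-root standardization commonly used to convert a mean-square to a Z-STD value, and it assumes (as a rough approximation, not something stated in the message) that the standard deviation of a mean-square based on n observations is about sqrt(2/n). The function name approx_zstd is hypothetical.

```python
import math


def approx_zstd(mnsq: float, n: int) -> float:
    """Approximate standardized fit for a mean-square based on n observations.

    Assumption: the mean-square behaves like chi-square/df with df = n,
    so its standard deviation q is roughly sqrt(2 / n).
    """
    q = math.sqrt(2.0 / n)
    # Wilson-Hilferty cube-root transformation to an approximate unit normal.
    return (mnsq ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0


# The same modest mean-square of 1.10 evaluated at growing sample sizes.
for n in (100, 1_000, 10_000):
    print(n, round(approx_zstd(1.10, n), 2))
```

Under these assumptions, a mean-square of 1.10 stays well below 2.0 at n = 100 but lies far above it at n = 10,000, even though a fixed MNSQ rule of thumb would treat the value identically in both cases.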
So the question becomes: do you want to risk overlooking dysfunctional items (or persons) by using a predetermined MNSQ range that does not reflect the sample characteristics, or would you rather overestimate the number of misfits? It is a Type I/II error question really. Personally, I tend to err on the side of caution. If you do wish to use MNSQ, then it is more reasonable to calculate the precise error for the sample being used. For my 10,000 people, for instance, the range of good fit may look more like .96-1.04, a much different kettle of fish.
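The message does not say how the sample-specific range was calculated. One possible sketch, assuming the mean-square is approximately chi-square/df with df equal to the sample size (so its standard error is about sqrt(2/n)), is shown here. The function name mnsq_band and the multiplier z are illustrative choices, not the author's method.

```python
import math


def mnsq_band(n: int, z: float = 2.0) -> tuple[float, float]:
    """Approximate acceptance band for a mean-square around its expectation of 1.0.

    Assumption: standard error of the mean-square is roughly sqrt(2 / n).
    """
    se = math.sqrt(2.0 / n)
    return (1.0 - z * se, 1.0 + z * se)


# Band for a sample of 10,000 using three approximate standard errors.
low, high = mnsq_band(10_000, z=3.0)
print(f"{low:.2f} to {high:.2f}")
```

With z = 3 this approximation gives a band of about the same width as the .96-1.04 range mentioned above, and it narrows further as n grows, which is why a fixed range such as 0.60-1.40 flags almost nothing in very large samples.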
Ultimately, it is an evaluative question with statistical aids (including the unmentioned Pt. Biserial) to assist us. There are no purely algorithmic solutions that will take our judgment out of the mix.
Gregory E. Stone, Ph.D., M.A.
Assistant Professor of Research and Measurement
The Judith Herb College of Education
The University of Toledo, Mailstop #914
Toledo, OH 43606 419-530-7224
Editorial Board, Journal of Applied Measurement www.jampress.org
Board of Directors, American Board for Certification of Teacher Excellence www.abcte.org
For information about the Research and Measurement Programs at The University of Toledo and careers in psychometrics, statistics and evaluation, email gregory.stone at utoledo.edu.
From: rasch-bounces at acer.edu.au on behalf of Twing, Jon
Sent: Sat 4/28/2007 2:24 PM
To: Petroski, Greg
Cc: rasch at acer.edu.au
Subject: RE: [Rasch] Misfitting Individuals
This is often more art than science. Here is what we sometimes do:
1.) Use Rasch Person Fit to identify "anomalies" in the testing experience (this could be pure guessing, cheating or other unusual student engagements).
2.) Since most of our work requires a student score, we will score them but we might choose to drop them from the calibration.
3.) In the "old days" we might have included an asterisk indicating a peculiar response string, but we have not done this in the last 10 years or so.
4.) Typically we use Mean Square Fit, INFIT and OUTFIT when diagnosing person anomalies. We typically use the values dictated for items and apply them to persons.
5.) Below are the criteria I have collected over the years.
Item INFIT and OUTFIT should be between 0.60 and 1.40 (Bond & Fox, 2001; Linacre & Wright, 1999)
Item INFIT between 0.70 and 1.30 (Bode, Heinemann, & Semik, 2000; Bogner, Corrigan, Bode & Heinemann, 2000).
Mean-square fit statistics are defined such that the model-specified uniform value of randomness is 1.0. Values greater than 1.5 (more than 50% unexplained randomness) are problematic. (Wright and Panchapakesan, 1969; Linacre, 1999).
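As a minimal sketch of applying such a criterion to persons, the snippet below flags anyone whose INFIT or OUTFIT mean-square falls outside a chosen range. The person IDs and fit values are made up for illustration, and the default limits are simply the 0.60-1.40 range quoted above.

```python
def flag_misfit(infit: float, outfit: float,
                low: float = 0.60, high: float = 1.40) -> bool:
    """Return True when either mean-square falls outside [low, high]."""
    return not (low <= infit <= high and low <= outfit <= high)


# Hypothetical person fit statistics: id -> (INFIT MNSQ, OUTFIT MNSQ).
persons = {
    "P01": (0.95, 1.10),
    "P02": (1.20, 1.75),
    "P03": (0.55, 0.80),
}

flagged = [pid for pid, (infit, outfit) in persons.items()
           if flag_misfit(infit, outfit)]
print(flagged)
```

Flagged persons could then be scored but left out of the calibration, as described in step 2; tightening the limits to 0.70-1.30 only requires passing different low and high values.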
Hope this helps. Good luck.
Jon S. Twing, Ph.D.
Executive Vice President, Test & Measurement Services
Pearson Educational Measurement
2510 N. Dodge Street, P.O. Box 30, Mailstop 165
Iowa City, Iowa 52245-9945
Jon.S.Twing at Pearson.com
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Petroski, Greg
Sent: Friday, April 27, 2007 1:29 PM
To: rasch at acer.edu.au
Subject: [Rasch] Misfitting Individuals
I have a few questions centering on person-fit. The need to understand the cause of aberrant response patterns is obvious. But in applications, i.e. not in the test development phase, what does one do with misfitting person-responses?
Score them anyway?
Exclude them from the reporting? This could be a very unpopular solution in some applications.
Are rules of thumb for person INFIT and OUTFIT the same as when judging item fit? In which case one might not report scores for individuals with INFIT or OUTFIT exceeding certain limits. Is this done?
Gregory F. Petroski, Ph.D.
Dept. of Health Management and Informatics &
Office of Medical Research/Biostatistics
137 Hadley Hall (DC 018)
University of Missouri - Columbia,
Columbia, Mo. 65212