[Rasch] Misfitting Individuals

Michael Lamport Commons commons at tiac.net
Mon Apr 30 11:23:51 EST 2007

So make up a measure of misfit that corrects for n.  


My Best,


Michael Lamport Commons, Ph.D.
Assistant Clinical Professor
Program in Psychiatry and the Law
Department of Psychiatry
Harvard Medical School
Beth Israel Deaconess Medical Center
234 Huron Avenue
Cambridge, MA 02138-1328


Telephone (617) 497-5270
Facsimile (617) 491-5270
Commons at tiac.net




From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Stone, Gregory
Sent: Sunday, April 29, 2007 8:26 PM
To: Twing, Jon; Petroski, Greg
Cc: rasch at acer.edu.au
Subject: RE: [Rasch] Misfitting Individuals


The one caveat in this description is the "rules of thumb".  As noted they
are as variable as the citations they are found among.  Having used MNSQ
since I began, I was recently made amply aware of the problems associated
with them.  Rules of thumb are OK for a few hundred, but become less than
useful as the sample size increases.  Indeed the mean squares tend towards
1.0.


The example I spoke of was a major statewide testing program I reviewed.
Tens of thousands took the examination, yet the vendor chose the relatively
arbitrary path of the "rule of thumb" when determining whether items misfit.
Of course none of them did.  The larger the sample, the more the distribution
regresses to the mean.  In short, test 10,000 people and all the items look
great.  Use 10,000 items to score a group of people and all the people look
perfect as well.  Since both the MNSQ and Z-STD are sample dependent, as
Mike Linacre notes in the instructions with Winsteps, we need to use them
wisely.  Conversely, as the samples increase, items tend to demonstrate
misfit (>2.0) more often when using Z-STD.
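
The opposite behavior of the two statistics can be sketched numerically.
The snippet below is only an illustration, not the Winsteps computation: it
uses the Wilson-Hilferty cube-root transformation to turn a mean square into
a standardized value, and it assumes (my assumption, not from the thread)
that the mean square behaves like a chi-square divided by n, so that its
standard deviation is sqrt(2/n).  The function name mnsq_to_zstd is
hypothetical.

```python
import math

def mnsq_to_zstd(mnsq, n):
    """Approximate standardized fit for a mean square based on n responses.

    Assumption: the mean square behaves like chi-square(n) / n, so its
    standard deviation is q = sqrt(2 / n). Real software derives q from
    the model variances of the residuals instead.
    """
    q = math.sqrt(2.0 / n)
    # Wilson-Hilferty cube-root transformation to an approximate z-score
    return (mnsq ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0

# The same modest mean square of 1.10 at three sample sizes
for n in (30, 300, 10000):
    print(n, round(mnsq_to_zstd(1.10, n), 2))
```

Under these assumptions a mean square of 1.10 stays below the 2.0 cutoff
for small n but exceeds it by a wide margin at n = 10,000, which is the
sample dependence described above.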


So the question becomes, do you want to risk overlooking dysfunctional items
(or persons) by using a predetermined MNSQ range that does not reflect the
sample characteristics, or would you rather overestimate the number of
misfits?  It is a Type I/II error question really.  Personally, I tend to
err on the side of caution.  If you do wish to use MNSQ then it is more
reasonable to calculate the precise error for the sample being used.  For my
10,000 people, for instance, the range of good fit may look more like
.96-1.04 - a much different kettle of fish.
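
One way such a sample-specific range could be computed is sketched below.
This is not necessarily the calculation used for the figures above: it
assumes the mean square behaves like a chi-square divided by n (standard
deviation sqrt(2/n)) and takes a symmetric band of z standard deviations
around 1.0.  The function name mnsq_range and the choice z = 3 are
illustrative assumptions.

```python
import math

def mnsq_range(n, z=2.0):
    """Approximate acceptance range for a mean square based on n observations.

    Assumption: the mean square has expectation 1.0 and standard deviation
    sqrt(2 / n), as for chi-square(n) / n. The band is 1.0 +/- z * SD.
    """
    half_width = z * math.sqrt(2.0 / n)
    return 1.0 - half_width, 1.0 + half_width

for n in (100, 1000, 10000):
    low, high = mnsq_range(n, z=3.0)
    print(f"n={n}: {low:.2f} to {high:.2f}")
```

With z = 3 this approximation gives roughly 0.96 to 1.04 at n = 10,000,
and a band close to the familiar 0.6 to 1.4 at n = 100, which suggests why
fixed rules of thumb only suit samples of a few hundred or fewer.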


Ultimately, it is an evaluative question with statistical aids (including
the unmentioned Pt. Biserial) to assist us.  There are no purely algorithmic
solutions that will take our judgment out of the mix.


Good luck!


Gregory E. Stone, Ph.D., M.A.
Assistant Professor of Research and Measurement
The Judith Herb College of Education
The University of Toledo, Mailstop #914
Toledo, OH 43606   419-530-7224
Editorial Board, Journal of Applied Measurement     www.jampress.org
Board of Directors, American Board for Certification of Teacher Excellence
For information about the Research and Measurement Programs at The
University of Toledo and careers in psychometrics, statistics and
evaluation, email gregory.stone at utoledo.edu.



From: rasch-bounces at acer.edu.au on behalf of Twing, Jon
Sent: Sat 4/28/2007 2:24 PM
To: Petroski, Greg
Cc: rasch at acer.edu.au
Subject: RE: [Rasch] Misfitting Individuals



This is often more art than science.  Here is what we sometimes do:


1.)     Use Rasch Person Fit to identify "anomalies" in the testing
experience (this could be pure guessing, cheating or other unusual student

2.)     Since most of our work requires a student score, we will score them
but we might choose to drop them from the calibration.

3.)     In the "old days" we might have included an asterisk indicating a
peculiar response string, but we have not done this in the last 10 years or

4.)     Typically we use Mean Square Fit, INFIT and OUTFIT when diagnosing
person anomalies.  We typically use the values dictated for items and apply
them to persons.

5.)     Below are the criteria I have collected over the years.


Item INFIT and OUTFIT should be between 0.60 and 1.40 (Bond & Fox, 2001;
Linacre & Wright, 1999)

Item INFIT between 0.70 and 1.30 (Bode, Heinemann, & Semik, 2000; Bogner,
Corrigan, Bode & Heinemann, 2000).

Mean-square fit statistics are defined such that the model-specified uniform
value of randomness is 1.0.  Values greater than 1.5 (more than 50%
unexplained randomness) are problematic. (Wright and Panchapakesan, 1969;
Linacre, 1999).
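
For readers who want to see what is being compared against these criteria,
here is a minimal sketch of person INFIT and OUTFIT mean squares for the
dichotomous Rasch model.  It assumes the person measure (theta) and the
item difficulties are already known; in practice both are estimated by the
calibration software.  The function name person_fit and the example values
are hypothetical.

```python
import math

def person_fit(responses, theta, difficulties):
    """Return (infit, outfit) mean squares for one person.

    responses: list of 0/1 scores; theta: person measure in logits;
    difficulties: item difficulties in logits (same order as responses).
    """
    sum_sq_resid = 0.0   # sum of squared raw residuals
    sum_var = 0.0        # sum of model variances (information)
    sum_z_sq = 0.0       # sum of squared standardized residuals
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # model probability of success
        w = p * (1.0 - p)                         # model variance of the response
        r2 = (x - p) ** 2
        sum_sq_resid += r2
        sum_var += w
        sum_z_sq += r2 / w
    outfit = sum_z_sq / len(responses)   # unweighted mean square
    infit = sum_sq_resid / sum_var       # information-weighted mean square
    return infit, outfit

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
orderly = [1, 1, 1, 0, 0]     # passes easy items, fails hard ones
aberrant = [0, 0, 1, 1, 1]    # fails easy items, passes hard ones

print(person_fit(orderly, 0.5, difficulties))
print(person_fit(aberrant, 0.5, difficulties))
```

The orderly string yields mean squares below 1.0, while the aberrant string
yields values well above 1.5 and would be flagged under any of the criteria
listed above.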


Hope this helps.  Good luck.





Jon S. Twing, Ph.D.
Executive Vice President, Test & Measurement Services
Pearson Educational Measurement
2510 N. Dodge Street, P.O. Box 30, Mailstop 165
Iowa City, Iowa  52245-9945
Phone: 319-339-6407
Fax: 319-339-6477
Cell: 319-331-6547
Jon.S.Twing at Pearson.com





From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Petroski, Greg
Sent: Friday, April 27, 2007 1:29 PM
To: rasch at acer.edu.au
Subject: [Rasch] Misfitting Individuals


I have a few questions centering on person-fit.  The need to understand the
cause of aberrant response patterns is obvious. But in applications, i.e. not
in the test development phase, what is done with misfitting individuals?


Score them anyway?

Exclude them from the reporting?  This could be a very unpopular solution in
some applications.  


Are rules of thumb for person INFIT and OUTFIT the same as when judging item
fit?  In which case one might not report scores for individuals with INFIT
or OUTFIT exceeding certain limits.  Is this done?



Gregory F. Petroski, Ph.D.

Dept. of Health Management and Informatics &
Office of Medical Research/Biostatistics
137 Hadley Hall (DC 018)
University of Missouri - Columbia,
Columbia, Mo.  65212




