[Rasch] Estimating Rasch Measures for Extreme Scores

Agustin Tristan ici_kalt at yahoo.com
Sun May 1 00:03:03 EST 2011

Dear Iasonas:
I think the problems you are facing would be the same whether you used the Rasch model, raw scores, IRT or any other model. I am certain that the problem does not lie with the Rasch model but with the origin of your data, that is, the validity of the scores your student is using for the analysis; hence the problem concerns the test design.
I understand that a judge assigns a score from A to E according to specific criteria (including technical, ethical and other elements), but you have only ONE score, which may represent a set (big or small) of items. In cases like yours I think it is better to define an analytical criterion in which every aspect or trait is translated into one item, so that instead of a single A you may have 5, 6, 10, 20 or more items. This removes the problem of many persons having meaningless extreme scores, because it is very improbable that all the As represent exactly the same ability, knowledge, competency, etc. across different persons.
To give you an example, we used an analytical criterion to score the performance of children (6 to 8 years old) in several school activities. Where the teachers used to assign a letter to a competency (say spatial competency, verbal competency, and so on), we obtained several items (position up-down, side left-right, size big-small, angle vertical-horizontal-inclined, selection of color, etc.).
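
To make the point about extreme scores concrete, here is a small Python sketch under simplifying assumptions that are not part of the original message: dichotomous Rasch items, local independence, and a hypothetical person located 1 logit above items of difficulty 0. It computes the probability of a perfect score as the number of items grows; the function names are illustrative only.

```python
import math

def p_success(theta, difficulty):
    """Dichotomous Rasch probability that a person at theta succeeds on an item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def p_perfect(theta, difficulties):
    """Probability of a perfect (extreme) score, assuming local independence."""
    prob = 1.0
    for b in difficulties:
        prob *= p_success(theta, b)
    return prob

if __name__ == "__main__":
    theta = 1.0  # hypothetical able person, 1 logit above the items
    for k in (1, 3, 10, 20):
        print(k, "items:", round(p_perfect(theta, [0.0] * k), 4))
```

Under these assumptions the probability of an extreme score drops quickly as the number of items rises, which is the practical effect of splitting one holistic grade into several analytical items.
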
It is possible that you will not be able to recode or translate the holistic values into analytical items, and then your problem will still be there; nevertheless, you can be quite sure that it is not a defect of the Rasch model but a defect of the test design. I consider that statistics, CTT or the Rasch model should not be used to solve a test design problem; that said, the other proposals by colleagues on this listserv are very wise.


Ave. Cordillera Occidental No. 635

Colonia Lomas 4ª San Luis Potosí, San Luis Potosí.

C.P. 78216 MEXICO

(52) (444) 8 25 50 76 / (52) (444) 8 25 50 77 / (52) (444) 8 25 50 78

Web page (in Spanish): http://www.ieia.com.mx

Web page (in English): http://www.ieesa-kalt.com/English/Frames_sp_pro.html

--- On Sat, 4/30/11, liasonas at cytanet.com.cy <liasonas at cytanet.com.cy> wrote:

From: liasonas at cytanet.com.cy <liasonas at cytanet.com.cy>
Subject: Re: [Rasch] Estimating Rasch Measures for Extreme Scores
To: "Luis Carlos Orozco" <lcorovar at gmail.com>
Cc: "rasch list" <rasch at acer.edu.au>
Date: Saturday, April 30, 2011, 12:17 AM

Thank you, 

Indeed, it seems that too many people reach the ceiling with a 3x4 (1-5 scale) design. Maybe this is one of the weak points of the current system... but the point is that somebody argues that using raw scores would solve the problem of the extreme scores, and we may need to see whether there is a published answer to this...

From: Luis Carlos Orozco <lcorovar at gmail.com> 
Date: Fri, 29 Apr 2011 19:17:34 -0500
To: Iasonas Lamprianou<liasonas at cytanet.com.cy>
Cc: rasch list<rasch at acer.edu.au>
Subject: Re: [Rasch] Estimating Rasch Measures for Extreme Scores

Maybe this passage from the Winsteps documentation will be of some "help":
There is no "correct" answer to the question: "How large should ExtremeScoreAdjustment= be?" The most conservative value, and that recommended by Joseph Berkson, is 0.5. Some work by John Tukey indicates that 0.167 is a reasonable value. The smaller you set EXTRSC=, the further away measures corresponding to extreme scores will be located from the other measures. The technique used here is Strategy 1 in www.rasch.org/rmt/rmt122h.htm.
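
A minimal Python sketch of the idea behind Strategy 1, assuming a Partial Credit Model with three items scored 0-4 (E=0 ... A=4, so the maximum raw score is 12) and made-up step difficulties. An extreme raw score is replaced by a slightly smaller fractional score, and the measure is the ability at which the model-expected raw score equals that fractional score. The step values and function names are hypothetical, and this is not Winsteps' own code.

```python
import math

# Hypothetical PCM step difficulties (logits) for three items, each scored 0-4.
# These values are illustrative assumptions, not estimates from real data.
THRESHOLDS = [
    [-2.0, -0.5, 0.5, 2.0],
    [-1.5, -0.5, 1.0, 2.5],
    [-2.5, -1.0, 0.0, 1.5],
]

def category_probs(theta, taus):
    """Partial Credit Model probabilities of categories 0..len(taus) for one item."""
    # Log-numerator for category k is sum_{j<=k} (theta - tau_j); category 0 is 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - tau)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_score(theta, thresholds=THRESHOLDS):
    """Model-expected raw score across all items at ability theta."""
    return sum(
        sum(k * p for k, p in enumerate(category_probs(theta, taus)))
        for taus in thresholds
    )

def measure_for_score(score, thresholds=THRESHOLDS, lo=-30.0, hi=30.0):
    """Find theta with expected_score(theta) == score by bisection.

    The expected score increases monotonically with theta, so bisection
    converges for any non-extreme (possibly fractional) target score.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_score(mid, thresholds) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def extreme_measure(adjustment, thresholds=THRESHOLDS):
    """Strategy-1-style estimate for a perfect score: use (max score - adjustment)."""
    max_score = sum(len(taus) for taus in thresholds)
    return measure_for_score(max_score - adjustment, thresholds)

if __name__ == "__main__":
    print("perfect-1 (score 11):", round(measure_for_score(11), 2))
    for adj in (0.5, 0.3, 0.167):
        print(f"perfect score, adjustment {adj}:", round(extreme_measure(adj), 2))
```

The smaller the adjustment, the larger the resulting measure, which mirrors the remark above that a smaller EXTRSC= places the extreme measures further from the others.
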

Another thing to say is that a measure obtained from your 4x3 design is not good enough to measure some people.

Luis C. Orozco V.

MD MSc Epidemiology

Escuela de Enfermería

Universidad Industrial de Santander


2011/4/29 Iasonas Lamprianou <liasonas at cytanet.com.cy>

Dear colleagues,
I rarely submit requests to this list unless it is urgent and important, because I respect the time of the people who tend to reply most often, and I would like to thank them. This time it is important, which is why I politely ask for your help. The post is long, but it has to do with the problems Rasch users face in the harsh world of academia, and I think it concerns most of us.

I am trying to help a student with her PhD thesis (so I am writing on her behalf). She submitted her thesis and her examiners spotted some problems and she has to address them.

The problem: The PhD thesis is about the performance of students. For each student participating in the study (N>1000), the researcher has his/her scores in four subjects: language, science, maths and history. For each subject, each student has three teacher assessments, awarded in January, March and June. Each score runs from E (Failure) to A (Excellent). So, overall, each student has three ordinal teacher assessments for each of four subjects. It is a typical repeated-measures case for four variables/subjects with three measures per variable/subject.

Design: Since the data are ordinal (E=1=Failure to A=5=Excellent), the researcher used a Partial Credit Rasch model with three "items" to build four Ability scales, one for each subject (the Rating Scale model did not fit well). The student also used all 12 scores (4 subjects x 3 measures) to produce one overall Ability ("Academic Performance") measure. The researcher then used these Rasch ability measures as dependent variables in OLS regressions.
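
For reference, the standard form of the Partial Credit Model can be written as below. The notation is assumed for illustration: person n has ability θ_n, item i (here, one teacher assessment) has step difficulties δ_{i1}, ..., δ_{im_i}, and the categories E..A are coded x = 0, ..., 4, so m_i = 4.

```latex
P(X_{ni} = x \mid \theta_n)
  = \frac{\exp\left( \sum_{k=0}^{x} (\theta_n - \delta_{ik}) \right)}
         {\sum_{h=0}^{m_i} \exp\left( \sum_{k=0}^{h} (\theta_n - \delta_{ik}) \right)},
  \qquad x = 0, 1, \dots, m_i,
  \qquad \text{with } \sum_{k=0}^{0} (\theta_n - \delta_{ik}) \equiv 0 .
```

The Rating Scale model is the special case δ_{ik} = δ_i + τ_k, in which the thresholds τ_k are shared by all items. In both models the person's raw score r_n = Σ_i x_{ni} is a sufficient statistic for θ_n.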

Issue 1:
A serious problem spotted by the examiners is that a large proportion of students (around 20%) have perfect scores (three As) on some of the four subjects. The researcher used a Winsteps routine to estimate ability measures for those students with extreme scores. The examiner has major reservations about the validity of this decision and asks whether these data (extreme scores) should be dropped. The examiner says: "If a Rasch analysis is to be used to derive attainment scores, the final distribution must provide a realistic representation of attainment. This means that the large group of candidates who achieve perfect scores (on the extreme right of the histograms) need to be properly represented. These scores need to be appropriately dealt with by Rasch (if this is possible), or they need to be removed from the analysis (with an assessment made about the impact of the resulting loss of data)."
In the researcher's defense, the distance between the "perfect score" and the "perfect-1" estimates is neither huge nor unreasonable: it is around 1.4 logits on a scale that extends from around -11 to 11 logits. When the researcher draws the scatterplot of raw scores against logits, the S-shaped (sigmoid) curve looks beautifully smooth, and the estimates for the extreme scores look neither "too extreme" nor out of tune with the rest of the data points. The distance between the "perfect score" and the "perfect-1" estimates is not grossly out of line compared with the other distances between raw-score estimates (for example, the distance between the "perfect-1" and the "perfect-2" scores is only around 0.3 logits smaller).
(a) The researcher needs strong references to defend her decision NOT to drop the extreme data estimates. Can anyone please provide strong peer-reviewed papers to support the decision to keep the extreme score estimates as valid representations of the ability of the participants?

Issue 2:
Stemming from the previous comment, one of the examiners' suggestions is that the researcher could ditch the Rasch model and instead sum the three measures in one subject (e.g. A+B+B=5+4+4=13) and then use this sum in an OLS regression. The examiner says: "A serious discussion needs to be held about the benefits, if any, the Rasch analysis provides over a more direct analytical path (e.g. a linear regression of results averaged over three [teacher assessments])." We all know that this is simply wrong, because we cannot average ordinal measures, and the student already explains this in her Methodology section, but she probably needs more references.
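
A small illustration of why equal raw-score steps are not equal steps on the measure scale, under a deliberately simplified assumption that does not describe the thesis data: k dichotomous items, all of difficulty 0 logits. The maximum-likelihood Rasch measure for raw score r is then ln(r/(k-r)), so raw-score steps near the extremes span more logits than steps near the middle. The function names are hypothetical.

```python
import math

def measure(r, k, difficulty=0.0):
    """ML Rasch measure for raw score r on k equally difficult dichotomous items."""
    if not 0 < r < k:
        raise ValueError("extreme scores (0 or k) have no finite ML estimate")
    return difficulty + math.log(r / (k - r))

def step_sizes(k, difficulty=0.0):
    """Logit distance between each pair of adjacent non-extreme raw scores."""
    measures = [measure(r, k, difficulty) for r in range(1, k)]
    return [b - a for a, b in zip(measures, measures[1:])]

if __name__ == "__main__":
    k = 12  # same maximum as three items scored 0-4, but dichotomous for simplicity
    for r, step in enumerate(step_sizes(k), start=1):
        print(f"{r} -> {r + 1}: {step:.2f} logits")
```

In the Rasch model the raw sum remains a sufficient statistic for the measure, but the mapping from sum to measure is non-linear, which is why a sum or average such as 13 is not itself an interval-scale quantity.
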
(b) Can anyone please provide a list of (recent, if possible) papers in good peer-reviewed journals which explain that this is not the right thing to do?

Issue 3:
Another suggestion of the examiners is that the researcher could ditch the Rasch model and simply use the ordinal measure (E=1=Failure to A=5=Excellent) as the dependent variable in a proportional odds model. This means that the researcher would have to run three different models for each subject (one for each Teacher Assessment, awarded in January, March and June).
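
For comparison, a proportional odds (cumulative logit) model for one teacher assessment Y, with categories j = 1 (E), ..., 5 (A) and a covariate vector x, could be written as below. The notation is assumed for illustration, and software packages differ in the sign convention for the slope term.

```latex
\log \frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
  = \alpha_j - \mathbf{x}^{\top} \boldsymbol{\beta},
  \qquad j = 1, \dots, 4,
  \qquad \alpha_1 < \alpha_2 < \alpha_3 < \alpha_4 .
```

A single slope vector β is shared across all four cut-points (the proportional odds assumption). Fitting this separately to the January, March and June assessments yields three sets of coefficients per subject, rather than one regression on a single combined measure.
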
(c) Can anyone please provide a list of (recent, if possible) papers in good peer-reviewed journals which explain that this is NOT better than using the Rasch model to obtain one linear measure instead of three ordinal ones?

I feel that the examiners did a very good job overall and were very fair and consistent. They spent a great deal of time reading every little detail of a long thesis, they spotted some important issues, and we need to credit them for this. I feel that we should help the student address these interesting issues to the full satisfaction of the examiners.

Thank you for your time

In anticipation of your help
Jason Lamprianou
University of Cyprus

Rasch mailing list
Rasch at acer.edu.au
Unsubscribe: https://mailinglist.acer.edu.au/mailman/options/rasch/lcorovar%40gmail.com
