[Rasch] Question about poor reliability
Gustaf Bernhard Uno Skar
gustaf.b.skar at ntnu.no
Thu Dec 26 19:20:37 AEDT 2019
Many thanks for your answer. You have interpreted the rating situation correctly, and I agree that more assessments would have been preferable! Let me just add something: taking a CTT approach (i.e. the Intraclass Correlation Coefficient), the reliability between raters on the spelling scale averages around 0.78. (There were several raters.) Intuitively, and for communicating the results (?), it seems more reasonable to report and use raw scores then?
From: Mike Linacre <mike at winsteps.com>
Sent: Wednesday, December 25, 2019 11:46 PM
To: rasch <rasch at acer.edu.au>
Subject: Re: [Rasch] Question about poor reliability
Thank you for your question.
Three things influence the reliability and separation of your students (for Rasch or CTT):
1) the spread of the student measures/scores (not the number of students): more spread, higher reliability
2) the number of ratings of each student: more ratings, higher reliability
3) the number of categories in the rating scale: more categories, higher reliability
Gustaf, based on your description, there is good spread of the spellers' measures/scores, and there are plenty of categories in the rating scale. It looks like there are only 2 ratings of each speller. An average of 4 ratings of each speller is probably needed.
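The effect of points 1) and 2) can be sketched numerically. The snippet below is a minimal illustration, not Facets output. It assumes that reliability is the share of observed variance that is not error variance, that a negative "true" variance estimate is reported as reliability 0, and that error variance shrinks roughly in proportion to the number of ratings per student. The function names and all variance values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def separation_reliability(observed_var: float, error_var: float) -> float:
    """Reliability as (observed - error) / observed variance.

    Assumption: if error variance exceeds observed variance, the
    estimated "true" variance is floored at 0, giving reliability 0.
    """
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var


def reliability_for_ratings(true_var: float, single_rating_error_var: float,
                            n_ratings: int) -> float:
    """Reliability when each student receives n_ratings ratings.

    Assumption: error variance of a student's measure is approximately
    the single-rating error variance divided by the number of ratings.
    """
    error_var = single_rating_error_var / n_ratings
    return separation_reliability(true_var + error_var, error_var)


def spearman_brown(reliability: float, factor: float) -> float:
    """Spearman-Brown projection when the number of ratings is multiplied by factor."""
    return factor * reliability / (1 + (factor - 1) * reliability)


# Hypothetical values: true spread 1.0, single-rating error variance 2.0.
r2 = reliability_for_ratings(1.0, 2.0, 2)   # two ratings per student
r4 = reliability_for_ratings(1.0, 2.0, 4)   # four ratings per student
print(round(r2, 3), round(r4, 3), round(spearman_brown(r2, 2), 3))

# Point 1): more spread with the same error gives higher reliability.
print(round(reliability_for_ratings(4.0, 2.0, 2), 3))

# Error variance larger than observed variance: reliability floors at 0.
print(separation_reliability(1.0, 1.5))
```

Under these assumptions, doubling the ratings from 2 to 4 gives the same result as the Spearman-Brown projection, and a reported reliability of 0.00 can arise even when the scores vary, as long as the estimated error variance is at least as large as the observed variance.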
On 12/26/2019 3:29 AM, Gustaf Bernhard Uno Skar wrote:
Dear Rasch Community,
I turn to you with a reliability question. I have conducted an MFRM analysis (in the FACETS software), using the rating scale model. I had three facets: students, raters, and dimensions of text quality (there were eight dimensions). Each dimension was rated by at least two raters on five-point scales. The separation reliability for students was good: .94.
In a subsequent analysis, I pulled one dimension out to create two measures. The first (call it a limited writing proficiency measure) still showed high enough separation reliability (.94), but the second (spelling) showed far too low reliability (0.00 according to the software output). Now here’s the question: what should I do with the spelling measure? To be sure, there is a lot of variation in the data (some kids are excellent spellers, some not so much).
Using the raw scores is one option; using the logit values is another. Using raw scores will have some unwanted consequences (such as having a measure limited to the steps 1, 1.5, 2, 2.5, etc.). Using logits, however, may not seem justifiable because the analysis was not able to reliably separate the students.
Any food for thought or hints about relevant papers/readings will be much appreciated.
Gustaf B. Skar, Ph.D.
Norwegian University of Science and Technology (NTNU)
Mike Linacre, mike at winsteps.com or winsteps1234 at gmail.com
Winsteps 4.4.8 and Facets 3.83.0 - www.winsteps.com