[Rasch] Question about poor reliability
Gustaf Bernhard Uno Skar
gustaf.b.skar at ntnu.no
Fri Dec 27 19:25:42 AEDT 2019
Hi. Many thanks for your thoughtful answers! I wish you all a Happy New Year!
From: Agustin Tristan <agustintristan123 at gmail.com>
Sent: Thursday, December 26, 2019 4:39 PM
To: Gustaf Bernhard Uno Skar <gustaf.b.skar at ntnu.no>
Cc: rasch <rasch at acer.edu.au>
Subject: Re: [Rasch] Question about poor reliability
Reliability from a single test is more closely related to internal consistency.
I am going to use what you say: "some kids are excellent spellers, some not so much"
Now consider two theoretical extreme cases:
(a) excellent speller (a person who got the highest measures in Rasch logits or in raw scores) and
(b) low speller (a person who got the lowest Rasch measures or low raw scores)
If responses are consistent (high reliability), then the theoretically excellent (and reliable) speller has a high probability of spelling every word correctly, and the
bad (and reliable) speller has a low probability of spelling well and will fail on every word. In this case you should obtain a good fit to the Rasch model, high separation, and a high standard deviation in both measures and raw scores.
If responses are not consistent, then an excellent (but unreliable) speller may unexpectedly fail an easy word, and the low (and unreliable) speller may unexpectedly spell a hard word correctly. In this case, misfit will be present in your items, separation will be low, and the standard deviation will be smaller than in the previous case.
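The two cases above can be sketched with a small simulation (a hypothetical illustration of the argument, not something from the original analysis): responses are generated under a dichotomous Rasch model, then a fraction of them is flipped at random to mimic inconsistent responding. The raw-score spread, and hence the separation, shrinks in the noisy case.

```python
import math
import random

def rasch_prob(theta, b):
    """Probability that a person with ability theta spells a word of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_scores(thetas, items, flip_rate, rng):
    """Raw scores under the Rasch model; flip_rate randomly reverses that fraction of responses."""
    scores = []
    for theta in thetas:
        correct = 0
        for b in items:
            resp = rng.random() < rasch_prob(theta, b)
            if rng.random() < flip_rate:  # inconsistent (unreliable) responding
                resp = not resp
            correct += int(resp)
        scores.append(correct)
    return scores

def sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

rng = random.Random(42)
thetas = [rng.gauss(0, 1.5) for _ in range(300)]  # persons: wide ability spread
items = [rng.gauss(0, 1.0) for _ in range(40)]    # 40 words of varying difficulty

sd_consistent = sd(simulate_scores(thetas, items, 0.0, random.Random(1)))
sd_noisy = sd(simulate_scores(thetas, items, 0.3, random.Random(1)))
print(sd_consistent, sd_noisy)  # the noisy raw scores are less spread out
```

The flipping compresses every person's expected score toward 50%, which is exactly the loss of spread (and separation) described above.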
Consequently, using raw scores or Rasch measures (https://www.rasch.org/rmt/rmt32e.htm) does not mean you will find different things in the data. In fact:
1) if you have a high standard deviation of person measures, or a large number of items, you should obtain higher reliability.
2) to decide whether reliability is good or bad, you need to provide: (a) the number of items (which you didn't report) and (b) the standard deviation of the measures (which you didn't report).
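As a rough sketch of point (1): person separation and separation reliability can be computed from the observed standard deviation of the person measures and the root-mean-square standard error (RMSE), following the usual Rasch conventions. The numbers below are made up for illustration only:

```python
import math

def separation_and_reliability(sd_observed, rmse):
    """Person separation G and separation reliability R.
    The 'true' variance is the observed variance with measurement-error variance removed."""
    true_var = max(sd_observed ** 2 - rmse ** 2, 0.0)
    g = math.sqrt(true_var) / rmse   # separation index: true SD in error units
    r = true_var / sd_observed ** 2  # separation reliability, equal to G^2 / (1 + G^2)
    return g, r

# Wide spread of measures relative to the error -> high reliability.
g_hi, r_hi = separation_and_reliability(sd_observed=2.0, rmse=0.5)
# Spread no larger than the error -> reliability collapses to zero.
g_lo, r_lo = separation_and_reliability(sd_observed=0.5, rmse=0.5)
print(g_hi, r_hi, g_lo, r_lo)
```

The second call shows how a FACETS-style reliability of 0.00 can arise: when the spread of the person measures is no larger than their standard errors, no true variance remains to separate the persons.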
If you wish, you may read my paper on theoretical reliability values (https://www.rasch.org/rmt/rmt323.pdf) to see what separation and reliability should be expected for your specific data set.
I hope this helps.
On Wed, Dec 25, 2019 at 11:29, Gustaf Bernhard Uno Skar (gustaf.b.skar at ntnu.no) wrote:
Dear Rasch Community,
I turn to you with a reliability question. I have conducted an MFRM analysis (in the FACETS software), using the rating scale model. I had three facets: students, raters, and dimensions of text quality (there were eight dimensions). Each dimension was rated by at least two raters on five-point scales. The separation reliability for students was good: .94.
In a subsequent analysis, I pulled one dimension out to create two measures. The first (call it a limited writing proficiency measure) still showed high enough separation reliability (.94), but the second (spelling) showed unacceptably low reliability (0.00 according to the software output). Now here’s the question: what should I do with the spelling measure? To be sure, there is a lot of variation in the data (some kids are excellent spellers, some not so much).
Using the raw scores is one option, using the logit values another. Using raw scores will have some unwanted consequences (such as having a measure limited to the steps 1, 1.5, 2, 2.5, etc.). Using logits, however, may not seem justifiable because the analysis was not able to reliably separate the students.
Any food for thoughts or hints about relevant papers/readings will be much appreciated.
Gustaf B. Skar, Ph.D.
Norwegian University of Science and Technology (NTNU)
Rasch mailing list
email: Rasch at acer.edu.au