[Rasch] Rasch scaling
Steve Kramer
skramer1958 at verizon.net
Fri Jun 27 06:17:44 EST 2014
Yes: I see that we need to spend more time on test blueprint and construct validity to make sure “what is included in the exam” really tracks to what we value. But my question was a bit more technical:
As a scale for reporting, have others used a scale where 100 = goal theta and the scale score is 100 - 10 * (number of logits below goal theta)? I’m proposing this as a better form of scaling than setting a particular mean or sd, since scale scores are then meaningful and easy to interpret, while still closely matching people’s comfort level of “60 is failing, 80 is kind-of ok, 100 is great”.
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Agustin Tristan
Sent: Thursday, June 26, 2014 3:12 PM
To: rasch at acer.edu.au
Cc: John Baker
Subject: Re: [Rasch] Rasch scaling
Hello Steve:
In Winsteps you can define the value used to scale the logit and the translation for the center of the scale. With this you can avoid values above 100 or below 0 if you need to. Other manipulations are possible, but they depend on your own considerations about the students' achievements and what is reasonable as a minimum of performance.
But when you say that 100 really does mean "You learned everything we asked of you", that is not completely true. In fact, if a person answers 100% of the items correctly (or 98% or whatever you are considering), you can say "You responded correctly to what is included in the exam", because a course includes many more things that are not contained in the test. If you want to say that this "measure" corresponds to what the student should learn, then you must demonstrate that the test matches a well-defined domain of knowledge and tasks to be performed by the student (you can use the test blueprint) and that your test has a satisfactory number of items that are a representative sample of those topics (this is also part of the test blueprint). The cutoff point to pass the exam is another thing to define, and there are several approaches for that using the Rasch model. Whatever you decide could be reasonable or not, but at this moment you must give a better foundation to your design and the scale.
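As a rough illustration of the point about choosing a scaling value and a translation: the scale proposed in this thread (100 at the goal theta, 10 points per logit) is one particular linear transformation of the logit measures. The sketch below only computes the slope and offset of that transformation. The function name and the goal theta of 3.0 logits are hypothetical, and how these two numbers map onto the actual Winsteps settings is not checked here.

```python
def linear_rescale_params(goal_theta, points_per_logit=10.0, goal_score=100.0):
    """Return (slope, offset) such that score = offset + slope * theta
    equals goal_score when theta equals goal_theta."""
    slope = points_per_logit
    offset = goal_score - slope * goal_theta
    return slope, offset


# Hypothetical goal theta of 3.0 logits, for illustration only.
slope, offset = linear_rescale_params(goal_theta=3.0)

# offset is the score that would be reported at theta = 0 under this scale.
score_at_goal = offset + slope * 3.0
```

Under these assumptions the offset is simply the score reported at a theta of zero, so a different goal theta per test changes only the offset, not the 10-points-per-logit slope.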
Hope this helps.
Regards
Agustin
INSTITUTO DE EVALUACION E INGENIERIA AVANZADA.
Ave. Cordillera Occidental No. 635
Colonia Lomas 4ª San Luis Potosí, San Luis Potosí.
C.P. 78216 MEXICO
(52) (444) 8 25 50 76 / (52) (444) 8 25 50 77 / (52) (444) 8 25 50 78
Página Web (en español): http://www.ieia.com.mx
Web page (in English): http://www.ieesa-kalt.com/English/Frames_sp_pro.html
On Thursday, June 26, 2014 12:22 PM, Steve Kramer <skramer1958 at verizon.net> wrote:
I would like the group's opinion on a method I just finished using to translate Rasch "Thetas" to more easily interpretable scale scores. I think the method is exciting and perhaps useful to others. At first, I expected it to be a common approach--but I was using WinSteps, and I didn't find any rescaling procedures much like it in the WinSteps manual. So maybe I have an approach that, while kind-of obvious and quite useful, is nonetheless new.
My group is working with an Egyptian STEM school. We needed to administer and report scores on six "concept" tests, one each on Biology, Chemistry, Geology, Physics, "Pure" math, and Applied Math/Mechanics. On some of the tests, the items were such that good students at the school "ought" to be able to answer every question correctly. On other tests, the items were such that maybe good students "ought" to be able to answer 35 of 40 questions correctly, with an additional 5 extra-challenging items that only top students might answer. We originally wanted to use Rasch for two main reasons: so that we could use item diagnostics on a pilot to identify/fix problematic items, and so that next year when we change the test we can use "anchor items" to put next year's tests on the same scale as this year's. But, once we output the Rasch theta scores, we decided to use those, instead of number-correct, as the basis for the final scores we reported out. Here is what we did and why.
The School and Egyptian Ministry folks grade students on a percentage scale out of 100, with 60% as failing. On some of these tests, the raw percentages showed "too many" kids failing, and we were asked to consider rescaling all tests to a mean of 80, sd of 10, with our choice of using number-correct or Rasch Theta as the basis for the scaling. BUT the Tests of Concepts were intended to be criterion-referenced (do the kids know what is expected of them?), not norm-referenced (how does everyone rank?), and the mean-80, sd-10 rescaling throws out any meaningful criteria.
So here is what we did instead.
Step 1: For each test, set a "Goal". This goal is the number-correct we are really trying to get all the STEM school kids to achieve if the curriculum we are using works as intended. On some tests, this would be all items correct (e.g., 41 out of 41). On other tests it might be the 35 non-super-challenging questions (e.g., 35 out of 40).
Step 2: Give students who achieved one less than the ideal score a "scale score" of 100.
We chose "one less than" the ideal score because a "perfect score" can't be assigned a Theta without extra assumptions beyond the normal Rasch model. Thus, a score of 100 would be given to a student who correctly answered 40 out of 41 (in the case where the ideal is a perfect score of 41) or 34 out of 40 (in the case where the ideal is a score of 35 out of 40). The latter was a judgment call, since a good Theta could often be computed for 35 out of 40--but in practice the "ideal" scores tended to be for the very top students, with a fairly wide sd of the Rasch Theta, so we decided to give kids the benefit of the doubt and set 100-score at one less than the "ideal".
Step 3: For each student, compute the scale score as follows. First, compute "Theta of this student minus Theta of students who achieved scale score of 100", then multiply by ten, and add the result to 100. This is the student's scale score.
Step 4: Round to the nearest whole number, and truncate scores above 100 to 100. (I would have liked to allow scores above 100 to indicate students who showed skill above the "goal" but our clients wouldn't have it. The option of allowing scores above 100 next year, when we put next year's test on the same scale as this year's, is still on the table.)
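Steps 3 and 4 can be sketched in a few lines of code. This is a minimal illustration, not the procedure as actually run: the function name, the goal theta of 3.0 logits, and the example thetas are all hypothetical, and the tie rule for exact halves (rounded up here) is an assumption, since the steps above only say "nearest whole number".

```python
import math


def scale_score(theta, goal_theta, points_per_logit=10.0, cap=100):
    """Convert a Rasch theta (in logits) to the reporting scale.

    goal_theta is the theta of students who earn a scale score of 100.
    """
    # Step 3: difference from the goal theta, times ten, added to 100.
    raw = 100.0 + points_per_logit * (theta - goal_theta)
    # Step 4: round to the nearest whole number (halves rounded up, an assumption).
    rounded = math.floor(raw + 0.5)
    # Step 4: truncate anything above 100 down to 100.
    return min(rounded, cap)


# Hypothetical goal theta and student thetas, for illustration only.
goal = 3.0
at_goal = scale_score(3.0, goal)          # student exactly at the goal
one_below = scale_score(2.0, goal)        # one logit below the goal
four_below = scale_score(-1.0, goal)      # four logits below the goal
above_goal = scale_score(3.5, goal)       # above the goal, capped at 100
fractional = scale_score(2.26, goal)      # 92.6 before rounding
```

Dropping the `cap` (or setting it very high) would give the variant in which scores above 100 are allowed.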
The great thing about this approach is that it creates a scale where 100 really does mean "You learned everything we asked of you", 60 really does mean "You failed to learn even close to what we wanted" and the difference between 60 and 100 means the same across all tests, instead of depending on the vagaries of particular test construction or the standard deviation of a particular group of students. Here is how we described the scale to our clients:
"Across all tests, we strongly recommend using the recommended Scale Score. Even in cases where raw scores were acceptable, the Scale Score is a more reliable score to interpret across the tests. The Scale Score was created using the estimated ability of each student produced out of the Rasch analysis. Called a Theta, Rasch analysis uses the difficulty of each test item to produce a more accurate “ability level” for each student than a simple raw score. We then scaled the Thetas to a Scale Score, where 100 was set to be the Highest Observable Scale Score, or HOSS. A score of 100 is earned by students who scored at the top of the group, showing excellent conceptual understanding of the concepts covered by an exam. The Scale Score is a simple linear transformation of the Thetas. Each scale score point is equivalent to one tenth of a Rasch “logit”. Students who score below a 60 on this scale are more than 4 logits below students who score 100, which reduces the odds of their correctly answering any given question of the test by a factor of just over 50. (Technically, the factor is e^4.) This means for example that a student with a score of 60 would have only a 47% chance of correctly answering a question that a person with a score of 100 has a 98% chance of answering correctly—and only a 2% chance of correctly answering a question that a student with a scale score of 100 has a 50% chance of answering correctly. We believe this makes a 60 a justifiable cut score for those who fail."
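The figures quoted in that description can be checked directly from the Rasch model, in which a gap of d logits divides the odds of success by e^d. The sketch below is only a numerical check of those figures; the function name is hypothetical.

```python
import math


def shifted_probability(p_reference, logits_below):
    """Success probability on the same item for a student who is
    `logits_below` logits under a reference student whose success
    probability on that item is p_reference."""
    reference_odds = p_reference / (1.0 - p_reference)
    lower_odds = reference_odds / math.exp(logits_below)
    return lower_odds / (1.0 + lower_odds)


# A scale score of 60 is 4 logits below a scale score of 100.
odds_factor = math.exp(4)                      # about 54.6, "just over 50"
p_easy = shifted_probability(0.98, 4.0)        # about 0.47
p_medium = shifted_probability(0.50, 4.0)      # about 0.018, i.e. roughly 2%
```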
Have others used this approach?
Is it as good as I think it is?
Steve Kramer
The 21st Century Partnership for STEM Education