[Rasch] Invariance of Item Difficulty, Northwest Evaluation Association "MAP" test
Steven L. Kramer
skramer1958 at verizon.net
Thu Jul 10 21:13:25 EST 2008
Dear Rasch modelers,
Please see the note below to the test coordinator at my school.
Are my concerns valid and my description of Rasch model limitations accurate? Or do I need to correct the note below? (Note: I state the "invariance" hypothesis weakly as dealing with order of item difficulties vs. actual item difficulties--but I think the weaker statement is easier to deal with in practical terms.)
Also: What if we really administer the entire scale and want to test whether the order-invariance hypothesis is violated when comparing my school's group to the NWEA norming group? How would we do that for the test as a whole? (i.e., see if we can reject the null, "the order invariance postulate is not violated" and also find a way to quantify any violation sufficiently to see whether the violation is practically as well as statistically significant?)
----- Original Message -----
From: Steven L. Kramer
To: pholman at baltimorefreedomacademy.org
Sent: Wednesday, July 09, 2008 7:00 PM
Subject: Northwest Evaluation Association "MAP" test
Hi Ms. Holman,
It was good meeting you today! I did want to share with you some worries on the NWEA MAP test.
I have a lot of respect for NWEA: they seem to be good at what they do. Also, I think their technique of Rasch modeling is very useful in education, and I think that their attempts to encourage teachers and schools to use test data for formative purposes are worthwhile. But I fear that the output of their tests may not be useful at Freedom Academy.
THE CASE AGAINST THE MAP
NWEA uses a Rasch model for its MAP test. The Rasch model takes a set of test items and lines them up in difficulty, from least difficult to most difficult. Each item difficulty is quantified (usually called "delta"). Then, each student, based on his/her test score, is assigned an ability (usually called "theta"). For each student and each item, there is a 50% chance (odds are 1-1) of getting the item correct if the item's "delta" equals the student's "theta". In general, the odds of getting an item correct are e^(theta minus delta).
That means all the MAP test has to do is determine a student's theta, and the testers can say, "She has a 95% chance of getting these particular items correct, so she already knows them and it would be a waste of time to teach them to her; a 50% chance of getting these other items correct, so they are appropriate for practice; only a 15% chance of getting these other items correct, so they may not yet be in her 'zone of proximal development'."
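As a minimal sketch of the arithmetic behind those percentages (the function names here are illustrative, not anything NWEA publishes): odds of e^(theta minus delta) correspond to a probability of odds / (1 + odds). Working backward, a 95% chance means theta sits about 2.94 logits above delta, and a 15% chance means theta sits about 1.73 logits below it.

```python
import math

def rasch_probability(theta, delta):
    """Probability of a correct response under the Rasch model."""
    odds = math.exp(theta - delta)  # odds = e^(theta - delta)
    return odds / (1.0 + odds)

def logit_gap(p):
    """Value of (theta - delta) that yields success probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical student ability and item difficulty, both in logits.
print(rasch_probability(0.5, 0.5))   # theta equals delta: 50% chance
print(round(logit_gap(0.95), 2))     # gap needed for a 95% chance
print(round(logit_gap(0.15), 2))     # gap needed for a 15% chance
```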
The problem with the MAP is that in order for the model NWEA uses to produce valid output, the items must meet the Rasch assumption of "the invariance of item difficulties"--that is, no matter what group of students you use to determine item difficulties, the items line up in pretty much the same order from easiest to hardest. I'm pretty sure that NWEA has done some checking to show that this is not a problem for different demographic groups, i.e., I think they checked that the order is the same for Blacks as for Whites, for Boys as for Girls, and so on. However, I FEAR THAT THE ORDER OF ITEM DIFFICULTIES MAY NOT BE THE SAME FOR STUDENTS WHO USED Connected Mathematics AS IT IS FOR STUDENTS WHO ARE TAUGHT DIRECTLY FROM MARYLAND'S VSC.
Below are four tested objectives from the VSC. Based on the order in which they are taught, I believe they are lined up pretty much from easiest to hardest for students who used the VSC--but by the same notion, they are perhaps lined up from hardest to easiest for students going through Connected Mathematics.
1. Identify and write inequalities to represent relationships
2. Plot the result of one transformation (translation, reflection, rotation) on a coordinate plane
3. Organize and display data to make circle graphs
4. Estimate and determine the circumference or area of a circle
Most Maryland schools teach directly from the VSC. For example, in Howard County, the EMIG electronic curriculum is the county curriculum, and the EMIG is basically a set of carefully designed activities sequenced to follow the VSC precisely. (Aside: I'm not advocating the Howard County solution. I'm on the Citizen's Math Advisory Committee for Howard County and we're not satisfied with using the VSC. Freedom Academy's solution is better.) I believe the MAP was calibrated in Maryland using a large representative sample--which means that, in essence, it was calibrated to a large group of students who mostly used curricula like the EMIG. For such students, items will likely line up in difficulty just as NWEA expects, and the MAP output and formative recommendations will be valid.
For us, the output might perhaps be meaningless. If a student has a 50% chance of "identifying and writing inequalities to represent relationships" the MAP would recommend that the student study that topic, and note that the student is not yet ready to study the second, third, and fourth objectives on the list I show above--whereas in our classrooms, the student would likely already have mastered those objectives!
Even worse, NWEA sometimes uses an adaptive test to determine a student's theta. For technical reasons, the adaptive test is even more dependent on the Rasch "Invariance of Item Difficulties" assumption than is a normal test--and for our school, this assumption might be violated!
THE CASE FOR THE MAP
I'm not actually certain that item difficulty will line up differently for our students than it will for typical Maryland students. If NWEA administered the full test (not just an adaptive test) to our students, then we could find out. My fears could prove invalid. It is an empirically testable issue--and frankly, I'd love to find out because other schools would find the answer useful.
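One rough way to look at the ordering question, once item difficulties have been estimated separately for each group, is to count the item pairs whose order flips between the two groups. The sketch below is only an illustration under stated assumptions: the delta values are invented (they mimic the "exactly reversed" worry for the four objectives listed earlier), the function names are hypothetical, ties are assumed absent, and estimation error is ignored, so this is a descriptive check rather than a significance test.

```python
from itertools import combinations

def discordant_pairs(deltas_a, deltas_b):
    """Return item-index pairs ordered oppositely in the two groups."""
    flipped = []
    for i, j in combinations(range(len(deltas_a)), 2):
        # A negative product means the groups disagree on which item is harder.
        if (deltas_a[i] - deltas_a[j]) * (deltas_b[i] - deltas_b[j]) < 0:
            flipped.append((i, j))
    return flipped

def kendall_tau(deltas_a, deltas_b):
    """Kendall rank correlation of the two orderings (assumes no ties)."""
    n = len(deltas_a)
    n_pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * len(discordant_pairs(deltas_a, deltas_b)) / n_pairs

# Invented difficulties (logits) for the four objectives, in list order.
norm_group = [-1.0, -0.3, 0.4, 1.1]   # hypothetical norming sample
our_school = [1.1, 0.4, -0.3, -1.0]   # hypothetical reversed order

print(len(discordant_pairs(norm_group, our_school)))  # number of flipped pairs
print(kendall_tau(norm_group, our_school))            # 1 = same order, -1 = reversed
```

A value near 1 would suggest the two groups order the items alike; how far below 1 counts as a practically important violation is a judgment this sketch does not settle.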
OUR OWN FORMATIVE WORK
As I understand it, my colleagues and I will spend a lot of time analyzing and reporting student work by specific objectives--whether students have "not yet met," "partially met," or "met" each very specific skill, or are "advanced" on it. (My exact terminology isn't correct, but I know there are four levels.) Students won't just get a "D" on a topic and move on: if they haven't learned it, we will work to teach it to them. This internal formative work will be very valuable. I do want to make sure that the MAP does not divert time and energy from it. Six days of testing will be a painfully high price to pay if we really do administer the MAP six times. Further, staff time analyzing the results may be better spent on other things, especially if the MAP "thetas" are not valid due to violation of Rasch model assumptions.
All for now,