[Rasch] Invariance of Item Difficulty, Northwest Evaluation Association "MAP" test

Trevor Bond trevor.bond at jcu.edu.au
Thu Jul 10 23:44:22 EST 2008

Dear Steve et al.,
I would have to suggest that NWEA really does have the 'invariance'
runs on the board. Please remind me if anyone, anywhere in the human
sciences, does invariance like this. (See B&F 2, p. 91.)

In light of the previous argument and empirical evidence, Kingsbury's
(2003) study of the long term stability of item parameter estimates in
achievement testing has a number of important features. First, rather
than using parameter estimates from a set of items used in a single
test, it investigated the stability of item parameter estimates in two
large item banks used to measure achievement in mathematics (> 2300
items) and reading (c. 1400 items) with students from school years 2-10
in seven US states. (Sample sizes for the 1999-2000 school year item
calibrations ranged from 300 to 10,000 students.) Second, the elapsed
time since initial calibration ranged from 7 to 22 years. Finally, and
most importantly (for these purposes), "the one-parameter logistic (1PL)
IRT model (Wright, 1977) was used to create and maintain the underlying
measurement scales used with these banks." While thousands of items have
been added to these item banks over the course of time, each item has
been connected to the original measurement scale through the use of IRT
procedures and systematic Rasch measurement practices (Ingebo, 1997).

The observed correlations between the original and new item difficulties
were extremely high (.967 in mathematics, .976 in reading), more like
what would be expected if items were given to two samples at the same
time, rather than samples separated by a time span from 7 to 20 years.
Over that period, the average drift in the item difficulty parameters
was .01 standard deviations of the mean item difficulty estimate. In
Rasch measurement terms (i.e., focusing on impact on the scales), the
largest observed change in student scores moving from the original
calibrations to the new calibrations was at the level of the minimal
possible difference detectable by the tests, with over 99% of expected
changes being less than the minimal detectable difference (Kingsbury,
2003).
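
For anyone wanting to run this sort of stability check on their own
item bank, here is a minimal sketch in Python. It is not Kingsbury's
procedure, only an illustration of the two summary statistics quoted
above: the correlation between original and recalibrated difficulties,
and the average drift. The function names, the simulated difficulties
and the choice to express drift in units of the SD of the original
difficulties are all my assumptions, not details from the study.

```python
import math
import random
import statistics


def rasch_probability(ability, difficulty):
    """P(correct) under the Rasch / 1PL model; both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def stability_summary(original, recalibrated):
    """Compare two calibrations of the same items (hypothetical helper)."""
    drifts = [new - old for old, new in zip(original, recalibrated)]
    mean_drift = statistics.fmean(drifts)
    return {
        "r": pearson_r(original, recalibrated),
        "mean_drift": mean_drift,
        # Assumption: drift scaled by the SD of the original difficulties.
        "mean_drift_in_sd": mean_drift / statistics.pstdev(original),
        "max_abs_drift": max(abs(d) for d in drifts),
    }


# Simulated data only: 500 made-up item difficulties (logits) and a
# second "calibration" that adds a little random noise to each one.
random.seed(0)
original = [random.gauss(0.0, 1.0) for _ in range(500)]
recalibrated = [d + random.gauss(0.0, 0.2) for d in original]

summary = stability_summary(original, recalibrated)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With real data, `original` and `recalibrated` would be the two sets of
difficulty estimates for the same items, placed on a common scale
before the comparison is made.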

Kingsbury, G. (2003, April). A long-term study of the stability of item
parameter estimates. Paper presented at the annual meeting of the
American Educational Research Association,

Trevor G BOND Ph D
Professor and Head of Dept
Educational Psychology, Counselling & Learning Needs
D2-2F-01A EPCL Dept.
Hong Kong Institute of Education
10 Lo Ping Rd, Tai Po
New Territories HONG KONG
Voice: (852) 2948 8473
Fax:  (852) 2948 7983