[Rasch] How many anchor items do I need

Robert Hess Robert.Hess at ASU.edu
Wed Jul 11 08:43:00 EST 2007


Steve,
I don't believe there is as easy an answer as you might hope for.  First, one of the primary concerns of any IRT anchoring project is the assumption that the population being tested shares a very significant trait: they have all experienced the same instructional events.  Trying to equate two different tests where subjects have been presented with two different forms of instruction, even in the same subject matter, would be extremely difficult.  Hence the problem that many states have been trying to convey to the federal government concerning the requirements of NCLB.

It is difficult enough to ensure that different school districts within the same state have received essentially the same instruction when we equate two different forms (A and B, for example).  In such a case, the rule of thumb I have used for over 25 years is to use a minimum of 25% of the total number of items (or 20, whichever number is less).  It is also important to remember that the choice of items used for equating matters a great deal.  I refer you to Lord's seminal work, where he details the effect of removing or adding items of different difficulty to a given test -- the same caution should be taken in choosing items to act as equating items.
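In code, the arithmetic of that rule of thumb is trivial (a toy Python sketch; the function name is arbitrary and the 25%/20 figures are just my heuristic, not a published standard):

    import math

    def anchor_count(total_items: int) -> int:
        """Rule of thumb: 25% of the test or 20 items, whichever is less."""
        return min(math.ceil(0.25 * total_items), 20)

    print(anchor_count(40))   # min(10, 20) -> 10 anchor items
    print(anchor_count(120))  # min(30, 20) -> 20; the cap applies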

While Rasch is very robust and will actually yield a fairly highly correlated set of test scores with as few as six common items on a 30-item test, I feel much more comfortable with at least 10 common items shared between two tests -- I know I just violated my 25% rule stated above, but the lower the total number of items on a given test, the more likely I have been to raise the percentage of shared items.  Nonetheless, it still behooves me to verify that the students taking the test have all experienced essentially the same instructional events -- perhaps instruction based upon the same carefully stated (and measurable) objectives.  It is in this last point that you will run into significant problems.
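For concreteness, here is a toy Python sketch of the mean-mean common-item linking we are both assuming -- the item names and difficulty values are invented, and in practice the difficulties would come from two separate Rasch calibrations:

    import statistics

    # Hypothetical Rasch difficulties (in logits) for five anchor items,
    # estimated separately in the Form A and Form B calibrations.
    anchors_a = {"q01": -0.8, "q07": -0.2, "q12": 0.1, "q18": 0.6, "q23": 1.1}
    anchors_b = {"q01": -0.5, "q07":  0.1, "q12": 0.4, "q18": 0.9, "q23": 1.3}

    # Per-anchor difficulty differences (Form A metric minus Form B metric).
    diffs = [anchors_a[k] - anchors_b[k] for k in anchors_a]

    # Mean-mean linking constant: add it to Form B measures to place
    # them on the Form A scale.
    shift = statistics.mean(diffs)

    # Rough stability check: the standard error of the shift shrinks
    # with the square root of the number of anchor items.
    se = statistics.stdev(diffs) / len(diffs) ** 0.5
    print(f"shift = {shift:.2f} logits (SE about {se:.2f})")

    theta_b = 0.75                 # a person measure from Form B ...
    theta_on_a = theta_b + shift   # ... expressed on the Form A scale

With 20 anchors instead of five, that standard error would be halved, which is one statistical reason more common items buy a more trustworthy link -- though the quality and difficulty spread of the anchors matter at least as much as their count.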

It is true that the most effective way to equate two different tests is with either common people or common items.  But unless you can prove beyond a reasonable doubt that the tested students received common and shared curricular experiences, the courts will have a field day with you -- see the results of the challenge to the state of Texas' statewide testing program.  In fact, I'm sure that several members of this list could provide you with a detailed description of what the court felt was necessary in order to provide evidence supporting what it called "curricular validity".  Any test attempting to link assessments on the level you propose will have to stand this judicial test.  Personally, I don't think it's possible.  We do not at this time possess a national curriculum, and until we do, we will not be able to compose tests that cross state boundaries.  Any attempt to force equating between groups of students who have not received instruction under the same carefully worded (and measurable) outcomes will prove fruitless -- even though one might obtain a significant correlation between the two tests, that correlation might simply be an artifact of large sample sizes.

I know I have just provided a non-statistical response to your question, but believe me, I have done equating for at least five different states and have run into enough problems trying to ensure "curricular validity" just at the state level.  I will be interested to hear what others on this list have to say about your conundrum.

Robert Hess
Professor Emeritus of Measurement and Evaluation
Arizona State University
  ----- Original Message ----- 
  From: RISA KRAMER 
  To: rasch at acer.edu.au 
  Sent: Tuesday, July 10, 2007 5:02 AM
  Subject: [Rasch] How many anchor items do I need


  Dear Rasch colleagues,

  I'm trying to determine how many anchor items one needs to equate two subject-matter tests.  I estimated 20, or 25% of the total test, whichever is greater, based on a rule of thumb used at ETS and on one article which found that, on Florida's FCAT, 15 or 20 anchor items worked as well as 30.

  (Note: I'm thinking here about how one would create a pool of items for the U.S. NCLB to form a de facto standard of "what should be taught," with the pool of items forming a portion (25%) of each state's test.  I realize the equating won't be perfect, but the goal is a good rough measure.  Here is what I'm currently saying about the idea:
  It should also be noted that it is impossible to precisely equate two tests designed to different standards.  Strictly speaking, two students' scores can be directly compared only if they are being tested on the same thing.  If State A emphasizes statistics in eighth-grade math and State B emphasizes geometry, it is not possible to compare precisely the math knowledge of students from the two states.  Who is the more "knowledgeable" student: the one good at statistics or the one good at geometry?  Thus, equated math (or reading) scores of State A and State B will be a rough reflection of how much math (reading) students know.  If in a particular state students who get a lot of state test items correct also tend to get a lot of the more difficult "common core" items correct, then high scores on the state test will equate to very high scores on the common scale.  If in a particular state many students tend to get many state test items correct while missing most of the "common core" items, then high scores on the state test will equate to somewhat lower scores on the common scale.)
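  To make that concrete, here is a toy numeric sketch of those two cases using mean-shift common-item linking; all the numbers are invented:

    # Within a state's own Rasch calibration, difficulties are centred
    # near 0, so anchors that students find relatively hard get high
    # difficulty estimates in that state's metric.
    common_anchor_mean = 0.0    # anchors' mean difficulty, common scale

    anchor_mean_state_x = -0.5  # anchors look comparatively easy in State X
    anchor_mean_state_y = +0.9  # anchors look comparatively hard in State Y

    # Constant added to each state's measures to reach the common scale.
    shift_x = common_anchor_mean - anchor_mean_state_x  # +0.5
    shift_y = common_anchor_mean - anchor_mean_state_y  # -0.9

    theta = 1.2  # the same high measure on either state's own test ...
    print(theta + shift_x)  # ... lands at 1.7 on the common scale
    print(theta + shift_y)  # ... but at only 0.3 on the common scale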

  So:  I know the equating will be imperfect.  
  But am I correct that 20 items are sufficient for this imperfect equating?

  Steve Kramer
  Math Science Partnership of Greater Philadelphia

