[Rasch] How many anchor items do I need

Agustin Tristan ici_kalt at yahoo.com
Wed Jul 11 10:08:47 EST 2007


Hello!
  And once you equate between populations (especially with linear equating), you will have the same mean and standard deviation for all the groups. How can you say that one group is better or worse than another if both have the same mean? If you equate to the mean of the previous year, how can you know whether any improvement occurs year after year, if every year has the same mean? Under these circumstances I am able to predict the mean and standard deviation for the following years: both will simply follow my equating data.
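   
  To see the point concretely, here is a minimal sketch in Python (the scores and the reference scale are hypothetical, not real data): after linear equating, every group lands exactly on the reference mean and standard deviation, so any real gap between the groups disappears.
   
      import statistics
      
      def linear_equate(scores, ref_mean, ref_sd):
          # standard linear equating: y = ref_mean + ref_sd * (x - m) / s
          m = statistics.mean(scores)
          s = statistics.stdev(scores)
          return [ref_mean + ref_sd * (x - m) / s for x in scores]
      
      group_a = [45, 50, 55, 60, 65]  # hypothetical raw scores
      group_b = [60, 65, 70, 75, 80]  # a genuinely stronger group
      for group in (group_a, group_b):
          equated = linear_equate(group, ref_mean=500, ref_sd=100)
          print(round(statistics.mean(equated)), round(statistics.stdev(equated)))
      # both groups print "500 100": the real 15-point difference is gone
   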
  And if students didn't have the same opportunity to learn, then once the tests are equated, how can you demonstrate that they learned a different curriculum, or at a different level, if all of them have the same mean?
   
  Probably we have to demonstrate first that the differences are due to the test and not to the students. If that is the case, we should equate; but if we can demonstrate that the test does not produce the differences, and that they are instead the result of differences among populations, then we should not equate. The problem is: how can we demonstrate one or the other?
  Anchor items are certainly part of the solution. How many anchor items? According to Wright and Stone's book "Best Test Design", we could reduce the link to as little as one common item, but that is too low; they propose using about 5 items, and 10 is probably better. We see in this forum that 20 or 25, even 30, could be a good figure. Other colleagues use 50% of a test (that is, 50 to 100 items). One option is whatever amount makes you feel comfortable, but a model based on the standard error of the link would be better. I have no solution for the moment.
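   
  One rough starting point for such a standard-error model (a sketch only; the 0.10-logit per-item calibration error is an assumed figure): in common-item Rasch equating the link constant is the mean shift of the anchor difficulties between the two calibrations, so its standard error shrinks roughly as one over the square root of the number of anchors.
   
      import math
      
      assumed_item_se = 0.10  # assumed SE of each anchor difficulty, in logits
      for k in (1, 5, 10, 20, 30, 50):
          # each anchor carries calibration error from both test forms
          se_link = math.sqrt(2) * assumed_item_se / math.sqrt(k)
          print(f"{k:2d} anchors -> SE of link ~ {se_link:.3f} logits")
      # 1 anchor leaves ~0.14 logits of link error; 5 give ~0.06; 30 give ~0.03
   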
   
  We all live with the same problem.
   
  regards
  Agustin Tristan

"Hendrickson, Peter" <PHendrickson at everettsd.org> wrote:
          ...and even if we had a common curriculum, how would we know that the instruction had a similar degree of commonality?  Our commitment to monitoring fidelity of implementation is honored, for the most part, in the breach.  
   
  //Peter//
   
  Peter Hendrickson
  Assessment Specialist
  Everett (WA) Public Schools
    


    
---------------------------------
  From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Robert Hess
Sent: Tuesday, July 10, 2007 3:43 PM
To: RISA KRAMER; rasch at acer.edu.au
Subject: Re: [Rasch] How many anchor items do I need


  
  Steve,
I don't believe there is as easy an answer as you might hope for.  First, one of the primary concerns of any IRT anchoring project is the assumption that the population being tested shares a very significant trait: they have all experienced the same instructional events.  Trying to equate two different tests where subjects have been presented two different forms of instruction, even in the same subject matter, would be extremely difficult.  Hence the problem that many states have been trying to convey to the federal government concerning the requirements of NCLB.

It is difficult enough to ensure that different school districts within the same state have received essentially the same instruction when we equate two different forms (A & B, for example).  In such a case, the rule of thumb that I have used for over 25 years is to utilize a minimum of 25% of the total number of items (or 20, whichever number is less).  Now, it is also important to remember that the types of items chosen for equating are very important.  I refer you to Lord's seminal work, where he details the effect of removing or adding items of different difficulty to a given test -- the same warning should be heeded in choosing items to act as equating items.

While Rasch is very robust and will actually yield a fairly highly correlated set of test scores with as few as six common items on a test of 30 total items, I feel much more comfortable with at least 10 common items shared between two tests -- I know I just violated my 25% rule stated above, but the lower the total number of items on a given test, the more likely I have been to raise the percentage of shared items.  Nonetheless, it still behooves me to verify that my students taking the test have all experienced essentially the same instructional events -- perhaps instruction based upon the same carefully stated (and measurable) objectives.  It is in this last sentence that you will run into a significant number of problems.
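
These rules of thumb can be collapsed into a toy formula (one possible encoding, not a standard; the 10-item comfort floor is the figure from the paragraph above):

    import math
    
    def suggested_anchors(test_length, floor=10, cap=20, fraction=0.25):
        # toy encoding: 25% of the test or 20 items, whichever is less...
        k = min(cap, math.ceil(fraction * test_length))
        # ...but never fewer than ~10 common items on a short test
        return max(floor, min(k, test_length))
    
    for n in (30, 40, 60, 80, 120):
        print(n, "items ->", suggested_anchors(n), "anchors")
    # 30 -> 10 (the 25% rule would give only 8, below the comfort floor)
    # 80 -> 20 and 120 -> 20 (the cap takes over on long tests)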

It is true that the most effective way to equate two different tests is with either common people or common items. But unless you can prove beyond a reasonable doubt that tested students received common and shared curricular experiences, the courts will have a field day with you -- see the results of the challenge to the state of Texas' statewide testing program.  In fact, I'm sure that several members of this list could provide you with a detailed description of what the court felt was necessary in order to provide evidence supporting what it called "curricular validity".  Any test attempting to link assessment on the level you propose will have to stand this judicial test.  Personally, I don't think it's possible.  We do not at this time possess a national curriculum, and until we do we will not be able to compose tests that cross state boundaries.  Any attempt to force equating between groups of students who have not received instruction under the same carefully worded (and measurable) outcomes will prove fruitless -- even though one might obtain a significant correlation between the two tests, the correlation might simply be a result of large numbers.

I know, I just provided a non-statistical response to your question, but believe me, I have done equating for at least five different states and have run into enough problems trying to ensure "curricular validity" just at the state level.  I will be interested in hearing what others on this list have to say about your conundrum.

Robert Hess
Professor Emeritus of Measurement and Evaluation
Arizona State University
    ----- Original Message ----- 
  From: RISA KRAMER 
  To: rasch at acer.edu.au 
  Sent: Tuesday, July 10, 2007 5:02 AM
  Subject: [Rasch] How many anchor items do I need
  

  Dear Rasch colleagues,
   
  I'm trying to determine how many anchor items one needs to equate two subject-matter tests.  I estimated 20, or 25% of the total test, whichever is greater, based on a rule of thumb used at ETS and on one article which found, on Florida's FCAT, that 15 or 20 anchor items worked as well as 30.
   
  (Note: I'm thinking here about how one would create a pool of items for the U.S. NCLB to form a de facto standard of "what should be taught," with the pool of items forming a portion (25%) of each state's test.  I realize the equating won't be perfect, but the goal is a good rough measure. Here is what I'm currently saying about the idea:
  It should also be noted that it is impossible to precisely equate two tests designed to different standards.  Strictly speaking, two students' scores can be directly compared only if they are being tested on the same thing.  If State A emphasizes statistics in eighth-grade math and State B emphasizes geometry, it is not possible to precisely compare the math knowledge of students from the two states. Who is the more "knowledgeable" student: the one good at statistics or the one good at geometry?  Thus, equated math (or reading) scores of State A and State B will be a rough reflection of how much math (or reading) students know.  If in a particular state students who get a lot of state test items correct also tend to get a lot of the more difficult "common core" items correct, then high scores on the state test will equate to very high scores on the common scale.  If in a particular state many students tend to get many state test items correct while missing most of the "common core" items, then high scores on the state test will equate to somewhat lower scores on the common scale.)
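   
  For instance, under the Rasch model the mechanics of that mapping would be simple mean/mean linking through the common items. A minimal sketch (all difficulties and abilities are hypothetical logits):
   
      def to_common_scale(theta_state, anchors_state, anchors_common):
          # shift a state-calibrated ability by the average difficulty
          # difference of the anchors between the two calibrations
          shift = (sum(anchors_common) - sum(anchors_state)) / len(anchors_state)
          return theta_state + shift
      
      # the same four anchor items, as calibrated on a state's own form
      # and as fixed on the common scale (hypothetical values)
      anchors_state = [-0.5, 0.0, 0.4, 0.9]
      anchors_common = [0.3, 0.8, 1.2, 1.7]
      print(to_common_scale(1.0, anchors_state, anchors_common))  # -> 1.8
   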
   
  So:  I know the equating will be imperfect.  
  But am I correct that 20 items are sufficient for this imperfect equating?
   
  Steve Kramer
  Math Science Partnership of Greater Philadelphia

    
_______________________________________________
Rasch mailing list
Rasch at acer.edu.au
http://mailinglist.acer.edu.au/mailman/listinfo/rasch



FAMILIA DE PROGRAMAS KALT. 
Mariano Jiménez 1830 A 
Col. Balcones del Valle 
78280, San Luis Potosí, S.L.P. México 
TEL (52) 44-4820 37 88, 44-4820 04 31 
FAX (52) 44-4815 48 48 
web page (in Spanish AND ENGLISH): http://www.ieesa-kalt.com

 

