[Rasch] How many anchor items do I need

Hendrickson, Peter PHendrickson at everettsd.org
Wed Jul 11 09:24:20 EST 2007


...and even if we had a common curriculum, how would we know that the
instruction had a similar degree of commonality?  Our commitment to
monitoring fidelity of implementation is honored, for the most part, in
the breach.  
 
//Peter//
 
Peter Hendrickson
Assessment Specialist
Everett (WA) Public Schools

________________________________

From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On
Behalf Of Robert Hess
Sent: Tuesday, July 10, 2007 3:43 PM
To: RISA KRAMER; rasch at acer.edu.au
Subject: Re: [Rasch] How many anchor items do I need


Steve,
I don't believe there is as easy an answer as you might hope for.
First, one of the primary concerns of any IRT anchoring project is the
assumption that the population being tested shares a very significant
trait: they have all experienced the same instructional events.  Trying
to equate two different tests where subjects have been presented two
different forms of instruction, even in the same subject matter, would be
extremely difficult.  Hence the problem that many states have been
trying to convey to the federal government concerning the requirements
of NCLB.

It is difficult enough to ensure that different school districts within
the same state have received essentially the same instruction when we
equate two different forms (A & B for example).  In such a case, the
rule of thumb that I have used for over 25 years is to utilize a
minimum of 25% of the total number of items (or 20, whichever number
is less).  Now, it is also important to remember that the types of items
chosen for equating are very important.  I refer you to Lord's
seminal work where he details the effect of removing/adding items of
different difficulty to a given test -- the same warning should be taken
in choosing items to act as equating items.
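
As a rough illustration of the counting rule above, here is a minimal
Python sketch.  The function name, its parameters, and the choice to
round the 25% figure up are assumptions made for the example, not part
of the rule as stated.  The "greater" option covers the variant of the
rule quoted in the original question further down.

```python
import math


def anchor_count(total_items, fraction=0.25, fixed=20, rule="less"):
    """Anchor items suggested by a '25% of the test or 20 items' rule of thumb.

    rule="less" takes the smaller of the two figures, rule="greater" the
    larger. The result is capped at the test length.
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if rule not in ("less", "greater"):
        raise ValueError("rule must be 'less' or 'greater'")
    # Assumption: round the fractional count up to a whole item.
    by_fraction = math.ceil(fraction * total_items)
    pick = min if rule == "less" else max
    return min(total_items, pick(by_fraction, fixed))


if __name__ == "__main__":
    for n in (30, 60, 100):
        print(n, anchor_count(n, rule="less"), anchor_count(n, rule="greater"))
```

On a 30-item test the "less" reading gives 8 anchors and the "greater"
reading gives 20, which is why it matters which version of the rule is
being applied to a short test.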

While Rasch is very robust and will actually allow a fairly highly
correlated set of test scores with as few as six items on a test with 30
total items, I feel much more comfortable with at least 10 common items
shared between two tests -- I know I just violated my 25% rule stated
above, but the lower the total number of items on a given test the more
likely I have been to raise the percentage of shared items.
Nonetheless, it still behooves me to verify that my students taking the
test have all experienced essentially the same instructional events --
perhaps instruction based upon the same carefully stated (and
measurable) objectives.  It is with this last requirement that you will
run into significant problems.

It is true that the most effective way to equate two different tests is
either with common people or common items. But unless you can prove
beyond a reasonable doubt that tested students received common and
shared curricular experiences the courts will have a field day with you
-- see the results of the challenge to the state of Texas' statewide
testing program.  In fact, I'm sure that several members of this list
could provide you with a detailed description of what the court felt was
necessary in order to provide evidence supporting what it called
"curricular validity".  Any test attempting to link assessment on the
level you propose will have to stand this judicial test.  Personally, I
don't think it's possible.  We do not at this time possess a national
curriculum and until we do so will not be able to compose tests that
cross state boundaries.  Any attempt to force equating between groups of
students who have not received instruction under the same carefully
worded (and measurable) outcomes will prove fruitless -- even though one
might obtain a significant correlation between the two tests, the
correlation might simply be a result of large numbers.
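
For readers who want to see what common-item equating amounts to
mechanically, here is a minimal Python sketch of a mean shift between
two separately calibrated Rasch forms.  It assumes the anchor item
difficulties are already estimated in logits on each form.  The
function names and the 0.5-logit tolerance are illustrative
assumptions, and nothing in the sketch addresses the curricular
validity problem described above.

```python
from statistics import mean


def link_constant(anchors_a, anchors_b):
    """Shift to add to form B logits to place them on form A's scale.

    anchors_a and anchors_b hold the difficulties of the same anchor
    items, in the same order, as estimated on each form.
    """
    if not anchors_a or len(anchors_a) != len(anchors_b):
        raise ValueError("need the same non-empty set of anchors on both forms")
    return mean(anchors_a) - mean(anchors_b)


def unstable_anchors(anchors_a, anchors_b, tolerance=0.5):
    """Indices of anchors whose A-minus-B difference departs from the
    common shift by more than `tolerance` logits (an assumed cutoff)."""
    shift = link_constant(anchors_a, anchors_b)
    return [
        i
        for i, (a, b) in enumerate(zip(anchors_a, anchors_b))
        if abs((a - b) - shift) > tolerance
    ]


if __name__ == "__main__":
    form_a = [-1.0, 0.0, 1.0, 2.0]   # hypothetical anchor difficulties
    form_b = [-1.5, -0.5, 0.5, 0.0]  # same items calibrated on form B
    print(link_constant(form_a, form_b))
    print(unstable_anchors(form_a, form_b))
```

An anchor that behaves very differently in the two groups is one
symptom of the groups not having shared the same instruction, so a
check of this kind is worth running before trusting the shift.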

I know, I just provided a non-statistical response to your question, but
believe me, I have done equating for at least five different states and
have run into enough problems trying to ensure "curricular validity"
just at the state level.  I will be interested in hearing what others on
this list have to say about your conundrum.

Robert Hess
Professor Emeritus of Measurement and Evaluation
Arizona State University

	----- Original Message ----- 
	From: RISA KRAMER <mailto:skramer1958 at verizon.net>  
	To: rasch at acer.edu.au 
	Sent: Tuesday, July 10, 2007 5:02 AM
	Subject: [Rasch] How many anchor items do I need

	Dear Rasch colleagues,
	 
	I'm trying to determine how many anchor items one needs to
equate two subject-matter tests.  I estimated 20, or 25% of total test,
whichever is greater, based on a rule of thumb they used at ETS and on
one article that found, for Florida's FCAT, that 15 or 20 anchor items
worked as well as 30.
	 
	(Note: I'm thinking here about how one would create a pool of
items for the U.S. NCLB to form a de-facto standard of "what should be
taught," with the pool of items forming a portion (25%) of each state's
test.  I realize the equating won't be perfect, but the goal is a good
rough measure. Here is what I'm saying about the idea currently:
	
	It should also be noted that it is impossible to precisely
equate two tests designed to different standards.  Strictly speaking,
two students' scores can be directly compared only if they are being
tested on the same thing.  If State A emphasizes statistics in
eighth-grade math and State B emphasizes geometry, it is not precisely
possible to compare the math knowledge of students from the two states.
Who is the more "knowledgeable" student: the one good at statistics or
the one good at geometry?  Thus, equated math (or reading) scores of
State A and
State B will be a rough reflection of how much math (reading) students
know.  If in a particular state students who get a lot of state test
items correct also tend to get a lot of the more difficult "common core"
items correct, then high scores on the state test will equate to very
high scores on the common scale.  If in a particular state many students
tend to get many state test items correct while missing most of
the "common core" items, then high scores on the state test will equate
to somewhat lower scores on the common scale.)
	 
	So:  I know the equating will be imperfect.  
	But am I correct that 20 items are sufficient for this imperfect
equating?
	 
	Steve Kramer
	Math Science Partnership of Greater Philadelphia

	