# [Rasch] Reverse Digit Span Item Scoring Question

Robert Hess Robert.Hess at ASU.edu
Fri Dec 14 13:12:00 EST 2012

```
Carolyn,

Since it wasn't clearly stated, I am not sure exactly what you mean by a
different Rasch model. Are you concerned with whether he employed a
dichotomous or a polytomous model? Are you concerned with whether or not he
employed a multilevel (such as a hierarchical) model? The real question here
is whether students receive credit for getting each event correct or only if
they get 3 of 5 events correct, and whether this effect is somehow related to
age. If it is the former, then your option #1 is a correct choice. If it is
the latter, then option #2 is the correct choice (based on discussions I had
with Dick more than 20 years ago, and if memory serves me correctly, #2 was
the strategy he employed).

But I have a caveat to offer: instead of using WinSteps, why not employ
Facets? The Facets model was not available when Dick did his original
scorings. I do know from discussions with him that he tended to employ a
model similar to your clustered dichotomous perspective. However, the last
time I saw him (and that was quite a long time ago), we did discuss possible
applications of Facets in his scoring.

In this fashion your items can be one facet, with a second facet being the
age of the child. This facet would have a different level for each age
(5, 6, 7, 8, 9, 10, and 11), and doing this would allow you to discern
whether the age level of the child has any impact. My "gut feeling" is that
it will, with the difference being between ages 5 through 8 versus ages 9
through 11 (with a possible overlap on both groupings by the age 8 group).
And, by employing Facets, you can make your trial length (5 v 4) a third
facet. You could even add a fourth facet: the number of successful trials
needed to obtain credit (3, 4, or 5). You may actually be able to run a more
complete analysis by combining ages into two clusters (5-8 v 9-11), or even
into three groupings (5-7, 8-9, 10-11) or some other combination. But you'll
never know this unless you run a Facets model. Otherwise, the only factor you
are going to find is whether or not kids can repeat varying sets of numbers
backwards, which may or may not be independent of their age group. And you
could even tweak the model a bit and obtain a hierarchical analysis. I had
better stop now, otherwise I'm liable to end up making the analytical model
more complex than it needs to be.
This age grouping effect would be consistent with some research done by John
Meyers in the early 1990s. His dissertation examined the influence of
approximation techniques, and, serendipitously, he found a clear
developmental (as in Piaget) influence rather than an instructional one.

Am I correct in assuming that your design is really cross-age rather than
truly longitudinal? By this I mean that you're not really going to track the
same children over a seven-year span; you're going to question children at
each age level. Am I correct in this?

In any case, good luck with your study.

Robert Hess

Emeritus Professor of Educational Measurement & Evaluation

Arizona State University

From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Schulz, Matt
Sent: Thursday, December 13, 2012 10:50 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] Reverse Digit Span Item Scoring Question

I think it's reasonable to assume the items are locally independent, so I
would go for scoring method #1. The other two methods lose information
concerning, e.g., item fit, person fit, etc.

From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Fidelman, Carolyn
Sent: Thursday, December 13, 2012 9:32 AM
To: rasch at acer.edu.au
Subject: [Rasch] Reverse Digit Span Item Scoring Question

Hi all,

We are considering creating our own scale for a reverse digit span task that
is to be administered to children ages 5-11 longitudinally and I am having
trouble understanding which Rasch model to use or how to structure the data.

What is done in such a test is that a child is first asked to repeat back,
in reverse order, two numbers that they hear on an audio recording. They get
five such items and, if they get three of them right, they go on to the next
level. Each level adds one more number to repeat back, all the way up to 9.
Most adults can do about 7. This is correlated with general intelligence and
is predictive of a number of achievement-related outcomes. This is modeled on
Numbers Reversed, Test 7 in the Woodcock-Johnson III Tests of Cognitive
Abilities (WJ III COG) battery.

My hope was that the WJ documentation would tell us which Rasch model was
used with this "trials"-type data, but it doesn't really go into it, perhaps
for proprietary reasons. It says that the original calibration program was
devised by Woodcock in approximately 1970. Much has been done since then to
develop models that fit various data types. It is not clear what the
Woodcock procedure is exactly and how it differs from or is similar to the
standard Rasch one-parameter, rating scale, or partial credit models. I am
[...] data is to be discrete or clustered and determine whether it is also a
problem that it is hierarchical, where performances are nested within type.

One question I have for this test is "What is an item?" Is an item a single
trial or the 5-trial (4-trial later on) cluster? How is that scored for the
purposes of score calibration? Is it 0/1 for each trial, or is it a scale
from 0-5 (0-4 later on) for each cluster? Would we lose too much information
going to polytomous scoring? The administration protocol is that if the
child gets three right in a cluster, such as the two-numbers type, they move
on to the next greater number type. If they only get two in a cluster, they
are considered to have achieved that span but not more, and no further items
are presented.

Item type/names by number of digits to recall:

2222233333444455556666777788889999

Example item scorings:

1. One record of a discrete, dichotomously scored raw score file:

1111111110111000088888888888888888*
*8=not presented

2. One record of a clustered, dichotomously scored file:

11108888

3. One record of a clustered, polytomously scored file:

54308888

What we have seen in other ECLS tests is that when we violate the assumption
of local item independence, through either a prompt or a format effect, item
parameters can be inflated and standard errors artificially low or unstable.
A possible further violation in this case is not taking into account the
extra variability between levels of data in a hierarchy, in addition to the
variability within item sets.

Does anyone here know, informally or through some publication source(1), what
exact Rasch model the WJ folks used for this test? Or, regardless of that,
what do you think one would do here? I am not comfortable with just treating
it as in example 1 above. Thanks for any help you can offer!

Carolyn

Carolyn G. Fidelman, Ph.D.

Early Childhood, International & Crosscutting Studies, NCES
Rm 9035, 1990 K St. NW | 202-502-7312 | carolyn.fidelman at ed.gov

(1) I consulted the following:

Jaffe, L. E. (2009). Development, interpretation, and application of the W
score and the relative proficiency index (Woodcock-Johnson III Assessment
Service Bulletin No. 11). Rolling Meadows, IL: Riverside Publishing.

and

McGrew, K., Schrank, F., & Woodcock, R. (2007). Woodcock-Johnson Normative
Update: Technical Manual. Rolling Meadows, IL: Riverside Publishing.

```
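Carolyn's three example records are three views of the same responses. Below is a minimal Python sketch of the collapse from record 1 to records 2 and 3. It assumes the 34-trial layout from her message, `8` for a trial that was not presented, and a pass rule of at least 3 correct trials at every level; her example passes a 4-trial level with 3 correct. The names `level_slices` and `to_clustered` are hypothetical helpers. They are not part of WinSteps, Facets, or the WJ materials.

```python
from itertools import groupby

# Trial layout from Carolyn's message: each character is the number of
# digits to reverse on that trial (5 trials at spans 2-3, 4 trials at 4-9).
LAYOUT = "2222233333444455556666777788889999"
NOT_PRESENTED = "8"   # code for a trial that was never administered
PASS_THRESHOLD = 3    # assumed: 3 correct trials pass a level, at every level


def level_slices(layout: str) -> list:
    """Return (start, end) index pairs, one per run of equal span lengths."""
    slices, start = [], 0
    for _, run in groupby(layout):
        length = len(list(run))
        slices.append((start, start + length))
        start += length
    return slices


def to_clustered(record: str, layout: str = LAYOUT) -> tuple:
    """Collapse a trial-level 0/1/8 record into clustered records.

    Returns (dichotomous, polytomous), one character per level: the
    dichotomous code is 1 if the level was passed, the polytomous code is
    the number of correct trials, and 8 marks a level with no trials given.
    """
    if len(record) != len(layout):
        raise ValueError("record length must match the trial layout")
    dichotomous, polytomous = [], []
    for start, end in level_slices(layout):
        trials = record[start:end]
        if all(t == NOT_PRESENTED for t in trials):
            dichotomous.append(NOT_PRESENTED)
            polytomous.append(NOT_PRESENTED)
            continue
        correct = trials.count("1")
        polytomous.append(str(correct))
        dichotomous.append("1" if correct >= PASS_THRESHOLD else "0")
    return "".join(dichotomous), "".join(polytomous)


if __name__ == "__main__":
    # Record 1 from the message; the output should match records 2 and 3.
    print(to_clustered("1111111110111000088888888888888888"))
```

Running it prints `('11108888', '54308888')`, which reproduces records 2 and 3. The collapse keeps only the count (or pass/fail) per level and discards which trials were missed. That is the within-level information Matt Schulz notes is needed for item and person fit under scoring method #1.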
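Robert Hess's Facets suggestion can be sketched as a many-facet Rasch model for dichotomously scored trials. The symbols are illustrative assumptions, not taken from the thread:

- $B_n$ is the ability of child $n$.
- $D_i$ is the difficulty of trial $i$.
- $A_g$ is a term for age group $g$.
- $P_{nig1}$ and $P_{nig0}$ are the probabilities that child $n$ in age group $g$ gets trial $i$ right or wrong.

$$
\log\frac{P_{nig1}}{P_{nig0}} = B_n - D_i - A_g
$$

Each child belongs to exactly one age group, so $A_g$ cannot be separated from the $B_n$ of the children in that group. A person-attribute facet like this is typically anchored at zero and used to look for age-by-trial interactions rather than estimated freely. Trial length (5 v 4) is likewise fixed by level, so as a separate facet it would be confounded with the trial facet.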