[Rasch] Reverse Digit Span Item Scoring Question

Robert Hess Robert.Hess at ASU.edu
Sat Dec 15 06:11:33 EST 2012



I really like your approach. Your willingness to permit the data to drive
the analysis rather than having the analysis drive the data is very
refreshing. And I am very glad to see that you are using the concept of
number correct only up to the last successfully completed category. I'm sure
that Dick would find this very appropriate. I further agree that if indeed
(as you note) the directions and strategy for younger children are different
from those for older children, you are looking at a linked model rather than
a concurrent model - the two distinct questioning protocols are important.


I would really be interested in seeing some of your early results as a
curiosity in the application of the Facets approach - particularly since
there are a variety of strategies for analysis that could be employed for
scoring purposes. Do you have any plans at this time for an initial
Cross-Age analysis in order to establish the correct analytic approach? Such
a procedure might help to clarify some of your questions rather than having
to wait several years before trying to settle the linked versus
concurrent question and other issues (such as age).


Please feel free to contact me if I can be of any assistance (if for no
other reason than to be a sounding board for you). You can reach me at:
Robert.Hess at ASU.edu 




From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Fidelman, Carolyn
Sent: Friday, December 14, 2012 8:44 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] Reverse Digit Span Item Scoring Question


Dr. Hess and others,


As always this group comes through for people like me. I am so grateful for
all your ideas!!!


Let me clarify that the ECLS-K:2011, similar to its predecessors ECLS-K:1998
and ECLS-B, is a true longitudinal study. See http://nces.ed.gov/ecls/ for
all the details. But basically we start with a nationally representative
sample of about 22K in Fall Kindergarten and by 5th grade, due to attrition,
may have 13K. Various flavors of weights and replicates are created to
correct the variables for various forms of sampling, coverage and
nonresponse error, so that users may conduct complex survey analysis with it
for their purposes and still have confidence in its representativeness. The
survey data is eventually made available in Public Use File (PUF) form and
in Restricted Use File (RUF) form that will include item level data
(available now for the first two studies and for the study started in 2011,
at some point in the future).


Several people have suggested just going with the simple dichotomous
analysis. That makes sense and I guess I would always try that first.
Occam's razor. But we have done Yen's Q3 analysis on other test data and
found format effects causing LID (sometimes quite strong values of .5 etc.)
so it is an issue that we want to be conscious of as a source of error in
parameter estimation and correct for it if needed.  Since this whole test
repeatedly uses a very specific format I am suspicious and want to
investigate. To answer some of your questions:


The examinees don't literally get a Number Correct score equal to the number
of items they got correct. Instead, they are credited only with the number
correct up to the last category they completed successfully. So for this example




I believe the "Number Correct" scoring is 11 (and not 12) from what I read
in the user's manual.
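
A minimal sketch of one reading of that rule, assuming that correct trials are counted only through the last level at which the pass mark (3 correct) was met. The response pattern below is hypothetical, chosen only so that the raw total is 12 and the capped score is 11; it is not the record from the original message, and the function name is illustrative.

```python
def capped_number_correct(levels, pass_mark=3):
    """Count correct trials only through the last level that was passed.

    levels: list of lists of 0/1 trial scores, one inner list per level,
    in administration order. Assumes testing stops after a failed level.
    """
    credited = 0   # correct trials credited so far
    running = 0    # all correct trials seen so far
    for trials in levels:
        correct = sum(trials)
        running += correct
        if correct >= pass_mark:
            credited = running   # level passed: credit everything up to here
        else:
            break                # level failed: its correct trials are not credited
    return credited


# Hypothetical child: passes three levels, then gets 1 of 5 on the fourth.
pattern = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
raw_total = sum(sum(trials) for trials in pattern)
capped = capped_number_correct(pattern)
```

Under this assumed rule the single correct trial in the failed level is the difference between the raw total and the credited score.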


Also, yes, in the user's manual there are special directions for younger
children and I have observed younger children being administered this test.
They are different in their way of responding to this test and the FACETS
analysis, as you suggest, might help to account for this. It makes you
wonder if these results should be really linked instead of concurrently
calibrated since, essentially it is a slightly different test at the younger
ages with a different administration protocol.  Since the growth curve in
all areas is pretty steep at the ages (5-10) this study covers I would go
with your suggestion to specify the age as a facet.


So just like you, the more I looked at this the more complications there
seem to be and the less I am willing to only consider a straight-up,
dichotomous Rasch analysis. Dichotomous/polytomous? Hierarchical by set
length? Age group facet? Multidimensional by administration protocol? I will
try some different approaches and see what comes up. Looks like my work is
cut out for me! I will let you all know what happens.


Thanks again to all who offered their ideas!!!




Carolyn G. Fidelman, Ph.D.

Early Childhood, International & Crosscutting Studies, NCES
Rm 9035, 1990 K St. NW | 202-502-7312 | carolyn.fidelman at ed.gov


From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Robert Hess
Sent: Thursday, December 13, 2012 9:12 PM
To: rasch at acer.edu.au
Subject: Re: [Rasch] Reverse Digit Span Item Scoring Question



Since it wasn't clearly stated, I am not sure exactly what you mean by a
different Rasch model. Are you concerned whether he employed a dichotomous
or polytomous model? Are you concerned whether or not he employed a
multilevel (such as a hierarchical) model? The real question here is do
students receive credit for getting each event correct or do they receive
credit only if they get 3 of 5 events correct? And is this effect somehow
related to age? If it is the former then your option #1 is a correct choice.
On the other hand, if it is the latter, then option #2 is the correct choice
(which, based on discussions I had with Dick more than 20 years ago and if
memory serves me correctly, was the strategy he employed).


But I have a caveat to offer: perhaps instead of using WinSteps, why not
employ Facets? The Facets model was not available when Dick did his original
scorings. I do know from discussions with him that he tended to employ a
model similar to your clustered dichotomous perspective. However, the last
time I saw him (and that was quite a long time ago) we did discuss possible
applications of Facets in his scoring.


In this fashion your items can be one facet, with a second facet being the
age of the child (this facet would have a different level for each age:
5, 6, 7, 8, 9, 10, and 11). Doing this would allow you to discern whether the
age level of the child has any impact. My "gut feeling" is that it will, with
the difference being between ages 5 through 8 versus ages 9 through 11 (with a
possible overlap on both groupings by the age 8 group). And, by employing
Facets, you can make your trial length (5 v 4) a third facet. You could even
add a fourth facet: the number of successful trials
needed to obtain credit (3 or 4 or 5). You may actually be able to run a
more complete analysis by combining age clusters (5-8 v 9-11) or even into 3
groupings (5-7, 8-9, 10-11) or some other combination. But you'll never know
this unless you run a Facets model. Otherwise the only factor you are going
to find is whether or not kids can repeat backwards varying sets of numbers
which may or may not be independent of their age group. And you could even
tweak the model a bit and obtain a hierarchical analysis. I better stop now,
otherwise I'm liable to end up making the analytical model more complex than
the task.
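
For reference, a sketch of how the first three facets described above might enter a dichotomous many-facet Rasch model. The symbols are assumptions introduced here for illustration, not notation from the thread: B_n is the measure of child n, D_i the difficulty of item i, A_j the level of age group j, L_k the level of trial length k, and P_nijk the probability of a correct response. The sign attached to the age and trial-length terms is a convention and could equally be reversed.

```latex
\log\!\left(\frac{P_{nijk}}{1 - P_{nijk}}\right) = B_n - D_i - A_j - L_k
```

Whether the age term is separately estimable depends on the design: it needs children who are observed at more than one age, or some other link across age groups.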


This age grouping effect would be consistent with some research done by John
Meyers in the early 1990s. His dissertation examined the influence of
approximation techniques and he found as a serendipitous effect a clear
developmental (as in Piaget) influence rather than an instructional


Am I correct in assuming that your design is really Cross-Ages rather than
true longitudinal? By this I mean you're not really going to track the same
children over a seven-year span; you're going to question children at each
age level. Am I correct in this?


In any case, good luck with your study.


Robert Hess

Emeritus Professor of Educational Measurement & Evaluation

Arizona State University


From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Schulz, Matt
Sent: Thursday, December 13, 2012 10:50 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] Reverse Digit Span Item Scoring Question


I think it's reasonable to assume the items are locally independent so I
would go for scoring method #1.  The other two methods lose information
concerning, e.g., item fit, person fit, etc.


From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Fidelman, Carolyn
Sent: Thursday, December 13, 2012 9:32 AM
To: rasch at acer.edu.au
Subject: [Rasch] Reverse Digit Span Item Scoring Question


Hi All,


We are considering creating our own scale for a reverse digit span task that
is to be administered to children ages 5-11 longitudinally and I am having
trouble understanding which Rasch model to use or how to structure the data.


What is done in such a test is that a child is first asked to repeat back
two numbers that they hear on an audio recording, but in reverse order. They
get five such items and, if they get three of them right, they go on to the
next level. Each successive level increases the number of numbers to repeat
back, all the way up to 9. Most adults can do about 7. This is correlated with
general intelligence and is predictive of a number of achievement-related
outcomes. This is modeled off of something from the Woodcock Johnson COG III
called Numbers Reversed and is Test 7 in their battery of cognitive


My hope was that the WJ documentation would tell us which Rasch model was
used with this "trials"-type data. But it doesn't really go into it, perhaps
for proprietary reasons? It says that the original calibration program was
devised by Woodcock in approximately 1970. Much has been done to develop
models that fit various data types since that time.  It is not clear what
the Woodcock procedure is exactly and how it differs from or is similar to
the standard Rasch one-parameter, rating scale, or partial credit models.  I am
concerned about this because we need to decide whether the structure of the
data is to be discrete or clustered and determine whether it is also a
problem that it is hierarchical where performances are nested within type.


One question I have for this test is "What is an item?"  Is an item a single
trial or the  5-trial (4-trial later on) cluster? How is that scored for the
purposes of score calibration? Is it a 0,1 for each trial or is it a scale
from 0-5 (0-4 later on) for each cluster? Would we lose too much information
going to a polytomous scoring?  The administration protocol is that if the
child gets three in a cluster such as the two-numbers type, they move on to
the next greater number type. If they only get two in a cluster, they are
considered to have achieved that span but not more and no further items are


Item type/names by number of digits to recall:



Example item scorings:


1. One record of a discrete, dichotomously scored raw score file:

*8=not presented


2. One record of a clustered, dichotomously scored file:



3. One record of a clustered, polytomously scored file:
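
The example records did not survive in this archive. As a hedged illustration only, the sketch below builds the three record layouts from one hypothetical child's trial scores. The trial counts per level, the response pattern, and the function names are all assumptions; the code 8 for "not presented" follows the note under example 1, and reading layout 2 as "1 if the cluster was passed with 3 or more correct" is an interpretation.

```python
NOT_PRESENTED = 8  # missing-data code taken from the note under example 1


def discrete_dichotomous(levels, trials_per_level):
    """Layout 1: one 0/1 score per trial; unseen trials coded 8."""
    record = []
    for i, n_trials in enumerate(trials_per_level):
        if i < len(levels):
            record.extend(levels[i])
        else:
            record.extend([NOT_PRESENTED] * n_trials)
    return record


def clustered_dichotomous(levels, trials_per_level, pass_mark=3):
    """Layout 2: one 0/1 score per level (1 = level passed); unseen levels coded 8."""
    return [
        (1 if sum(levels[i]) >= pass_mark else 0) if i < len(levels) else NOT_PRESENTED
        for i in range(len(trials_per_level))
    ]


def clustered_polytomous(levels, trials_per_level):
    """Layout 3: number correct per level; unseen levels coded 8."""
    return [
        sum(levels[i]) if i < len(levels) else NOT_PRESENTED
        for i in range(len(trials_per_level))
    ]


# Hypothetical design: four levels with 5, 5, 5 and 4 trials.
trials_per_level = [5, 5, 5, 4]
# Hypothetical child: passes two levels, fails the third, never sees the fourth.
child = [[1, 1, 1, 1, 1], [1, 1, 0, 1, 0], [0, 1, 0, 0, 0]]

layout1 = discrete_dichotomous(child, trials_per_level)
layout2 = clustered_dichotomous(child, trials_per_level)
layout3 = clustered_polytomous(child, trials_per_level)
```

Layout 1 keeps every trial as its own item, layout 2 keeps only pass/fail per level, and layout 3 keeps the count within each level, which is where the question about losing information arises.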



What we have seen in other ECLS tests is that when we violate the assumption
of local item independence, through either a prompt or format effect, item
parameters can be inflated and standard errors artificially low or unstable.
A possible further violation in this case is that of not taking into account
the extra variability between levels of data in a hierarchy, in addition to
those within item sets.
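
Local dependence of this kind is commonly screened with Yen's Q3, the correlation between pairs of item residuals after the model expectation has been removed. A minimal sketch, assuming a dichotomous Rasch model, complete data, and person measures and item difficulties that have already been estimated; all values and function names below are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """Dichotomous Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def q3_matrix(responses, thetas, difficulties):
    """Q3 for every item pair: correlation of (observed - expected) residuals."""
    n_items = len(difficulties)
    residuals = [
        [responses[p][i] - rasch_probability(thetas[p], difficulties[i])
         for p in range(len(thetas))]
        for i in range(n_items)
    ]
    return [
        [1.0 if i == j else pearson(residuals[i], residuals[j])
         for j in range(n_items)]
        for i in range(n_items)
    ]


# Hypothetical estimates and responses (rows = persons, columns = items).
# Items 0 and 1 are deliberately identical, an extreme case of dependence.
thetas = [-1.0, 0.0, 1.0, 2.0]
difficulties = [0.0, 0.0, 1.0]
responses = [[0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
q3 = q3_matrix(responses, thetas, difficulties)
```

With real data, trials coded as not presented would have to be excluded pairwise before correlating, which this sketch does not attempt.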


Does anyone here know informally or through some publication source(1), what
exact Rasch model the WJ folks used for this test? Or, regardless of that,
what do you think one would do here? I am not comfortable with just treating
it as in example 1 above.  Thanks for any help you can offer!





Carolyn G. Fidelman, Ph.D.

Early Childhood, International & Crosscutting Studies, NCES
Rm 9035, 1990 K St. NW | 202-502-7312 | carolyn.fidelman at ed.gov




(1) I consulted the following: 


Jaffe, L. E. (2009). Development, interpretation, and application of the W
score and the relative proficiency index (Woodcock-Johnson III Assessment
Service Bulletin No. 11). Rolling Meadows, IL: Riverside Publishing.




McGrew, K., Schrank, F., & Woodcock, R. (2007). Woodcock-Johnson Normative
Update: Technical Manual. Rolling Meadows, IL: Riverside Publishing.


