# [Rasch] Rasch: analyze two versions of a test

```
Hi Juho

I know that this discussion is not primarily about CAT, but I want to respond
to some of your questions. (I have been implementing CAT since the late
80s.) As you said, CAT is a different ballgame, but highly efficient. As you
know, in a "pure" CAT the algorithm usually selects the single item that
provides the maximum (Fisher) information at the current ability estimate;
this estimate is of course updated after each response. There are methods to
manipulate this. For example, you can select one item at random from, say,
five items at about this difficulty. This is mainly for exposure control.
Alternatively, you can dictate to the algorithm which specific item is to be
administered after a certain item, but that would of course disrupt the CAT
nature. This may, however, be useful in specific cases. In your math example
the hierarchy is very useful. For example, it is unlikely that you will be
able to solve a quadratic equation if you cannot solve a linear equation. To
further guide the process, the SEM is estimated after each item is
administered, and this can also "keep you on track", as it should keep
decreasing.

In tests in which unidimensionality cannot be assumed, or where you want to
derive measures for different strands (say Number, Measurement, Space,
etc.), you can run sequential CATs, one for each strand. Obviously, if you
want an overall measure, you will need to weight the individual ability
estimates by their SEs to compute it. We also implement CATs within CATs in
an integrated CAT, where the same is achieved. So there are different ways
in which one can manipulate a CAT if you want to.

Finally, it may sound contradictory that you can revise and change responses
in a CAT, since a change would mean that the pathway of item selection
should restart from that point onwards. This is not true. We have done
substantial research showing that if you correlate initial ability
estimates with final (after review) estimates, you consistently find
r > 0.995. Ability estimates are derived through ML estimation on the basis
of the difficulties of the items administered and whether each response was
correct or incorrect. So, in a nutshell, you can go back to any item and
change the response in a CAT. (This is usually a "political" requirement,
as it can be done in paper-and-pencil tests.)

Kindly

John

Prof John J Barnard (D.Ed.;Ph.D.;Ed.D.)

Executive Director: EPEC Pty Ltd

http://www.epecat.com

Agustin et al.

We aim for local independence, but I think in many tests, some items trigger
memory and ideas that can be useful for other items.

E.g., in a secondary school mathematics test, the same algebra skills are
required to manipulate formulae, to solve equations, to use formulae
(substitute and evaluate), etc.

I am sure that sometimes there are some triggering effects from some items
to others.

Hence, it would be preferable to have items appear in a similar order, in
similar positions, etc.

Good test making includes balancing the items in a test according to
content, skills, order, etc., not just throwing in whatever number of items,
in whatever order, happens to fit a page.

But you are correct: with CAT, this is a different issue.

Not sure if anyone has explored it yet.

However, using good reasoning, we can consider the situation; perhaps
items should also be grouped in some way, to indicate those that should
appear early in a test, late in a test, or near the middle.

One difference is that with pen-and-paper tests, the test taker can go back
and revise a response if they have another thought about that item; so
order effects can be countered by a student who is prompted on one item by
working out another item. I am not sure that this is possible with CAT.

Regards

Juho

Dr Juho Looveer

Australia

Hello!

What if the test is not in a paper-and-pencil format but is administered
using CAT? Do the items need to appear in the same invariant order in every
CAT version if difficulty has to be invariant too?

Agustin


> Hi
>
> I have a question about analyzing two versions of a test.
>
> Say we have 130 items for a test. We make two versions of this test.
> In version A we put items 1 to 80, and in version B items 50 to 130.
> So items 50-80 are in both versions. In version A, items 1 to 30 are
> anchor items from a previous test. For these items, we know and use the
> measures from a previous Facets analysis. These anchor items (1-30)
> occur only in version A, not in version B.
>
> About 400 candidates take version A, and about 250 take version B. The
> test is rated by 4 raters. Raters rate both versions; each test taker is
> rated by one random rater.
>
> One might choose to take all candidates together for analysis. Or one
> can choose to first analyze version A separately (using the measures
> for the anchor items), and then use the outcome, i.e. the measures
> for the identical items (numbers 50-80) and the measures for the
> raters, in the subsequent analysis of version B.
>
> Which way of analyzing is preferable, and why?
>
> Kind regards,
>
> Lucia Luyten

>
> Dear Lucia
> You do it both ways, expecting invariance.
> Where you don't, you look for reasons.
> Then choose.
> TGB


> Is there any way to have two or more raters evaluate the same people
> on a fairly large scale? If so, you can also check rater effects using
> Facets. Even if you had only very limited numbers of double/triple/...
> ratings, large rater differences/biases would be a reason for caution.
>
> Rense Lange


```
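John Barnard's description of a CAT step (choose the most informative item, optionally at random among about five candidates for exposure control; re-estimate ability by ML from the administered items' difficulties and the scored responses; watch the SEM shrink) can be sketched as follows. This is a minimal illustration under stated assumptions, not EPEC's implementation: it assumes the dichotomous Rasch model, a hypothetical item bank given as a dict of item ids to difficulties in logits, and Newton-Raphson for the ML estimate. All function names are hypothetical.

```python
import math
import random


def rasch_p(theta, b):
    """P(correct) under the dichotomous Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p(1 - p)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)


def ml_ability(difficulties, responses, theta=0.0, max_iter=100, tol=1e-8):
    """Newton-Raphson ML ability estimate and its SEM.

    difficulties: difficulties (logits) of the items administered so far.
    responses: 1 for correct, 0 for incorrect, in the same order.
    The ML estimate is infinite for all-correct or all-incorrect strings,
    so those raise ValueError; operational CATs use other rules until then.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("no finite ML estimate for an extreme score")
    for _ in range(max_iter):
        expected = sum(rasch_p(theta, b) for b in difficulties)
        info = sum(item_information(theta, b) for b in difficulties)
        # Newton step on the log-likelihood, clipped to 1 logit for stability.
        step = max(-1.0, min(1.0, (score - expected) / info))
        theta += step
        if abs(step) < tol:
            break
    info = sum(item_information(theta, b) for b in difficulties)
    # SEM is the inverse square root of the test information.
    return theta, 1.0 / math.sqrt(info)


def select_next_item(theta, bank, administered, k=5, rng=random):
    """Randomesque selection: pick at random among the k unused items
    with the highest information at the current ability estimate."""
    unused = [item for item in bank if item not in administered]
    if not unused:
        raise ValueError("item bank exhausted")
    unused.sort(key=lambda item: item_information(theta, bank[item]), reverse=True)
    return rng.choice(unused[:k])
```

In a live loop you would call `select_next_item`, score the response, re-estimate with `ml_ability`, and stop once the SEM falls below a target; the stopping rule and its target value are design decisions left open here. Because information is additive over items, the SEM can only shrink as items are added, which is what makes it usable as a "keep you on track" signal.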
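For an overall measure from sequential strand CATs, one common reading of "weight the individual ability estimates by their SEs" is inverse-variance weighting: each strand estimate gets weight 1/SE², and the combined SE is 1/sqrt(sum of weights). The sketch below assumes that reading and assumes all strands are reported on a common logit scale; the function name and the strand values in the usage lines are hypothetical.

```python
import math


def combine_strand_measures(measures):
    """Inverse-variance weighted combination of strand ability estimates.

    measures: list of (ability, SE) pairs on a common logit scale.
    Returns (combined ability, combined SE).
    """
    if not measures:
        raise ValueError("need at least one strand measure")
    weights = [1.0 / se ** 2 for _, se in measures]
    total = sum(weights)
    combined = sum(w * theta for w, (theta, _) in zip(weights, measures)) / total
    return combined, 1.0 / math.sqrt(total)


# Hypothetical strand results (ability, SE) for one student.
strands = {"Number": (0.8, 0.30), "Measurement": (0.2, 0.45), "Space": (1.1, 0.40)}
overall, overall_se = combine_strand_measures(list(strands.values()))
print(f"overall = {overall:.2f} logits, SE = {overall_se:.2f}")
```

With this choice, strands measured more precisely (usually those with more items administered) count more; if a content blueprint should drive the weighting instead, the weights would need to come from that blueprint.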
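TGB's advice to Lucia ("do it both ways, expecting invariance") amounts to checking whether the shared items 50-80, and the four raters, receive the same measures from the combined and the separate analyses, up to a constant shift of origin. A minimal sketch of that comparison, assuming each run's measures and standard errors (in logits) have already been exported from Facets into Python dicts; the function name, the dict layout, and the |z| > 2 flag are illustrative choices, not Facets features.

```python
import math


def compare_common_elements(run_a, run_b, z_crit=2.0):
    """Check invariance of shared elements (items or raters) across two runs.

    run_a, run_b: dicts mapping element id -> (measure, SE) in logits.
    Returns the mean shift (run_b minus run_a) over shared elements and a
    list of (id, z) for elements whose shift-adjusted difference exceeds
    z_crit in standardized units.
    """
    common = sorted(set(run_a) & set(run_b))
    if not common:
        raise ValueError("the two runs share no elements")
    # Separate calibrations can differ by a constant origin; estimate it.
    shift = sum(run_b[e][0] - run_a[e][0] for e in common) / len(common)
    flagged = []
    for e in common:
        (m_a, se_a), (m_b, se_b) = run_a[e], run_b[e]
        z = (m_b - m_a - shift) / math.sqrt(se_a ** 2 + se_b ** 2)
        if abs(z) > z_crit:
            flagged.append((e, round(z, 2)))
    return shift, flagged


# Hypothetical measures for three of the shared items from two runs.
version_a = {50: (0.0, 0.2), 51: (1.0, 0.2), 52: (-1.0, 0.2)}
version_b = {50: (0.5, 0.2), 51: (1.5, 0.2), 52: (0.5, 0.2)}
print(compare_common_elements(version_a, version_b))
```

Flagged elements are where "you look for reasons". Note that a drifting element also pulls the mean shift, so one might drop flagged elements and recompute before choosing an analysis; the same check applies to the rater measures, which is where Rense Lange's caution about single-rated candidates comes in.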