# [Rasch] [BULK] Re: Rasch Digest, Vol 66, Issue 1

Stemler, Steven sstemler at wesleyan.edu
Tue Jan 4 18:12:42 EST 2011

Hi Carolyn et al.,

Actually, if you don't change the missing values to zeros, you can end up with a very problematic situation in this instance. The Rasch logit difficulty is essentially the log of the ratio of the proportion of people answering the item incorrectly to the proportion answering it correctly, so with missing data your item difficulties could be based on wildly different n's.

Let's take an extreme example. Suppose that out of 100 test-takers, only 2 people actually answer item 20 (your theoretically hardest item on the test). One of those two gets it right and the other gets it wrong. If you don't treat the other test-takers who didn't make it to that hard item as answering it incorrectly (and you are not working with anchored item statistics), the logit estimate for the item difficulty will be 0.

But now let's think about item 11, for example. Maybe all 100 people take this item, but 40 get it right and 60 get it wrong. Compute the item difficulty and item 11 now appears to be more difficult (logit = ln(60/40) ≈ 0.41) than item 20, even though you know theoretically that item 20 is far more difficult. Thus, the very nature of your scale will be undermined by treating the data as missing (assuming that you have a theory guiding your item development). Had you "imputed" zeros for the other 98 people who never reached item 20, the logit estimate would be ln(99/1) ≈ 4.60, which is theoretically much more sensible.
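Steve's arithmetic is easy to check with a few lines of Python. This is only a sketch: the "difficulty" here is the naive log-odds of an incorrect response (in natural-log units), not the full joint estimate a Rasch program would produce, but the ordering problem it illustrates is the same.

```python
import math

def naive_item_difficulty(n_wrong, n_right):
    """Naive item difficulty: log-odds of an incorrect response, ln(wrong/right)."""
    return math.log(n_wrong / n_right)

# Item 20 scored only for the 2 children who reached it: 1 wrong, 1 right
print(naive_item_difficulty(1, 1))    # 0.0 -> looks no harder than average
# Item 11, taken by all 100 children: 60 wrong, 40 right
print(naive_item_difficulty(60, 40))  # ~0.41 -> now appears harder than item 20
# Item 20 with zeros imputed for the 98 who never reached it: 99 wrong, 1 right
print(naive_item_difficulty(99, 1))   # ~4.60 -> hardest item, as theory predicts
```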

Not surprisingly, leaving these values missing will also wreak havoc with your fit statistics, which are the really important part of a Rasch analysis anyway, mainly because the empirical ordering of the item difficulties could end up so far from their theoretical (and true) ordering.

Best,

Steve

************************************
Steve Stemler, Ph.D.
Assistant Professor of Psychology
Wesleyan University
207 High Street
Middletown, CT 06459
860-685-2207
http://sstemler.web.wesleyan.edu/stemlerlab/
************************************

From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Mark Moulton
Sent: Tuesday, January 04, 2011 1:20 AM
To: Dr Juho Looveer
Cc: rasch at acer.edu.au
Subject: Re: [Rasch] [BULK] Re: Rasch Digest, Vol 66, Issue 1

Greetings, all!

Juho makes a fair and important point.  It's a question of picking one's poison.  Either we have a more honest statistic that makes no assumptions about missing data, but at the risk of having descriptives that utterly misrepresent what's going on; or we have a "descriptive" statistic that is actually a hodge-podge of observed and imputed values, with artificially inflated N counts, but that is less prone to selection bias from the missingness (which would be acute in this case).  Certainly, with statistics based on partially imputed data, I would not report the total N, only the observed N, and I would not report statistics that rely heavily on N, such as standard errors or significance tests.  All couched with appropriate caveats, of course.

It's worth remembering that a Rasch item difficulty is simply a transformed p-value in the special case where all data are present.  When data are missing, the Rasch difficulty is more or less what one would obtain by imputing values for the missing cells using the Rasch model itself.  Which raises the question: why report descriptive statistics like p-values at all?  Just report the Rasch statistics.  Of course, that can be hard to explain to the Dept of Education!

Mark Moulton

On Mon, Jan 3, 2011 at 8:34 PM, Dr Juho Looveer <juho.looveer at gmail.com> wrote:

Greetings and Best Wishes to all for the New Year; hopefully many of us can meet up at PROMS and The Psychometric Society meeting (IMPS) this year.

I agree with Mark's comments: "So far as the Rasch analysis goes, I would leave the data missing, even though zeros are a reasonable imputation in this case.  However, even if they are reasonable they could distort the item difficulties in odd ways, so I would leave them missing."

However, I hate imputations.  There are many assumptions made when imputing data (and "when you assume, you make an ass out of U and me").  I would prefer to see descriptive statistics based on what you actually have, noting the actual sample size for each item.

By imputing, you skew and distort the descriptives; it is akin to having a sample of 20, assuming that a sample of 1000 would fit the same pattern, and hence deriving false statistics whose inferences appear highly reliable only because of the "new" sample size.

Humble Regards

Juho

Dr Juho Looveer
Psychometrician / Analyst
"Improving the Assessment of Literacy and Numeracy Across the Pacific"
Secretariat of the Pacific Board for Educational Assessment (SPBEA)
PO Box 2083, Government Buildings, Suva, Fiji
Phone: +679-368 0051
Mobile: +679-945 6055
Fax: +679-330 2898
Email: jlooveer at spbea.org.fj
Web Site: www.spbea.org.fj
-----Original Message-----
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of rasch-request at acer.edu.au
Sent: Tuesday, January 04, 2011 12:00 PM
To: rasch at acer.edu.au
Subject: Rasch Digest, Vol 66, Issue 1

Send Rasch mailing list submissions to
	rasch at acer.edu.au

To subscribe or unsubscribe via the World Wide Web, visit
	https://mailinglist.acer.edu.au/mailman/listinfo/rasch
or, via email, send a message with subject or body 'help' to
	rasch-request at acer.edu.au

You can reach the person managing the list at
	rasch-owner at acer.edu.au

When replying, please edit your Subject line so it is more specific than "Re: Contents of Rasch digest..."

Today's Topics:

   1. missing data advice needed (Fidelman, Carolyn)
   2. Re: missing data advice needed (Mark Moulton)

----------------------------------------------------------------------

Message: 1
Date: Mon, 3 Jan 2011 16:50:44 -0600
From: "Fidelman, Carolyn" <Carolyn.Fidelman at ed.gov>
Subject: [Rasch] missing data advice needed
To: rasch at acer.edu.au
Message-ID: <92E646BC24CBD742A4F9858873A97E0C78032767BB at EDUPTCEXMB01.ed.gov>
Content-Type: text/plain; charset="iso-8859-1"

Hi All,

I have a burning question about treatment of item-level missing data. Imagine a short reading or math screener given to 5 year olds (individual administration format, of course) in order to place them on some quick proficiency scale.  The 20 items are designed to go from easy to hard. Because young children can become easily frustrated and burdened once they reach their level, a stop rule is in place in which the administrator ceases testing once the child has missed 5 items in a row.

My question: is it legit to impute 0s to the missing values (9s) of a record such as

1110100110000099999

thus making it

1110100110000000000

where the assumption is that the child would not have correctly answered those last few items had they been administered anyway?

I know that for WINSTEPS and Rasch analysis, missing values are not an issue. I'm really thinking in terms of the descriptive statistics that precede them in a report. It seems more reasonable, for purposes of comparison across a set of 1000 examinees, to report that the p-value of, say, Item 18 is .09 where n = 1000 than to say it is .25 for the 200 examinees who were actually presented with the item.

Or should I just have two different datasets, one for the Rasch analysis (leaving the 9s) and another for the descriptives (imputing 0s for 9s)?
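For what it's worth, the two reporting conventions Carolyn describes can be computed side by side from a single dataset. A minimal sketch in Python, assuming 9 codes a not-administered item (the function name is illustrative, not from any package):

```python
def item_p_values(responses, missing=9):
    """p-value of one item under both conventions.

    responses: one entry per examinee; 1 = correct, 0 = incorrect,
    `missing` = item never administered (stop rule reached earlier).
    """
    presented = [r for r in responses if r != missing]
    n_correct = sum(presented)
    p_presented = n_correct / len(presented)  # n = examinees who saw the item
    p_all = n_correct / len(responses)        # missing treated as incorrect
    return p_presented, p_all

# 10 examinees; only 4 reached the item, 1 of them answered correctly
resp = [1, 0, 0, 0, 9, 9, 9, 9, 9, 9]
print(item_p_values(resp))  # (0.25, 0.1)
```

Reporting both numbers, each with its own n, sidesteps the need for two physical datasets.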

Your wisdom much appreciated!  (and Happy New Year!)

Carolyn


------------------------------

Message: 2
Date: Mon, 3 Jan 2011 16:28:54 -0800
From: Mark Moulton <markhmoulton at gmail.com>
Subject: Re: [Rasch] missing data advice needed
To: "Fidelman, Carolyn" <Carolyn.Fidelman at ed.gov>
Cc: rasch at acer.edu.au
Message-ID: <AANLkTi=kMs5bB2RJogRB-qcA7KMN+HMjyTCnZDRcmSjT at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Dear Carolyn,

So far as the Rasch analysis goes, I would leave the data missing, even

though zeros are a reasonable imputation in this case.  However, even if

they are reasonable they could distort the item difficulties in odd ways, so

I would leave them missing.

"Descriptive" statistics always assume complete data, so it is problematic

to leave the missing data in for that.  But it is also a little problematic

to leave the zeros, though less so.  What I would do myself is calculate the

expected values for all the missing cells, round them to zero or one, and

build your descriptive statistics using the resulting complete data matrix.

The resulting descriptives will be about the best you could manage.

Winsteps reports cell level expected values (though maybe not for missing

cells -- I can't remember).  Or you could import the data into Excel, along

with person and item measures, and calculate the expected values using

something like:

Eni = if(Xni <> 9,Xni,round(exp(Bn - Di) / (1 + exp(Bn - Di)),0))

(Translation:  If the observed value is not missing, use that, otherwise

round the expected value (same as probability in this case) calculated using

person logit ability Bn and item logit difficulty Di.)
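Mark's spreadsheet formula translates directly into Python. A sketch under the same assumptions: Bn and Di are the person and item logit measures exported from the Rasch run, 9 codes a missing cell, and the function name is illustrative:

```python
import math

def completed_cell(x_ni, b_n, d_i, missing=9):
    """Return the observed response if present; otherwise the Rasch expected
    value P(correct) = exp(Bn - Di) / (1 + exp(Bn - Di)), rounded to 0 or 1."""
    if x_ni != missing:
        return x_ni
    p = math.exp(b_n - d_i) / (1 + math.exp(b_n - d_i))
    return int(round(p))

# Observed cells pass through unchanged
print(completed_cell(1, b_n=-0.5, d_i=2.0))  # 1
# A missing cell for a low-ability child on a hard item rounds to 0
print(completed_cell(9, b_n=-0.5, d_i=2.0))  # 0
```

One small caveat: Python's round() uses banker's rounding at exactly 0.5, which only arises here when Bn equals Di.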

Mark Moulton

On Mon, Jan 3, 2011 at 2:50 PM, Fidelman, Carolyn <Carolyn.Fidelman at ed.gov> wrote:

> [Carolyn's original message, quoted in full; see Message 1 above]
