[Rasch] Re: item selection based on DIF

DIMA ALEXANDRA alexandra.dima at univ-lyon1.fr
Tue Oct 31 03:40:27 AEDT 2017

Dear All,

Many thanks for your responses. I will check the two papers recommended by Edward. I also found https://doi.org/10.1186/s12955-017-0755-0, which seems to be a follow-up to these two papers. I have already looked closely at the Winsteps instructions (I work with Winsteps), but I did not come across an explanation of this particular situation (it is probably not very common?). I will check the archives for the review of DIF. I guess the paper by Richard Smith is journals.sagepub.com/doi/abs/10.1177/0013164491513003, is that right?

Regarding the concept: it is health literacy, and the instrument is the SAHL (Short Assessment of Health Literacy). The items are 33 medical terms, and the respondents (N = 1231) were asked to choose the correct meaning of each term. For example, ‘Psoriasis’ would be ‘a disease leading to red dry spots on the skin’ and not ‘brown spots, especially on the face’. All items have good fit according to the (yes, arbitrary but) recommended thresholds, including those in the Winsteps manual. But 33 items take too long to answer in busy clinical settings. We cannot use a CAT (yet), so we need some criteria to select a smaller item set. We chose to select based on DIF because, in principle, we would like to be able to use this instrument to compare subgroups with different sociodemographic characteristics, and I understand that absence of DIF is a desirable item property to ensure that the meaning of the construct is the same in the subgroups we want to compare. Age, gender and education were the only such variables available in our sample.

From a theoretical perspective, there are no differences between DIFFING and NON DIFFING items, and they are all equally relevant to the hypothesized trait. There are no particular reasons to believe that some medical terms are more familiar to females, for example (there are no medical terms specific to women’s health). So we don’t really have an interpretation for any item that displays DIF (we could come up with hypotheses, but it is probably not very useful to do so). I understand and totally agree with your approach of using both statistics and theory in item selection; it was not my intention to use statistics blindly and mindlessly (not my style ;) ). My question was rather about how to figure out whether a DIF that is noticeable and significant (according to the rules of thumb in the Winsteps manual) is also a substantive difference, particularly in situations where items that did not have noticeable and significant DIF in earlier steps of item selection now do show DIF according to these criteria. No matter how much discussion we generate about why a DIF happened, I need to arrive at a short form and exclude some items based on some objective criteria. From the Winsteps description, ‘substantive difference’ seemed like something I needed to look into, hence my email.

Regarding DIF by education, we have 3 education levels: low (levels 0-2: early childhood, primary and lower secondary education), intermediate (levels 3-5: upper secondary, post-secondary and short-cycle tertiary) and high (levels 6-8: bachelor, master, doctoral). We would expect differences in health literacy depending on education. But, if I understand DIF correctly, ideally we should not have differences in item difficulty, because then we would not know whether the differences between respondents with low versus high education levels are differences in health literacy or whether we are comparing two different understandings of health literacy. To use the example of students and learning: if after a course the students find only 2 or 3 items easier, perhaps their learning did not work as planned? Conversely, if their level of understanding has advanced overall, their positions on the latent trait will change, but the difficulty of the items will not? Moreover, there seem to be items that are easier for respondents with lower education levels (this might be artificial DIF, but I still have to find a way to tell whether that is the case).
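The learning-versus-DIF distinction above can be checked numerically with the dichotomous Rasch model, where the probability of a correct answer is P = exp(θ − b) / (1 + exp(θ − b)) for person ability θ and item difficulty b (both in logits). The sketch below uses hypothetical difficulties and shifts chosen only for illustration: raising θ moves the log-odds of every item by the same amount, whereas DIF changes the difficulty of one item for one group.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) for a person at ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))


# Hypothetical item difficulties in logits.
difficulties = [-1.0, 0.0, 1.0]
before = [rasch_prob(0.0, b) for b in difficulties]

# Learning: the person moves up 0.8 logits; item difficulties are unchanged.
after_learning = [rasch_prob(0.8, b) for b in difficulties]
gains = [log_odds(a) - log_odds(p) for p, a in zip(before, after_learning)]

# DIF: same person ability, but item 2 is 0.6 logits easier for this group.
dif_difficulties = [-1.0, -0.6, 1.0]
with_dif = [rasch_prob(0.0, b) for b in dif_difficulties]
dif_gains = [log_odds(d) - log_odds(p) for p, d in zip(before, with_dif)]

print([round(g, 2) for g in gains])      # the same gain on every item
print([round(g, 2) for g in dif_gains])  # a gain on the DIF item only
```

In the first case the item order and spacing are preserved; in the second, the relative difficulty of item 2 depends on group membership, which is what a DIF analysis is designed to detect.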

Another DIF-related issue in this analysis is that the sample combines several studies (same country, all adults, patient or community samples). Items do show DIF between studies, which is consistent with reports on other health literacy measures in the literature. We did not add study-related DIF as a criterion for item selection in the short form (it is not a sociodemographic variable), but we would like to report it and point out that it may suggest that health literacy means different things in different contexts. Suffering from a chronic condition may lead people to learn selectively the medical terms specific to their condition. There are no ‘easy’ or ‘difficult’ medical terms in general; the difficulty order may be sensitive to various personal experiences of illness. From this analysis I can only propose this as a topic for further study and reflection, one which I think is quite important theoretically for health literacy. But I still need to advise the clinicians which items they should include in a short form, preferably based on some objective criteria I can explain to them clearly without resorting to ‘it depends on what you mean’. I have tried such answers before and it did not go well :)

Many thanks,

From: Rasch <rasch-bounces at acer.edu.au> on behalf of "Stone, Gregory" <Gregory.Stone at UToledo.Edu>
Reply-To: "rasch at acer.edu.au" <rasch at acer.edu.au>
Date: Monday 30 October 2017 at 15:43
To: "rasch at acer.edu.au" <rasch at acer.edu.au>
Subject: Re: [Rasch] R: item selection based on DIF

I would point out a couple of things.

First, using .6-1.4 as a criterion for fit is arbitrary: a rule of thumb. Richard Smith has a paper out on a proper calculation of the appropriate boundaries based on sample size, number of items, etc. While this range may be fine for a sample of 200, the appropriate range is quite different for a sample of 1000.
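The formula is not given here. One sample-size-dependent rule of thumb often attributed to Smith, Schumacker and Bush (1998) puts the upper mean-square critical value near 1 + 2/√N for infit and 1 + 6/√N for outfit, where N is the number of persons. Assuming that is the kind of calculation meant (please check it against the original paper before relying on it), a minimal sketch:

```python
import math


def smith_critical_values(n_persons: int) -> tuple[float, float]:
    """Approximate upper mean-square limits for item fit.

    Assumed rule of thumb: infit 1 + 2/sqrt(N), outfit 1 + 6/sqrt(N),
    with N the number of persons. Both limits shrink towards 1 as N grows.
    """
    root_n = math.sqrt(n_persons)
    return 1 + 2 / root_n, 1 + 6 / root_n


for n in (200, 1000, 1231):
    infit_max, outfit_max = smith_critical_values(n)
    print(f"N={n}: infit limit {infit_max:.2f}, outfit limit {outfit_max:.2f}")
```

For N = 1231 this gives limits of about 1.06 (infit) and 1.17 (outfit), much tighter than a fixed 1.4, which is the point about sample size.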

Second, I would refer back to a very clear and simple review of DIF by Michael Linacre in a previous SIG newsletter and in the Winsteps instructions. Beyond this, we also look at balance and meaning. I would never exclude an item based solely on a DIF statistic. Measurement is a quantitative activity with deep roots in evaluation. We cannot exclude an item just based on fit statistics, but only based on our evaluation of why, after the fit statistics have given us a clue. There are, and always will be, items that evidence DIF. For example, 2+2=? may be biased in favor of females. Should we immediately delete it? Heavens no. We need to determine why it exhibits DIF. Is it simply random chance? What is inherently biased in this item? What could have made females do better? Better preparation? Finally, consider balance. Having items balanced for and against females and males is a typical and acceptable event, depending on why.
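One crude way to look at balance is to list which noticeable DIF items favour each group and see whether their signed contrasts roughly cancel. The sketch below uses hypothetical item names and contrasts (female difficulty minus male difficulty, in logits) and the 0.5-logit threshold discussed in this thread; it summarises direction only and says nothing about why an item shows DIF.

```python
# Hypothetical signed DIF contrasts in logits: difficulty for females minus
# difficulty for males, so a positive value means the item is harder for females.
contrasts = {"item01": 0.55, "item07": -0.48, "item12": 0.10, "item20": -0.62}


def dif_balance(contrasts: dict[str, float], noticeable: float = 0.5) -> dict[str, object]:
    """List noticeable DIF items by direction and sum their signed contrasts."""
    flagged = {k: c for k, c in contrasts.items() if abs(c) >= noticeable}
    return {
        "harder_for_females": [k for k, c in flagged.items() if c > 0],
        "harder_for_males": [k for k, c in flagged.items() if c < 0],
        "net_flagged": sum(flagged.values()),
    }


summary = dif_balance(contrasts)
print(summary["harder_for_females"], summary["harder_for_males"], round(summary["net_flagged"], 2))
```

A net value near zero with items on both sides suggests the DIF partly cancels at the level of the total score; it does not make the individual items unproblematic.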

Statistics are guideposts pointing you in certain directions.  They are not the destination.  We need, as Luigi has suggested, to explore the meaning before engaging in action.

I also do not understand “Education DIF”. Do you mean level of education?


On Oct 30, 2017, at 5:25 AM, Bond, Trevor <trevor.bond at jcu.edu.au> wrote:

Education DIF?
Surely this is the reason we teach? To make items easier for those who learn.
Prof Trevor G BOND

On 30 Oct 2017, at 7:29 pm, prof. Luigi Tesio - AUXOLOGICO <l.tesio at auxologico.it> wrote:
Dear all,
Please do not forget that Rasch analysis aims at measuring an existing (reflective) variable, not at INVENTING (constructive) a latent trait. Add to the discussion outlined below some reasoning concerning the nature of the DIFFING/NON DIFFING items, the reasons why they are or are not relevant to the hypothesized trait, the potential CAUSES of DIF (not only p values), etc. The more Rasch is taken only as a statistical game, the more you will strive to adapt reality to fit indexes, not the reverse.

From: Rasch <rasch-bounces at acer.edu.au> on behalf of Edward Li
Sent: Monday 30 October 2017 02:59
To: rasch at acer.edu.au
Subject: Re: [Rasch] item selection based on DIF

Hi Alex,

DIF analyses can induce artificial DIF, which is an artefact of some other items displaying real DIF. For example, when you find some items favouring one group, some other items will inevitably favour the other group. David Andrich and Curt Hagquist have a few papers discussing this topic which could be quite helpful in your case. Please see the references below.

Andrich, D., & Hagquist, C. (2012). Real and artificial differential item functioning. Journal of Educational and Behavioral Statistics, 37(3).
Andrich, D., & Hagquist, C. (2015). Real and artificial differential item functioning in polytomous items. Educational and Psychological Measurement, 75(2).
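A simplified illustration of one way artificial DIF can arise (not Andrich and Hagquist's own derivation): assume item difficulties are estimated separately in each group and each set is centred to a mean of zero. A real shift in one item then leaks into every other item as an opposite shift of size shift/k, where k is the number of items. The difficulties below are hypothetical.

```python
def centre(values: list[float]) -> list[float]:
    """Shift a set of item difficulties so that they average zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical true difficulties (logits) for a 5-item test.
true_b = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Real DIF: item 0 is 1.0 logit harder for group B only.
group_a = centre(true_b)
group_b = centre([b + (1.0 if i == 0 else 0.0) for i, b in enumerate(true_b)])

# Apparent DIF contrast (group B minus group A) for every item.
apparent_dif = [round(b - a, 2) for a, b in zip(group_a, group_b)]
print(apparent_dif)
```

The four items with no real DIF all appear 0.2 logits easier for group B, and the real DIF item appears as 0.8 rather than 1.0. In this toy picture, a given amount of real DIF is spread over fewer items when the test is shortened, so each artificial shift becomes larger.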

From: Rasch <rasch-bounces at acer.edu.au> on behalf of DIMA ALEXANDRA <alexandra.dima at univ-lyon1.fr>
Sent: Monday, 30 October 2017 1:19:17 AM
To: rasch at acer.edu.au
Subject: [Rasch] item selection based on DIF

Dear Rasch researchers,

I would like to ask for your advice on a DIF-related issue. We performed a Rasch analysis of a 33-item questionnaire to choose items for a short version, in two steps. First we examined item infit and outfit, using as thresholds mean squares outside the 0.6-1.4 range and standardized fit statistics outside ±2.0. No items were excluded at this step (all met these criteria). Then we examined DIF for gender, age and education (in this order), and excluded items with DIF > 0.5 logits. We excluded 3 items based on gender DIF, 4 items based on age DIF and 12 based on education DIF. We thus arrived at a 14-item version, for which we examined DIF again for all 3 variables. We noticed that some items now showed age DIF > 0.5 logits (e.g. .60, p < .001), even though in the earlier steps of the selection process these items had no problems. This is a noticeable and significant difference according to the Winsteps manual (http://www.winsteps.com/winman/table30_1.htm): « [DIF CONTRAST] should be at least 0.5 logits for DIF to be noticeable. "Prob." shows the probability of observing this amount of contrast by chance, when there is no systematic item bias effect. For statistically significance DIF on an item, Prob. ≤ .05. ». But I do not know how to figure out whether this is also a substantive difference and whether I should exclude those items as well, in other words continue item selection until all items show DIF < 0.5 logits for all variables (age, gender, education). This is particularly puzzling since these items showed acceptable DIF in previous runs. I understand in principle that this can happen, but what does it mean: are these items good enough or not? Should they be kept or excluded? Are there other criteria and tests that I should consider?
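The two-part rule quoted from the manual (noticeable: contrast of at least 0.5 logits; significant: Prob. ≤ .05) can be sketched as below. This is an approximation, not Winsteps' own code: Winsteps reports a Welch t-test, while this sketch uses a normal approximation, which should be close when each group is large, as with roughly 1200 respondents. The function names and the example estimates are hypothetical.

```python
import math


def dif_test(b1: float, se1: float, b2: float, se2: float) -> tuple[float, float, float]:
    """Compare one item's difficulty estimated separately in two groups.

    Returns (contrast, z, p): contrast = b1 - b2 in logits, z = contrast
    divided by its joint standard error, p = two-sided normal p-value.
    """
    contrast = b1 - b2
    z = contrast / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return contrast, z, p


def flag_dif(contrast: float, p: float, min_logits: float = 0.5, alpha: float = 0.05) -> bool:
    """Noticeable (|contrast| >= min_logits) AND significant (p <= alpha)."""
    return abs(contrast) >= min_logits and p <= alpha


# Hypothetical estimates for one item in younger vs older respondents.
contrast, z, p = dif_test(0.40, 0.10, -0.20, 0.12)
print(f"contrast={contrast:.2f} logits, z={z:.2f}, p={p:.4f}, flagged={flag_dif(contrast, p)}")
```

With large samples the standard errors are small, so almost any contrast above 0.5 logits will also be significant; the rule flags items mechanically and does not by itself answer whether the difference is substantive.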

Any suggestions or references for further reading would be much appreciated!

Many thanks,

Alexandra DIMA PhD, AFBPsS
Marie Curie Research Fellow

Health Services and Performance Research
Université Claude Bernard Lyon 1
Domaine Rockefeller- 2eme étage (aile CD)
8 avenue Rockefeller
69373 Lyon 8
W: +33 (0) 4 26 68 82 23
M: +33 (0) 6 32 86 82 37
alexandra.dima at univ-lyon1.fr


Rasch mailing list
email: Rasch at acer.edu.au
web: https://mailinglist.acer.edu.au/mailman/options/rasch/trevor.bond%40jcu.edu.au
