[Rasch] Rasch model affected by item type or not

Agustin Tristan ici_kalt at yahoo.com
Fri Mar 12 02:07:06 EST 2010

Dear Parisa, 
You are talking about longitudinal DIF or DPF (Differential Item Functioning or Differential Person Functioning). Let's consider an experiment or educational process like this:
Let's suppose that "Diagnostic" and "After intervention" tests are the same or practically similar (same characteristics, number of items, equivalent items, etc.)
Then you should expect that your focal population will have an initial measure at the diagnostic and a different measure after the intervention, especially if your intervention is working.

Initial and final values may change from LOW-TO-HIGH if your experiment concerns learning new abilities or competencies that improve the performance of the students or the patients, but they can go from HIGH-TO-LOW if your intervention is focused on reducing "bad" habits, eliminating deficiencies, etc.
In this experiment your test will show differences that must reflect the quality of your experiment and intervention.

Let's suppose that you have the same test for 3rd grade students, administered at diagnostic and after intervention, and suppose a (very small) four-item test. You may have an item distribution like this:
Diagnostic                            After intervention
item 3  +1.0                          item 2  +1.0
item 1  +0.5  --> INTERVENTION -->    item 1  +0.5
item 4  -0.5                          item 3  -0.5
item 2  -1.0                          item 4  -1.0
The change of position reflects the relative difficulty of a task before and after the intervention. The "absolute" difficulty values will be different because software such as Winsteps re-centers the whole test to a mean of 0.0.
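As a small illustration (plain Python, not Winsteps), here is a sketch of the re-centering idea: any overall shift in the scale is removed, so only the items' relative positions survive.

```python
# Illustrative sketch: re-centering item difficulties to a mean of 0.0
# removes any overall shift, leaving only the items' relative positions.
diagnostic = {"item 3": 1.0, "item 1": 0.5, "item 4": -0.5, "item 2": -1.0}

def recenter(difficulties):
    """Shift all difficulties so that their mean is 0.0 (Rasch convention)."""
    mean = sum(difficulties.values()) / len(difficulties)
    return {item: d - mean for item, d in difficulties.items()}

# Suppose the whole scale drifted by +0.8 logits between calibrations.
shifted = {item: d + 0.8 for item, d in diagnostic.items()}

# After re-centering, the drift is gone: the two calibrations agree
# item by item (up to floating-point noise).
a, b = recenter(shifted), recenter(diagnostic)
print(all(abs(a[i] - b[i]) < 1e-9 for i in a))  # True
```

This is only an arithmetic illustration of the re-centering convention, not an estimation procedure.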

Under such conditions this shows what I understand you mean by a "pragmatic" distribution of the items: the positions change and could seem incomprehensible, but they are not.
What happens here is that although you have a single population, it has two different characteristics, one for each stage. You should see that you now have two focal groups: before and after your intervention.

If you receive a new group of 3rd grade students for a second round of your experiment, you should expect the pre-intervention persons from round 1 and round 2 to be comparable to what we may call the FOCAL GROUP BEFORE, and the post-intervention groups from rounds 1 and 2 to be comparable to the FOCAL GROUP AFTER.
The item calibrations for the FOCAL GROUP BEFORE could be very similar across groups at this stage, and if that is the case you have the INHERENT distribution (what you call "generic") of the test for this FOCAL GROUP BEFORE, which will hold for many years with 3rd grade students if nothing else changes.
After your intervention, the item distribution will correspond to the FOCAL GROUP AFTER, again if nothing else changes.
If you use this same test with 5th grade students, they represent a different group that is not the FOCAL GROUP BEFORE or AFTER defined here, and they may produce the inherent item distribution for 5th grade.
The new  (different) experiment is:
In this experiment the same test A will appear "inherently difficult" for group 1, "inherently centered" for group 2 and "inherently easy" for group 3. If that is not the case, then your test is not sensitive enough to the persons' competencies. Please note that the test's inherent properties will be a combination of the items' inherent properties for each group.
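The "difficult for group 1, easy for group 3" pattern follows directly from the dichotomous Rasch model. Here is a minimal sketch, where the item difficulties and the three group abilities (-1, 0, +1 logits) are illustrative assumptions of mine, not values from any real data:

```python
import math

def p_correct(theta, b):
    """Dichotomous Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Assumed item difficulties for a small test A (mean 0.0 by convention)
test_a = [1.0, 0.5, -0.5, -1.0]

# Assumed mean abilities for three groups of increasing competence
for theta in (-1.0, 0.0, 1.0):
    expected = sum(p_correct(theta, b) for b in test_a) / len(test_a)
    print(f"theta = {theta:+.1f}: expected proportion correct = {expected:.2f}")
```

The same fixed test yields a low expected score for the least able group and a high one for the most able group, i.e., it is "inherently difficult" or "inherently easy" relative to each group.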


For instance, if you have read the paper about the test design line, you can see in Figure 6 the comparison between two different populations (the focal population and a deficient population). There are two test design lines, but each one is inherent to a specific population; the relative positions of the items on the lines will be different, and with this you may calculate DIF or DPF according to your needs.

I did the same comparison with international tests such as NAEP, TIMSS and PISA, and it is clear that the same test has an inherent distribution for each of the different focal populations, so longitudinal studies can be done. For instance, with NAEP the test is focused on 9-year-old students, and it becomes easier for 13-year-olds and very easy for 17-year-olds. Try the test design line on your test and you will see how useful it is for understanding the behavior of your data.
If you need, I can help you with your test.
Hope this helps.


Ave. Cordillera Occidental  No. 635
Colonia  Lomas 4ª San Luis Potosí, San Luis Potosí.
C.P.  78216   MEXICO
 (52) (444) 8 25 50 76
 (52) (444) 8 25 50 77
 (52) (444) 8 25 50 78
web page (in Spanish): http://www.ieia.com.mx
web page (in English) http://www.ieesa-kalt.com/English/Frames_sp_pro.html

--- On Thu, 3/11/10, Parisa Daftari Fard <pdaftaryfard at yahoo.com> wrote:

From: Parisa Daftari Fard <pdaftaryfard at yahoo.com>
Subject: Re: [Rasch] Rasch model affected by item type or not
To: "Agustin Tristan" <ici_kalt at yahoo.com>, "rasch list" <rasch at acer.edu.au>
Date: Thursday, March 11, 2010, 2:14 AM

Dear Agustin,
Hi, it's nice to hear from you.
Thank you so much for the link and your explanation. I found them informative. What if learners' ability changes over time? Would you again expect to see a diverse hierarchical order in your test, as your example below demonstrated that inserting a different item type would give us different fit for the other items? What if the same person changes over time through learning? Do you expect the item hierarchy to be generic or pragmatic?

--- On Thu, 3/11/10, Agustin Tristan <ici_kalt at yahoo.com> wrote:

From: Agustin Tristan <ici_kalt at yahoo.com>
Subject: Re: [Rasch] Rasch model affected by item type or not
To: "Parisa Daftari Fard" <pdaftaryfard at yahoo.com>, "Rasch" <rasch at acer.edu.au>
Date: Thursday, March 11, 2010, 8:14 AM

Dear Parisa, 
For me it is clear that the measure or difficulty of an item is an "intrinsic" or "inherent" property while fit is something "induced" by the sample of persons. That is why I can have an item bank classified by the difficulty of the items (and subject, taxonomic level and other elements), and  the test design must follow some rules for validity.
Let's suppose you have this (very small) item bank:
item     difficulty
item 1   +1.0
item 2   +1.5
item 3   -0.2
item 4   +1.3
item 5   -1.1
item 6   -2.0
item 7   +0.4
If you build a (very small) test [A] with items 1, 2 and 4, you will have a configuration different from another test [B] with items 1, 3 and 6. The test difficulties are different, and you will also see that item 1 is the easiest item in test [A] but the hardest in test [B]. So the test configuration changes with the choice of items.
The Rasch model will produce person measures for tests [A] and [B] that should be very similar, even though the tests differ.
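Here is a small sketch of the bank above in plain Python; the `rank_easiest_first` helper is my own illustration, not part of any Rasch software:

```python
# The (very small) item bank: item id -> difficulty in logits
bank = {1: 1.0, 2: 1.5, 3: -0.2, 4: 1.3, 5: -1.1, 6: -2.0, 7: 0.4}

def rank_easiest_first(item_ids):
    """Order a test's items from easiest (lowest difficulty) to hardest."""
    return sorted(item_ids, key=lambda i: bank[i])

test_a = [1, 2, 4]
test_b = [1, 3, 6]

print(rank_easiest_first(test_a))  # [1, 4, 2]: item 1 is the easiest in [A]
print(rank_easiest_first(test_b))  # [6, 3, 1]: item 1 is the hardest in [B]

# The mean test difficulties also differ with the choice of items.
print(sum(bank[i] for i in test_a) / len(test_a))  # about +1.27
print(sum(bank[i] for i in test_b) / len(test_b))  # about -0.40
```

Item 1 keeps its bank calibration of +1.0 in both tests; only its position relative to the other chosen items changes.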
In different administrations of the tests, it is expected that item 1 will have a difficulty close to +1.0 (plus/minus the error of the particular group of answers).
Item difficulty is a very "stable" parameter for a focal group, under different conditions of a test, position of the item in the test and other particular situations.
It is clear that if the group is not the focal one (different competencies, gender, region or country, age, etc.) the item calibration may differ from the original expected calibration and you may report DIF among those groups.
For me, that is the reason to design a test with item calibrations, following a design model, not just a random choice of the items. The concept of "Test design line" may be useful (something derived from the design proposed by Wright and Stone).
Wright B.D. & Stone, M.H. (2004) Making measures. The Phaneron Press. Chicago. pp 35-39
Tristan L.A. & Vidal U.R. (2007) Linear model to assess the scale's validity of a test. AERA Meeting, Chicago. Session: New developments in measurement thinking. SIG-Rasch Measurement. Available through ERIC:
In addition, it is possible to see that the item type may produce different calibrations for the same ability. I have produced an analysis of difficulty ranges for items of different types: simple multiple choice, ordering or hierarchy, matching columns, cloze, with graphics or text only, and so forth. The range of difficulties is type-dependent, so if you switch between two items it is important to check that the item calibrations are similar. And the best part of this analysis is that the paper is in Spanish! If you wish I can send you a copy; just let me know. Probably I will have to translate it in the future.

Hope this helps.


--- On Wed, 3/10/10, Parisa Daftari Fard <pdaftaryfard at yahoo.com> wrote:

From: Parisa Daftari Fard <pdaftaryfard at yahoo.com>
Subject: [Rasch] Rasch model affected by item type or not
To: "rasch list" <rasch at acer.edu.au>
Date: Wednesday, March 10, 2010, 9:19 PM

Dear Rasch list members,
I came up with a question and I hope to have your reply. Does test configuration (the ordering of the item types, and adding or removing item types) affect the Rasch outcome in terms of the hierarchical order in the item map?
Does Rasch approach the items generically or pragmatically? By "generic" I mean that adding one item type (not changing the total number of items, but the cognition-related item type) would not affect the pattern of items, because the statistics work on the nature of the item itself.
I am not sure if I am being clear.


Rasch mailing list
Rasch at acer.edu.au


