[Rasch] FW: c-parameter
David Andrich
david.andrich at uwa.edu.au
Wed Jun 13 15:08:21 EST 2012
Again, this is no contradiction to what Ray is saying, but I am not sure that the paradox is explained. Maybe rephrasing it in terms of standard errors will make it clearer.
(i) Suppose that there are 100 items in a test which fit the Rasch model perfectly.
(ii) You analyse these data and get a person estimate AND a standard error (the inverse square root of the information) for each person.
(iii) Suppose now that for some reason or other, with exactly the same items and persons (item difficulty parameters and person parameters), dependence is induced in clusters of items
(iv) You analyse the data in (iii) above by first summing the item scores within clusters and using the partial credit parameterisation of the Rasch model.
You will observe that the standard errors of the person estimates in (iv) are smaller than in (ii).
But the standard errors should be smaller with independent items than with dependent ones, shouldn't they?
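For anyone who wants to check the arithmetic in (ii), here is a quick sketch. The function names, the difficulty spread, and the ability values are my own illustrative choices; the point is only that the standard error is the inverse square root of the summed item information.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_se(theta, difficulties):
    """Standard error of the person estimate: the inverse square root of the
    test information, which for independent Rasch items is the sum of p(1-p)."""
    info = sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b))
               for b in difficulties)
    return 1.0 / math.sqrt(info)

# 100 items with difficulties spread evenly over [-2, 2] (an arbitrary
# choice for illustration; the argument does not depend on the spread).
difficulties = [-2.0 + 4.0 * i / 99 for i in range(100)]
se_centre = person_se(0.0, difficulties)   # well-targeted person
se_extreme = person_se(3.0, difficulties)  # poorly targeted person
```

A poorly targeted person gets a larger standard error, since fewer items contribute information near that ability.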
David
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Adams, Ray
Sent: Wednesday, 13 June 2012 10:38 AM
To: rasch
Subject: Re: [Rasch] FW: c-parameter
David (and others).
I agree with what you are saying.
It is useful to clarify our language though. The information is a function, so talking about items with more or less information needs to be well defined. I'd suggest that we can talk about the maximum amount of information an item can provide and the total amount of information an item can provide. The maximum is the highest point of the information function, reached at a particular ability. The total is the area under the whole of the information function.
For SLM the maximum is always 0.25 and it is at ability=difficulty. The area under the function is 1.0. The shape of the function is the same for all items.
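These two properties are easy to verify numerically. A small sketch (the difficulty value and integration grid are arbitrary choices):

```python
import math

def slm_info(theta, b):
    """Item information for the simple logistic (Rasch) model: p(1-p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

# The maximum is 0.25, reached where ability equals difficulty.
peak = slm_info(1.3, 1.3)

# The area under the information function, by a Riemann sum over a wide
# grid, is 1.0: the integral of p(1-p) over ability is the total change
# in p, which runs from 0 to 1.
grid = [-20.0 + 0.01 * i for i in range(4001)]
area = sum(0.01 * slm_info(t, 1.3) for t in grid)
```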
For PCM items the shape differs across items, and the maximum is a function of the item parameters. What is common about the functions, however, is that the area under them is equal to the maximum score. I've always interpreted this as meaning that under the Rasch model all items must provide an equal total amount of information - there is a sense in which they must be equally good. Furthermore, the test designer decides on that information when they set the number of response categories.
With regard to the paradox, I came across it in 1983, when I developed an adaptive testing algorithm for the PCM. Loosely speaking, if an item has item parameters that are not in numerical order, then the information function will have higher maximum values than for items where they are in ascending order. In both cases the total information is the same, but the former item stacks all the information at one point, whereas the latter item spreads it out, so the item is useful over a wider range. The implication in adaptive testing was that if you used an algorithm that simply picked items to minimise student standard errors, then it kept choosing items with item parameters that were not in numerical order.
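The ordered-versus-disordered point can be illustrated numerically. In this sketch (the threshold values are illustrative), the disordered item has the higher peak, yet both items have the same total information, namely the maximum score of 2:

```python
import math

def pcm_probs(theta, taus):
    """Category probabilities for a Rasch partial-credit item with
    Rasch-Andrich thresholds `taus` (maximum score m = len(taus))."""
    m = len(taus)
    logits = [x * theta - sum(taus[:x]) for x in range(m + 1)]
    top = max(logits)  # subtract the max for numerical stability
    ws = [math.exp(l - top) for l in logits]
    s = sum(ws)
    return [w / s for w in ws]

def pcm_info(theta, taus):
    """Item information: the variance of the item score at ability theta."""
    ps = pcm_probs(theta, taus)
    ex = sum(x * p for x, p in enumerate(ps))
    return sum((x - ex) ** 2 * p for x, p in enumerate(ps))

def total_info(taus, lo=-20.0, hi=20.0, step=0.01):
    """Area under the information function, by a simple Riemann sum."""
    n = int((hi - lo) / step)
    return sum(step * pcm_info(lo + i * step, taus) for i in range(n))

ordered = [-1.0, 1.0]     # thresholds in ascending order
disordered = [1.0, -1.0]  # the same values, reversed
```

Both items integrate to the maximum score (2), but the disordered item stacks its information near the centre while the ordered item spreads it out.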
Ray
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of David Andrich
Sent: Tuesday, 5 June 2012 5:09 PM
To: rasch; rasch
Subject: Re: [Rasch] FW: c-parameter
Re the final point of Ray's note (and this is not to contradict what he says), information on the continuum of a person's estimate is an important notion in relation to the items. However, care needs to be exercised in interpreting information when explaining and justifying conclusions, especially in relation to thresholds, when doing any manipulations on the data. In one case it leads to a paradox. For example, suppose that you have discrete dichotomous items but you suspect local dependence between some of them - perhaps sets of them are based on the same reading passage. Then you can account for this dependence simply by summing the scores on the items in each passage to form a testlet, and running the Rasch model on the testlets (the partial credit parameterisation, though the idea of partial credit is abstract here). [If there is local dependence, this is one of the justified things to do. Items that are severely locally dependent will show high discrimination (over-discriminate relative to the average), and, by necessity because they deviate from the average, some items will show low discrimination (under-discriminate relative to the average). In this case simply using the 2PL would not be the correct way to account for the local dependence.]
Now if there is dependence, and other factors are equal (difficulty, person distribution, etc.), then the thresholds (which have no particular substantive meaning on the continuum in this situation, in contrast to proper rating or partial credit items) will be closer together than they would be if the items were independent and summed in the same way to form testlets. The shrinkage of the thresholds reflects that they have absorbed the dependence. For each number of items in a testlet, there is a positive value below which the average distance between thresholds definitely implies dependence. These values for 2 to 8 dichotomous items in a testlet are provided in Andrich (1985). If there is a lot of dependence, the thresholds of the testlet will even be reversed.
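A rough simulation makes the shrinkage mechanism visible. The copy-with-probability mechanism below is purely illustrative, not a model of any particular testlet; what matters is that dependence depletes the middle category of the testlet score:

```python
import math
import random

random.seed(1)

def sim_testlet_counts(dep, n=100000, b=0.0):
    """Simulate testlet scores for a pair of equally difficult Rasch items.
    With probability `dep` the second response copies the first (an
    illustrative dependence mechanism); otherwise it is independent."""
    counts = [0, 0, 0]  # testlet scores 0, 1, 2
    for _ in range(n):
        theta = random.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        x1 = 1 if random.random() < p else 0
        x2 = x1 if random.random() < dep else (1 if random.random() < p else 0)
        counts[x1 + x2] += 1
    return counts

independent = sim_testlet_counts(0.0)
dependent = sim_testlet_counts(0.8)
# The middle category (score 1) is depleted under dependence; a partial
# credit analysis absorbs this as thresholds that are closer together,
# or even reversed when the dependence is strong.
```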
Here is a paradox re information. Again, other factors equal, there can be no more information in a set of items than when they are statistically independent. However, when we form testlets with dependent items, the Expected Value curve is steeper than when we form testlets with independent items. (I agree with Ray, Steve and others that it is not ideal to refer to this curve as characterising discrimination in the way it is normally understood.)
With this steeper curve for the testlet with dependent items, the calculation of information comes up greater with dependence than with independence! But shouldn't the information with independent items be the maximum?
There is an explanation of this paradox (and, as usual with a paradox, something in the reasoning has been left out), but I will leave it for discussion. Clues are in Steve's comments and in the two references below that have been published in Rasch Measurement Transactions.
References
Linacre, J.M. (2006). Item discrimination and Rasch-Andrich thresholds. Rasch Measurement Transactions, 20(1), 1054.
Andrich, D. (2006). Item discrimination and Rasch-Andrich thresholds revisited. Rasch Measurement Transactions, 20(2), 1055-1057.
Andrich, D. (1985). A latent trait model for items with response dependencies: Implications for test construction and analysis. In S. Embretson (Ed.), Test design: Contributions from psychology, education and psychometrics (pp. 245-273). New York: Academic Press.
David Andrich, BSc MEd W.Aust., PhD Chic, FASSA
Chapple Professor
david.andrich at uwa.edu.au
Graduate School of Education
The University of Western Australia
M428, 35 Stirling Highway,
Crawley,
Western Australia , 6009
AUSTRALIA
Telephone: +61 8 6488 1085
Fax: +61 8 6488 1052
www.gse.uwa.edu.au
CRICOS Code: 00126G
Pearson Psychometric Laboratory: http://www.education.uwa.edu.au/ppl
http://www.education.uwa.edu.au/ppl/courses
http://www.education.uwa.edu.au/raschconference
www.matildabayclub.net
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Adams, Ray
Sent: Monday, 4 June 2012 2:16 PM
To: rasch
Subject: Re: [Rasch] FW: c-parameter
I've always thought it unhelpful to speak of "the" discrimination of an item. The discriminating power of an item (even under the Rasch model) is a function of ability; i.e., an item has discrimination x at ability level y. It is best to think about item discrimination in terms of the information function, which is the derivative of the expected score curve with respect to ability.
SLM items have information functions that are equivalent in shape, and they peak at a value of 0.25 when ability equals the item difficulty. No item is highly discriminating when it is not well targeted.
For PCM items the information functions are not of equivalent shape, so as Margaret implies, the idea of "equal" discrimination doesn't have a lot of use with PCM items. The property that PCM items satisfy is that the area under the item information function is equal to one less than the number of categories in the item. So the area under the information function for a dichotomous item equals 1. Under the Rasch model all items with k categories have an equal amount of total information (= k-1), the distribution of which over the ability dimension is a function of the item parameters. For PCM items, the highest peaks in the function typically occur at disordered Andrich thresholds. So if you do like to speak of the discrimination of PCM items (I don't), the items with disordered parameter estimates are the most discriminating.
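The identity used above, that the information is the derivative of the expected score curve, can be checked numerically for a partial credit item; for models in the Rasch family the derivative of the expected score equals the score variance. A sketch (the threshold values are arbitrary):

```python
import math

def pcm_expected(theta, taus):
    """Expected score of a Rasch partial-credit item with thresholds `taus`."""
    ws = [math.exp(x * theta - sum(taus[:x])) for x in range(len(taus) + 1)]
    s = sum(ws)
    return sum(x * w / s for x, w in enumerate(ws))

def pcm_variance(theta, taus):
    """Score variance, which equals the item information."""
    ws = [math.exp(x * theta - sum(taus[:x])) for x in range(len(taus) + 1)]
    s = sum(ws)
    ex = sum(x * w / s for x, w in enumerate(ws))
    return sum((x - ex) ** 2 * w / s for x, w in enumerate(ws))

taus = [-0.5, 0.7, 1.2]  # illustrative threshold values
h = 1e-5
# Central-difference slope of the expected score curve at theta = 0.3
slope = (pcm_expected(0.3 + h, taus) - pcm_expected(0.3 - h, taus)) / (2 * h)
```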
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of David Andrich
Sent: Monday, 4 June 2012 1:31 PM
To: rasch
Subject: Re: [Rasch] FW: c-parameter
Margaret. It might help if you define discrimination for a partial credit item. We know what discrimination is for a dichotomous item; I am not sure the two are compatible. Also, what is your formulation of a "generalized 2-parameter analysis"? Maybe you can give us the equation in an attachment.
Thanks
David
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Margaret Wu
Sent: Sunday, 3 June 2012 6:27 PM
To: rasch at acer.edu.au
Subject: [Rasch] FW: c-parameter
Rense writes: One down (guessing), one to go (discrimination)
Mike writes: Answer: OPLM with its fixed, but different, discrimination coefficients. http://www.cito.com/research_and_development/pyschometrics/psychometric_software/oplm.aspx
There isn't such a divide between the Rasch model and the two-parameter model. Let's consider an example. Suppose we have three Rasch partial credit items. Item 1 has scores 0, 1, 2, 3, 4; Item 2 has scores 0, 1, 2; and Item 3 has scores 0, 1. So Item 1 has the highest weight in the test: it has twice the weight of Item 2 (it counts 4 points in the test, while Item 2 counts only 2), and four times the weight of Item 3. How does one decide on the weight of a partial credit item? Or, put another way, how does one decide on the maximum score of an item? Contrary to the common perception that the maximum score of a partial credit item relates to its difficulty, the maximum score of an item (that is, its weight) actually depends on its discrimination. This makes sense. If an item does not discriminate, we want to weigh it down in the test. If an item discriminates highly, we want to increase its weight in the test.
Suppose we run a generalized 2-parameter analysis and the scores of Item 1 come out to be 0, 0.8, 1.1, 1.7, 2.2. These scores suggest that, instead of scoring Item 1 with 0, 1, 2, 3, 4, we should score these five categories as 0, 0.8, 1.1, 1.7, 2.2. Because the scores are not integers, we no longer have the Rasch model. However, if we round the scores to integers and score the five categories 0, 1, 1, 2, 2 (that is, we collapse the original categories 1 and 2 as 1, and collapse categories 3 and 4 as 2), our new partial credit scoring is 0, 1, 2. This is still a Rasch partial credit item. This re-scoring will make the item fit the Rasch model better and potentially increase the test reliability, because we now put more weight on "good" (more discriminating) items and less weight on "poor" items, and we still stay in the Rasch family.
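The rounding-and-collapsing step can be written out as a small sketch (the function name is mine; the scores are the example above). Renumbering after rounding also guarantees there are no jumps in the collapsed scoring:

```python
def collapse_by_rounding(category_scores):
    """Round estimated category scores to integers, then renumber them so
    the collapsed scoring is consecutive (0, 1, 2, ...), as a Rasch
    partial credit item requires."""
    rounded = [round(s) for s in category_scores]
    levels = sorted(set(rounded))
    relabel = {v: i for i, v in enumerate(levels)}
    return [relabel[v] for v in rounded]

# The example above: estimated scores for the five categories of Item 1.
print(collapse_by_rounding([0, 0.8, 1.1, 1.7, 2.2]))  # -> [0, 1, 1, 2, 2]
```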
In practice, many of us using the Rasch model have already been doing this. We examine the item analysis and decide how to re-score or collapse categories to improve fit and reliability, without actually running a 2-parameter model. What we are doing is already trying to find the best weight for each item. So whenever you use the Rasch partial credit model, you are already giving different items different weights; in fact, you are already moving towards the 2-parameter model concept.
The technical difference between a 2-parameter model and a Rasch partial credit model is that for a partial credit model the category scores must be integers and there must not be jumps (e.g., we can't have 0, 1, 3, 4). The 2-parameter model allows non-integer scores. But by rounding the scores, you can have the best of both worlds, and that's how OPLM can bring out the best of the one-parameter model.
Certainly, whenever you are using Rasch partial credit models, you are already using a special case of the 2-parameter model. We should realize that when items have different maximum scores, you are already providing different weights to the items, and that's the idea of 2-parameter models. But we can still stay within the Rasch family when we incorporate the item discrimination information.
Margaret
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Mike Linacre
Sent: Sunday, 3 June 2012 5:29 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] c-parameter
Rense writes: One down (guessing), one to go (discrimination)
Answer: OPLM with its fixed, but different, discrimination coefficients. http://www.cito.com/research_and_development/pyschometrics/psychometric_software/oplm.aspx
Mike L.
Mike Linacre
rmt at rasch.org   www.rasch.org/rmt/   Latest RMT: 25:4 Spring 2012