[Rasch] FACETS design

Ricardo Primi rprimi at mac.com
Wed Feb 13 03:50:03 AEDT 2019


Dear List members, Fabio, Rense and Trevor

Thanks a lot, Fabio, Rense and Trevor, for a very interesting discussion!
I am aware that with the design I proposed it is impossible to create connectedness and link person measures across countries (as was nicely portrayed in Fabio’s table). But that is the challenge. I am very interested to know whether anyone has faced this challenge, or a similar one, before, and what complementary design was proposed to try to solve the linking problem.

By reading the literature (Eckes, 2015; Myford & Wolfe, 2003, 2004) I found that one possible way would be a two-step anchor/linking design. Imagine we have a writing-essay task scored by raters on 4 items (quality, cohesion, originality and correct use of language) with a 3-point rubric. Assume we have a single dimension of writing ability (I know this may not be true, but let’s assume it here just to focus on the aspects of the problem). We could select essays from, say, 80 persons representing various levels of ability (low, medium, high), translate them into the various languages, and create a common task for all raters from all countries. I would call this the anchor person group. In Step 1 we run a three-facet design: persons (anchors) × items × raters. All raters from all countries score this group. With this dataset we can calibrate the rater and item parameters, and we can also investigate bias, DRF and DIF. From Step 1 we save the rater and item parameters.

Then we go to Step 2, which is the design I presented in my earlier e-mail. But now we know the rater and item parameters, and we can fix (anchor) them at the values found in Step 1. So although there is no connectedness in the Step-2 data, we “borrow” the connection established in Step 1. I see this as similar to an anchor-test design, except that it is actually a common-person design. In Step 2 we run a calibration with the item and rater parameters fixed, putting the remaining parameters, specifically the person measures, on the frame of reference established in Step 1.
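To make the two-step logic concrete, here is a deliberately toy sketch: a dichotomous Rasch model with an additive rater-severity facet, fit by crude joint maximum likelihood in stdlib Python. It is not Facets, and all names and numbers are illustrative assumptions. Step 1 calibrates item and rater parameters on the anchor group; Step 2 anchors them and estimates person measures, which thereby land on the Step-1 frame of reference:

```python
import math
import random

random.seed(7)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(thetas, deltas, severities, design):
    """One 0/1 score per (person, item, rater) triple in `design`."""
    return {(p, i, r): int(random.random() <
                           sigmoid(thetas[p] - deltas[i] - severities[r]))
            for (p, i, r) in design}

def fit(data, thetas, deltas, severities, free, steps=400, lr=0.05):
    """Crude joint-ML gradient ascent; only facets named in `free` move."""
    for _ in range(steps):
        grad = {k: 0.0 for d in (thetas, deltas, severities) for k in d}
        for (p, i, r), x in data.items():
            resid = x - sigmoid(thetas[p] - deltas[i] - severities[r])
            grad[p] += resid   # ability: more 1s -> higher theta
            grad[i] -= resid   # difficulty: more 1s -> lower delta
            grad[r] -= resid   # severity: more 1s -> lower severity
        for facet, name in ((thetas, "theta"), (deltas, "delta"),
                            (severities, "severity")):
            if name not in free:
                continue
            for k in facet:
                facet[k] += lr * grad[k]
            if name != "theta":  # identification: centre items/raters
                m = sum(facet.values()) / len(facet)
                for k in facet:
                    facet[k] -= m

# Step 1: every rater scores the translated anchor essays on all items.
items = ["quality", "cohesion", "originality", "language_use"]
raters = [f"{c}_r{j}" for c in "AB" for j in (1, 2, 3)]
anchor_true = {f"anchor{i}": t for i, t in
               enumerate([-2.0, -1.0, 0.0, 0.0, 1.0, 2.0])}
deltas_true = dict(zip(items, [-0.5, 0.0, 0.2, 0.3]))
sev_true = dict(zip(raters, [0.4, -0.4, 0.0, 0.3, -0.3, 0.0]))

design1 = [(p, i, r) for p in anchor_true for i in items for r in raters]
data1 = simulate(anchor_true, deltas_true, sev_true, design1)

th1 = {p: 0.0 for p in anchor_true}
de = {i: 0.0 for i in items}
se = {r: 0.0 for r in raters}
fit(data1, th1, de, se, free={"theta", "delta", "severity"})
de_anchored, se_anchored = dict(de), dict(se)   # saved Step-1 parameters

# Step 2: country A's own raters score country A's students; item and
# rater parameters stay fixed at their Step-1 values.
country_true = {"A_weak": -2.5, "A_strong": 2.5}
design2 = [(p, i, r) for p in country_true for i in items
           for r in raters if r.startswith("A_")]
data2 = simulate(country_true, deltas_true, sev_true, design2)

est = {p: 0.0 for p in country_true}
fit(data2, est, de, se, free={"theta"})  # "borrowing" the Step-1 frame
```

Because `de` and `se` are not in `free` during Step 2, they stay anchored at their Step-1 values, so the Step-2 person measures in `est` are expressed on that common frame despite the Step-2 data being disconnected across countries.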

What do you think? Would this work? Any other ideas for designs?



Best


Ricardo

> On 12 Feb 2019, at 00:37, Bond, Trevor <trevor.bond at jcu.edu.au> wrote:
> 
> Dear Ricardo
> For all the reasons explicit and implicit in the advice of Fabio and Rense, the nesting of tests and raters (and languages?) within countries condemn you to unconnected subsets of data. It seems to me that the equivalence of items across language must be a first step. But how that is impacted on by raters remains open to debate. Are two languages used in any country? Are any languages used across countries? Are any raters sufficiently bi-lingual to rate across languages? Is more than one rater used in every/any country?  Seems that both language and country need to be considered as facets, and it would be very difficult , , at best, even with post facto allocations of raters, to double mark across these subsets.
> Collegially
> Trevor
>  
> From: Rasch <rasch-bounces at acer.edu.au> on behalf of Rense Lange <rense.lange at gmail.com>
> Reply-To: rasch <rasch at acer.edu.au>
> Date: Tuesday, 12 February 2019 at 3:51 am
> To: rasch <rasch at acer.edu.au>
> Subject: Re: [Rasch] FACETS design
>  
> My basic point is that Facets was created to avoid having all raters grade all questions - missing data are not just “smoothed over”; rather, they simply cause no problem for Facets. “Structurally missing data” turn out to be no real problem, although of course fewer missing data are always better. The nice thing about Facets is that it will tell you what the SE of the parameter estimates is, including that of rater severity - we don’t have to guess at this - and if you don’t like the SE, you know what to do! (more raters, more items, whatever)
>  
> The main issue is that the raters have to be “connected” in a graph-theoretical sense - e.g., Raters A and B do not both have to rate Languages 1 and 2, as long as there is another language, rated by common raters, that connects to Languages 1 and 2. So the requirement remains that some raters have to grade in at least two languages. In theory, languages just need one rater who can connect them to the other languages, but having more would be better (again, look at the relevant SE)
>  
> Conversely, if you don’t have raters grading in two languages, then Facets will report that there are “subsets” - each with their own frames of reference such that logit values are no longer comparable - the various facets seem to have different offsets. This is what would happen in the design that you sketch.
>  
> The issue that you mention for raters also exists for items! How do you know that a translated question is just as hard as the original? If you cannot find graders who are just as severe in two languages, you may be able to solve the issue by using items that are known to be equally difficult in both languages - or different by known amounts. You would then anchor the items and everything would thereby be unambiguously determined. But now you need students for whom tests in two languages are of equal difficulty. Finding such students would seem easier than finding equivalent raters, but not by much.
>  
> There must be language experts who have tried some of this - I have not - and I would like to know what - if any - solutions there are. I seem to recall that equating raters across contexts is not very successful. What did PISA do?
>  
> It may well be that you will be forced to simply assume that the raters and items on average are equally severe across countries …. 
>  
> This is interesting, please let me know what you end up doing and how it all worked out!
>  
> Rense
>  
> On Feb 12, 2019, at 1:18 AM, Fabio La Porta <fabiolaporta at mail.com> wrote:
>  
> Rense, 
>  
> in the first statement I specified "theoretically". In practice, the estimation of rater severity is surely possible in the presence of missing rater data, but with too many missing data the model may not converge and/or the rater severity estimates may be too imprecise.
>  
> Maybe this is not the case for the within-country design, although I doubt very much that a facets design could be implemented across countries, given that there are no overlapping raters across countries.
>  
> Indeed, if I understood Ricardo's design correctly, subjects of Country A are rated by judges of their own country, but not by raters of Countries B, C and D; subjects of Country B are rated by raters of their own country, but not by raters of Countries A, C and D; and so on and so forth.
>  
> Considering the above example, the resulting data matrix would be the following, with no raters (or subjects) overlapping across countries:
>  
>                      Country raters
> Country subjects    A     B     C     D
>        A            X
>        B                  X
>        C                        X
>        D                              X
>  
> Could such a facet design lead the model to converge? And if so, how much could we trust the precision of the rater severity estimates across countries in the presence of so many structural missing data?
>  
> Does this make sense?
>  
> Fabio 
>  
>  
>  
> On Mon, 11 Feb 2019 at 16:29, Rense Lange <rense.lange at gmail.com> wrote:
>> I mostly agree, but the first point seems incorrect. Facets was in fact designed for the case where raters grade only partly overlapping subsets of respondents. See RMT for examples of very minimal rater overlap …
>>  
>> Rense Lange
>>  
>> On Feb 11, 2019, at 7:54 PM, Fabio La Porta <fabiolaporta at mail.com> wrote:
>>  
>> Hi Ricardo,
>> In order to conduct a multifacet analysis the theoretical requirement is that all the raters examine all the subjects.  <<<<< NO
>>  
>> For sure, you cannot apply this design across countries. Here, you will want to test the requirement of invariance of the item difficulties across cultures, i.e. the absence of DIF by country.
>>  
>> Regarding the within-Country facet design, its feasibility depends on the amount of missing rater data.
>>  
>> I Hope this helps
>> Fabio 
>>  
>> On Mon, 11 Feb 2019 at 12:27, Ricardo Primi <rprimi at mac.com> wrote:
>>> Dear list
>>> 
>>> I have a question about FACETS. I am working on a project where we have tasks translated into various languages and administered to students in several countries. Raters are nested within countries; that is, a group A of raters from Country A scores the tasks of Country A's students, a group B of raters from Country B scores the tasks of Country B's students, and so on.
>>> 
>>> I have two questions: 1) Do you know of any similar study with these characteristics that employed FACETS? Specifically, any idea of a design that can be employed to create connectedness and link measures, that is, create a metric that permits comparing countries? 2) The main interest is to estimate country means as reliably as possible. Do you know of any study that addresses how person reliability (level 1 in a multilevel design) is related to group-mean reliability (level 2)? A specific question is how design features (number of tasks, number of raters per task, rater agreement) affect this reliability of the means at the second level.
>>> 
>>> Thanks a lot  in advance 
>>> 
>>>  
>>> 
>>> Ricardo Primi
>>> 
>>> USF, Brazil
>>>  
>>> ________________________________________
>>> Rasch mailing list
>>> email: Rasch at acer.edu.au
>>> web: https://mailinglist.acer.edu.au/mailman/options/rasch/fabiolaporta%40mail.com
>> -- 
>> Fabio La Porta, MD PhD
