[Rasch] FACETS design

Bond, Trevor trevor.bond at jcu.edu.au
Tue Feb 12 13:37:09 AEDT 2019


Dear Ricardo
For all the reasons explicit and implicit in the advice of Fabio and Rense, the nesting of tests and raters (and languages?) within countries condemns you to unconnected subsets of data. It seems to me that establishing the equivalence of items across languages must be a first step; how that is affected by raters remains open to debate. Are two languages used in any country? Are any languages used across countries? Are any raters sufficiently bilingual to rate across languages? Is more than one rater used in every/any country? It seems that both language and country need to be considered as facets, and it would be very difficult, at best, even with post facto allocation of raters, to double-mark across these subsets.
Collegially
Trevor

From: Rasch <rasch-bounces at acer.edu.au> on behalf of Rense Lange <rense.lange at gmail.com>
Reply-To: rasch <rasch at acer.edu.au>
Date: Tuesday, 12 February 2019 at 3:51 am
To: rasch <rasch at acer.edu.au>
Subject: Re: [Rasch] FACETS design

My basic point is that Facets was created to avoid having all raters grade all questions - missing data are not just “smoothed over”; rather, they simply cause no problem for Facets. “Structural missing data” turn out to be no real problem, although of course fewer missing data are always better. The nice thing about Facets is that it will tell you what the SE of the parameter estimates is, including that of the rater severity - we don’t have to guess at this - and if you don’t like the SE you know what to do! (more raters, more items, whatever)

The main issue is that the raters have to be “connected” in a graph-theoretical sense - e.g., Raters A and B do not both have to rate Languages 1 and 2, as long as there is another language, rated by common raters, that connects to Languages 1 and 2. So the requirement remains that some raters have to grade in at least two languages. In theory, languages just need one rater who can connect them to the other languages, but having more would be better (again, look at the relevant SE)
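This graph-theoretical connectedness can be checked before running Facets. A minimal sketch (rater and language labels are hypothetical, and this is a generic union-find check, not Facets' own subset diagnosis): treat each rater and each language as a node, each rating assignment as an edge, and count the connected components.

```python
# Sketch: count the disjoint subsets implied by a rating design,
# given (rater, language) assignment pairs. One component means all
# measures share a common frame of reference; more means "subsets".

def connected_components(assignments):
    """Union-find over rater and language nodes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for rater, language in assignments:
        union(("rater", rater), ("lang", language))

    return len({find(node) for node in parent})

# Raters A and B never share a language, but rater C bridges them:
linked = [("A", "L1"), ("B", "L2"), ("C", "L1"), ("C", "L2")]
print(connected_components(linked))  # prints 1: one frame of reference

# Remove the bridging rater and the design splits apart:
split = [("A", "L1"), ("B", "L2")]
print(connected_components(split))   # prints 2: two disjoint subsets
```

The bridging rater C is what "some raters have to grade in at least two languages" buys you: a single shared rater is enough to merge two otherwise disjoint subsets, though more overlap gives smaller standard errors.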

Conversely, if you don’t have raters grading in two languages, then Facets will report that there are “subsets” - each with their own frames of reference such that logit values are no longer comparable - the various facets seem to have different offsets. This is what would happen in the design that you sketch.

The issue that you mention for raters also exists for items! How do you know that a translated question is just as hard as the original? If you cannot find graders who are equally severe in two languages, you may be able to solve the issue by using items that are known to be equally difficult in both languages - or to differ by known amounts. You would then anchor the items and everything would thereby be unambiguously determined. But now you need students for whom tests in two languages are of equal difficulty. Finding such students would seem easier than finding equivalent raters, but not by much.
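The effect of anchoring can be sketched with a minimal dichotomous Rasch ability estimate under fixed item difficulties. This is a generic maximum-likelihood sketch, not Facets' own estimation routine, and the difficulties and responses below are hypothetical: once the item difficulties are anchored (e.g., at values equated across languages), the person measure is determined without reference to any other group's frame.

```python
import math

def ability_given_anchored_items(responses, item_difficulties, tol=1e-6):
    """ML ability estimate for one person under the dichotomous Rasch
    model, with item difficulties fixed (anchored). responses: 0/1 list.
    Assumes a mixed response pattern (perfect scores do not converge)."""
    theta = 0.0
    for _ in range(100):
        p = [1 / (1 + math.exp(-(theta - b))) for b in item_difficulties]
        grad = sum(x - pi for x, pi in zip(responses, p))   # score - expected
        info = sum(pi * (1 - pi) for pi in p)               # test information
        step = grad / info                                  # Newton-Raphson
        theta += step
        if abs(step) < tol:
            break
    return theta

# Symmetric difficulties, half the items correct -> ability near zero:
print(ability_given_anchored_items([1, 1, 0, 0], [-1.0, -0.5, 0.5, 1.0]))
```

Because the difficulties are anchored rather than co-estimated, two students taking different-language versions of the test are placed on the same logit scale - provided, as noted above, that the anchored values really are equivalent across languages.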

There must be language experts who have tried some of this - I have not - and I would like to know what - if any - solutions there are. I seem to recall that equating raters across contexts is not very successful. What did Pisa do?

It may well be that you will be forced to simply assume that the raters and items on average are equally severe across countries ….

This is interesting, please let me know what you end up doing and how it all worked out!

Rense

On Feb 12, 2019, at 1:18 AM, Fabio La Porta <fabiolaporta at mail.com<mailto:fabiolaporta at mail.com>> wrote:

Rense,

in the first statement I specified "theoretically". In practice, the estimation of rater severity is surely possible in the presence of missing rater data, but with too many missing data the model may not converge and/or the rater severity estimates may be too imprecise.

Maybe this is not the case for the within-country design, although I doubt very much that a facet design could be implemented across countries, given that there are no overlapping raters across countries.

Indeed, if I understood Ricardo's design correctly, subjects of Country A are rated by judges of their own country, but not by raters of Countries B, C, or D; subjects of Country B are rated by raters of their own country, but not by raters of Countries A, C, or D; and so on and so forth.

Considering the above example, the resulting data matrix would be the following, with no raters (or subjects) overlapping across countries:

                       Country Raters
Country subjects     A      B      C      D
A                    X
B                           X
C                                  X
D                                         X

Could such a facet design lead the model to converge? And if so, how much could we trust the precision of the rater severity estimates across countries in the presence of so many structural missing data?
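The problem with the diagonal design above can be made concrete with a small check (rater IDs are hypothetical): since no rater pool overlaps across countries, the rating network necessarily splits into one subset per country.

```python
# Sketch: verify that no pair of countries shares a rater in the
# diagonal design sketched above. Zero overlaps means the design
# yields four disjoint subsets, one per country.
raters_by_country = {
    "A": {"a1", "a2"}, "B": {"b1", "b2"},
    "C": {"c1", "c2"}, "D": {"d1", "d2"},   # hypothetical rater IDs
}

pairs = [(x, y) for x in raters_by_country
         for y in raters_by_country if x < y]
overlaps = sum(1 for x, y in pairs
               if raters_by_country[x] & raters_by_country[y])
print(overlaps)  # prints 0: no shared raters -> 4 disjoint subsets
```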

Does this make sense?

Fabio



Il giorno lun 11 feb 2019 alle ore 16:29 Rense Lange <rense.lange at gmail.com<mailto:rense.lange at gmail.com>> ha scritto:
I mostly agree, but the first point seems incorrect. Facets was in fact designed for the case where raters grade only partly overlapping subsets of respondents. See RMT for examples of very minimal rater overlap …

Rense Lange

On Feb 11, 2019, at 7:54 PM, Fabio La Porta <fabiolaporta at mail.com<mailto:fabiolaporta at mail.com>> wrote:

Hi Ricardo,
In order to conduct a multifacet analysis the theoretical requirement is that all the raters examine all the subjects.  <<<<< NO

For sure, you cannot apply this design across countries. Here, you will want to test the requirement of invariance of the item difficulties transculturally, i.e., the absence of DIF by country.

Regarding the within-Country facet design, its feasibility depends on the amount of missing rater data.

I hope this helps
Fabio

Il giorno lun 11 feb 2019 alle 12:27 Ricardo Primi <rprimi at mac.com<mailto:rprimi at mac.com>> ha scritto:
Dear list
I have a question about FACETS. I am working on a project where we have tasks translated into various languages and administered to students in several countries. Raters are nested within countries; that is, a group A of raters in country A will score the tasks of country A students, a group B of raters in country B will score the tasks of country B students, and so on.
I have two questions: 1) Do you know of any similar study with these characteristics that employed FACETS? Specifically, any idea of a design that can be employed to create connectedness and link measures, that is, to create a metric that permits comparisons across countries? 2) The main interest is to estimate country means as reliably as possible. Do you know of any study that addresses how person reliability (level 1 in a multilevel design) is related to group mean reliability (level 2)? A specific question is how design features (number of tasks, number of raters per task, rater agreement) affect this reliability of means at the second level.
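A hedged sketch of how the two levels relate, using the standard variance-components reasoning (all symbols and the numbers below are hypothetical, not from any of the studies asked about): a group mean averages out person-level noise, so a country mean can be quite reliable even when individual measures are not, provided enough students per country.

```python
def person_reliability(sigma_true2, sigma_err2):
    """Level-1 reliability: true-score variance over observed variance."""
    return sigma_true2 / (sigma_true2 + sigma_err2)

def group_mean_reliability(tau2, sigma_true2, sigma_err2, n):
    """Level-2 reliability of a country mean based on n students.
    tau2: between-country true variance; sigma_true2: within-country
    person variance; sigma_err2: per-person measurement error variance.
    Both within-country sources shrink by 1/n in the mean."""
    return tau2 / (tau2 + (sigma_true2 + sigma_err2) / n)

# Individual measures only 50% reliable...
print(person_reliability(1.0, 1.0))               # prints 0.5
# ...yet 200 students per country give a highly reliable country mean:
print(group_mean_reliability(0.2, 1.0, 1.0, 200))
```

Design features enter through sigma_err2 (more tasks, more raters per task, and higher rater agreement shrink it) and through n; tau2 is a property of the countries themselves.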
Thanks a lot in advance

Ricardo Primi
USF, Brazil

________________________________________
Rasch mailing list
email: Rasch at acer.edu.au<mailto:Rasch at acer.edu.au>
web: https://mailinglist.acer.edu.au/mailman/options/rasch/fabiolaporta%40mail.com
--
Fabio La Porta, MD PhD