Dear Rense and Tom,

Thank you so much for your comments. The question is whether native speakers vary in the way they rate responses, and whether there is any kind of bias among them.

We do get answers either way, but I am not sure these answers are conceptually correct, as you mentioned earlier. Facets produces four important pieces of information that need to be reported (besides many other good things): item fit, person fit, the vertical ruler table, and the bias table. My problem is with the item fit and measure statistics. In that table, two items have the highest measures. I guess the only interpretation we could make is that these situations were difficult for the examinee, since raters gave them 1 or 2 (out of 5). But the reality is that we have only one questionnaire or interview, and all raters rate it. We want to see whether they vary in their decisions or not. Do you think this scoring (where raters are more important than the examinee) is coding or rating?

Best,
Parisa

Table 3  24 items Measurement Report  (arranged by mN).

+-----------------------------------------------------------------------------------------------------------------------+
|  Total   Total   Obsvd  Fair-M|        Model | Infit      Outfit   |Estim.| Correlation |       |                     |
|  Score   Count  Average Avrage|Measure  S.E. | MnSq ZStd  MnSq ZStd|Discrm| PtMea PtExp | Group | Nu 24 items         |
|-------------------------------+--------------+---------------------+------+-------------+-------+---------------------|
|    71      51       1.4   1.32|   3.45   .27 | 1.64  2.7  1.45  2.0|  .51 |   .58   .32 |     2 | 10 2                | in subset: 2
|    73      51       1.4   1.45|   2.95   .27 | 2.00  4.0  1.80  3.4|  .32 |   .41   .33 |     4 | 19 4                | in subset: 4
|    97      51       1.9   1.86|   1.76   .22 | 1.56  2.1  1.45  1.8|  .58 |   .59   .40 |     1 |  6 1                | in subset: 1
|   105      51       2.1   1.93|   1.59   .20 |  .77 -1.0   .74 -1.1| 1.39 |   .50   .43 |     2 | 12 2                | in subset: 2
|   107      51       2.1   2.04|   1.33   .20 |  .93  -.2   .85  -.5| 1.17 |   .52   .43 |     1 |  4 1                | in subset: 1
|   119      51       2.3   2.27|    .90   .18 |  .99   .0  1.02   .1| 1.14 |   .35   .46 |     1 |  1 1                | in subset: 1
|   115      51       2.3   2.31|    .82   .19 |  .68 -1.5   .69 -1.5| 1.42 |   .53   .45 |     3 | 18 3                | in subset: 3

I think your problem is conceptual.  Is there a single concept that each item taps and such that the response numbers indicate more or less of that concept?  If that is not true then a Rasch analysis is not appropriate.  If it is true then you simply analyze the data with Winsteps or Facets with two facets.  Then do an analysis of variance with type of respondent as the independent measure.

Tom
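
As a rough illustration of the last step Tom describes (an analysis of variance with type of respondent as the independent variable), here is a minimal Python sketch. It assumes each rater already has a single measure in logits from the Rasch analysis; the group names and the numbers are invented for illustration and are not taken from the data above, and the helper `one_way_anova_f` is hypothetical rather than part of Winsteps or Facets.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps a group label to a list of measures.
    Assumes at least two groups and some within-group variation
    (otherwise the within-group mean square is zero).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual measures around their own group mean
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical rater measures (logits), grouped by type of respondent
rater_measures = {
    "group_1": [0.2, 0.5, 0.1, 0.4],
    "group_2": [0.9, 1.1, 0.8, 1.2],
    "group_3": [0.3, 0.6, 0.2, 0.5],
}

f_stat, df_b, df_w = one_way_anova_f(rater_measures)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the critical value for those degrees of freedom would suggest that the respondent groups differ in their average measure. Looking up the p-value is left to a statistics package.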

>Dear Rasch members,
>I guess I was a bit confusing, so let me explain the situation further. In most research on speech acts, several examinees answer a questionnaire, and their replies are rated on a scale (for example, from zero to 3). The responses are rated in the sense that 3 always means a better response. In this research there are no examinees, only a questionnaire. The raters are asked to give their opinion on the appropriacy of the response to each situation. This makes the questionnaire look like a multiple-choice test.
>If I run Facets on such data, I get an item measure table in which some items are reported as very difficult (misfit), not because the raters disagree in their ratings but because the answer to that item was rated 1 (very unsatisfactory). I guess this is not real misfit but a function of the rating score. That is, if the option "very unsatisfactory" were coded as 5, the item might not be reported as a difficult item.
>I hope I have explained the situation better this time. Because this is my friend's PhD dissertation, I hope to have your help and comments on whether we are correct to use Facets for such a rating situation.
>Best,
>Parisa
>>Dear Rasch Members,
>>I am facing a complex situation and hope to have your kind, decisive comments. In a paper we are trying to investigate possible bias among native speakers answering a questionnaire on speech acts. The questionnaire contains several situations, with only one answer for each situation. Raters rate the response to each situation on a five-option Likert scale, with 1 = very unsatisfactory, 2 = unsatisfactory, 3 = somehow appropriate, 4 = appropriate, and 5 = most appropriate.
>>for example,
>>1.   While eating in a restaurant, you notice an insect in your food. What would you say to complain to the waiter?
>>
>>Answer:Hey man, come here. What the hell is it? Where on the earth do you cook your food?
>>1. very unsatisfactory  2. unsatisfactory  3. somehow appropriate  4. appropriate  5. most appropriate
>>
>>
>>1.      The teacher gave you a lower-than-expected grade in your exam. What would you say to complain?
>>
>>Answer:Excuse me, something might have been wrong. It’s not my grade, sir.
>>1. very unsatisfactory  2. unsatisfactory  3. somehow appropriate  4. appropriate  5. most appropriate
>>The problem here is that the likely rating for some of the questions is 2, for others 3, for others 5, and so on. Of course, native speakers (raters) are the norm, and there is no right or wrong answer to each question. What we are trying to see is how native speakers from different cultures judge the appropriacy of the answer to each specific situation (bias and variation).
>>My first question is: what kind of scoring is this? Nominal, ordinal, or interval?
>>My second question is: is it possible to search for bias in this type of research with Facets?
>>Parisa Daftarifard
