Dear all, 

I need a piece of advice (again!). 

The context:
I have a many-facet design in which students' responses to a number of questions are rated by examiners. One examiner may rate some of a student's responses while other examiners rate the rest of that student's responses, so no single examiner rates all the responses of a student. There is no double rating: each response of each student is rated by exactly one examiner. The design is connected (there are no disjoint subsets). We are using Partial Credit scoring.

My question is:
We have a large number of misfitting students, and some of them misfit very badly (Infit MnSq > 3). My aim is to investigate differential rater effects, e.g. differential severity (I do not plan to make any decisions about the students). The fit of the questions is very good, and the fit of the examiners is good.
How, if at all, does including very misfitting students in the analysis affect the quality of measurement for the examiners?
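For reference, a student's Infit mean-square is the sum of squared residuals (observed minus expected score) divided by the sum of the model variances over that student's responses, so values far above 1 mean the responses are much more erratic than the model expects. A minimal Python sketch, assuming the model parameters are already known and a many-facet partial credit parameterization (student ability minus question difficulty minus examiner severity minus step threshold); the function names and parameter layout are hypothetical and are not Facets' internals.

```python
import math

def category_probs(theta, difficulty, severity, thresholds):
    """Partial credit category probabilities for one rated response.

    Hypothetical layout: thresholds[k-1] is the step from category k-1 to k,
    and the examiner's severity is subtracted like an extra difficulty.
    """
    # Category 0 has cumulative logit 0; each step adds one step logit.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - difficulty - severity - tau))
    m = max(logits)  # subtract the max for numerical stability
    expo = [math.exp(v - m) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]

def person_infit(responses):
    """Information-weighted (Infit) mean-square for one student.

    responses: list of (observed, theta, difficulty, severity, thresholds).
    """
    sq_resid = 0.0
    info = 0.0
    for x, theta, diff, sev, taus in responses:
        p = category_probs(theta, diff, sev, taus)
        expected = sum(k * pk for k, pk in enumerate(p))
        variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(p))
        sq_resid += (x - expected) ** 2
        info += variance
    return sq_resid / info

# A student whose scores jump between the extreme categories 0 and 2
# on questions where the model expects a middle score.
erratic = [(0, 0.0, 0.0, 0.0, [-1.0, 1.0]), (2, 0.0, 0.0, 0.0, [-1.0, 1.0])] * 3
print(round(person_infit(erratic), 3))
```

With these illustrative parameters the expected score on every response is 1, so every observed 0 or 2 contributes a full squared residual, and the Infit value lands well above 1.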

I know you will tell me to remove the misfitting students and re-run the analysis. OK, I can do that. But I would like some theoretical advice, or any pointers to the existing literature on the issue. The general question is: how does the inclusion of misfitting students affect the quality of measurement for the other facets, e.g. the examiners?

Thank you for your time.


