Dear colleagues, 
  I have a problem somewhat similar to Purya's.  In my case, I am working with a dataset from a performance appraisal practice.  I analyzed this dataset with a multifaceted Rasch model using the partial credit model, with evaluator, ratee, and item facets.  Twenty items were involved in this appraisal.
  The problems I am having are:
  (1) only the top three of the five categories were used for ratings (the distributions of ratings were heavily negatively skewed; over 70% of evaluators used the fifth category to rate ratees on some items; and there were many inversions of average category measures for the first and second lowest rating categories); and
  (2) the map of measure estimates showed that ratee measures were far greater than item measures, which makes it difficult to differentiate ratee competencies.   
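The category problems described in (1) can be screened with a short script. The following is only a minimal sketch, not output from any Rasch software: it assumes each observation has already been reduced to a (category, measure) pair, where the measure is the ratee estimate minus the item and evaluator estimates for that rating. The function name `category_diagnostics` and the `min_count` cutoff are hypothetical choices for illustration.

```python
from collections import defaultdict


def category_diagnostics(observations, min_count=10):
    """Summarize category usage from (category, measure) pairs.

    `measure` is assumed to be the ratee estimate minus the item and
    evaluator estimates for that single rating (in logits).
    `min_count` is an illustrative cutoff for flagging sparse categories.
    """
    by_category = defaultdict(list)
    for category, measure in observations:
        by_category[category].append(measure)

    rows = []
    previous_average = None
    for category in sorted(by_category):
        values = by_category[category]
        average = sum(values) / len(values)
        rows.append({
            "category": category,
            "count": len(values),
            "average_measure": average,
            # Too few ratings to estimate this category stably.
            "sparse": len(values) < min_count,
            # Average measure fell relative to the next lower observed category.
            "disordered": previous_average is not None and average < previous_average,
        })
        previous_average = average
    return rows


if __name__ == "__main__":
    # Made-up ratings purely to show the output shape.
    demo = [(3, 0.5), (3, 0.7), (4, 0.4), (5, 1.2), (5, 1.4)]
    for row in category_diagnostics(demo, min_count=2):
        print(row)
```

Categories that are flagged as sparse or disordered are the natural candidates for rewording or collapsing, although that decision also depends on what the category labels mean.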
  My task is to find solutions to these problems, and I am looking for studies that succeeded in improving existing rating scale points, ideally in performance assessment settings.  I have read several articles about creating effective rating scales.  All of them provided guidelines for creating rating scales, as Dr. Stone previously posted in this discussion forum, but I was not able to find studies that actually applied the guidelines and improved existing rating scales in a real setting.  I would use such studies as templates for improving the performance-rating tool that I am investigating.  Thank you very much in advance.

"Stone, Gregory" <gstone at UTNet.UToledo.Edu> wrote:

The number of rating scale points should be determined both by the theory and rationale of increase from less to more, and by the ability of the instrument maker to define points that respondents can consistently distinguish.  In models that do not evaluate how respondents use the rating scale points, simply adding more may lead to the faulty observation of improved precision.  In reality, however, more points do not equate to greater precision.  The better questions to ask are not whether moving from 5 to 8 would improve the measure, but rather:

Are the respondents using the current 5-point scale appropriately?

Does each of the current 5 points have a clear and unique descriptor that is easy for respondents to use?

If additional points were to be added, could they be defined and described to respondents in a way such that respondents would consistently make use of each point across the scale?

This is likely the more critical question, assuming the answer to the first two is yes.  We can add 3 points or 30 to a scale.  The problem is that human respondents simply cannot retain and consistently apply the definitions associated with the points along lengthy, poorly defined scales.  Only when each point on the scale is unambiguous does more = potentially better.

As to the map, if there are gaps, then there are gaps in how your items cover the construct and more points will likely not solve that problem in any way. 
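To make the point about gaps concrete, the map can be summarized numerically. This is a rough sketch under simplifying assumptions: it uses item locations only and ignores category thresholds, and the function names `uncovered_share` and `largest_item_gap` are hypothetical. All inputs are estimates in logits.

```python
def uncovered_share(person_measures, item_measures):
    """Proportion of persons located above the most difficult item."""
    top_item = max(item_measures)
    above = [p for p in person_measures if p > top_item]
    return len(above) / len(person_measures)


def largest_item_gap(item_measures):
    """Return (width, lower, upper) for the widest gap between adjacent items."""
    ordered = sorted(item_measures)
    gaps = [(upper - lower, lower, upper)
            for lower, upper in zip(ordered, ordered[1:])]
    return max(gaps)


if __name__ == "__main__":
    # Made-up estimates purely for illustration.
    items = [-1.0, -0.5, 0.0, 2.0]
    persons = [0.5, 1.0, 2.5, 3.0]
    print(uncovered_share(persons, items))
    print(largest_item_gap(items))
```

A large uncovered share or a wide gap points to where new items would need to be located, which is a question about item content rather than about the number of scale points.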

Dear colleagues,

In a Likert scale analysis with 5 points using Rasch RSM, the map showed that most of the items are clustered at the lower end of the scale and there are only two items at the top with many respondents falling at the upper part of the scale where there are no items. Well, the solution is to add some items to cover the gap.

The other finding was that the distance between the last two thresholds was more than 4 logits. Although this is within the acceptable distance of 1.4 to 5 logits (Linacre, 1999) it's still a wide gap.
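The threshold check mentioned here is simple arithmetic on the estimated thresholds. The sketch below assumes the thresholds are supplied in category order, in logits, and takes the 1.4 and 5 logit bounds from the range quoted above; the function name `threshold_gaps` is hypothetical.

```python
def threshold_gaps(thresholds, low=1.4, high=5.0):
    """Classify the distance between each pair of adjacent thresholds.

    `thresholds` are assumed to be listed in category order (logits).
    `low` and `high` default to the 1.4 to 5 logit range quoted above.
    """
    results = []
    for lower, upper in zip(thresholds, thresholds[1:]):
        distance = upper - lower
        if distance < low:
            status = "too narrow"
        elif distance > high:
            status = "too wide"
        else:
            status = "within range"
        results.append((lower, upper, distance, status))
    return results


if __name__ == "__main__":
    # Made-up thresholds for a 5-category scale (four thresholds).
    for row in threshold_gaps([-3.0, -1.0, 0.0, 4.5]):
        print(row)
```

In this made-up example the last distance is 4.5 logits, which passes the check but is close to the upper bound, much like the situation described in the post.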

I was just wondering if increasing the number of categories from 5 to, say, 8 can solve the problem of lack of items at the upper part of the scale. Because it seems that respondents are capable of distinguishing more subtle levels of the construct at the upper end of the scale. Can increasing the number of categories be a substitute for writing more