Yes, in summary, checking against the Rasch model tells us, amongst other
things, whether raw scores can be used to characterise persons, rather than
simply assuming that they can be used in this way.
My take on this is "so what" as well. Getting "scores" is hardly the main
achievement of Rasch modeling. There simply is little appreciable difference
between raw score sums and Rasch estimates, or estimates obtained via one-,
two-, or three-parameter logistic models.
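
One reason the two agree so closely can be sketched numerically: for a complete
test with fixed item difficulties, the maximum-likelihood Rasch measure is the
ability at which the expected score equals the observed raw score, so it is a
strictly increasing function of the raw score. The item difficulties below are
hypothetical values chosen only for illustration, and the bisection solver is a
minimal sketch, not the estimation routine of any particular Rasch program.

```python
import math

def expected_score(theta, difficulties):
    """Expected raw score under the Rasch model at ability theta (logits)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def measure_for_raw_score(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve expected_score(theta) == raw by bisection.

    Works for non-extreme raw scores (0 < raw < number of items), because
    expected_score is strictly increasing in theta.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical item difficulties in logits (an assumption for illustration).
difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

# One measure per non-extreme raw score.
measures = {r: measure_for_raw_score(r, difficulties)
            for r in range(1, len(difficulties))}

for raw, theta in measures.items():
    print(raw, round(theta, 3))
```

Because every raw score maps to exactly one measure and the mapping is
increasing, persons who took the same items are ranked identically by both.
The mapping is not linear, though: the logit steps between adjacent raw scores
widen towards the extremes.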

Instead, I see easy and meaningful equating as Rasch scaling's major virtue,
together with the availability of misfit and bias statistics and the option
to include additional independent variables à la Facets or ConQuest.

The main problem here is that non-IRT folks often do not (or cannot) even
conceive of the possibilities afforded by test / form equating, item /
person fit and bias testing. Hence, the advantages of all this appear rather
"academic" (read: useless). I've given too many presentations to educators
on test results in terms of scale scores, only to be asked what the "real"
(i.e., raw) scores were. It all reminds me of trying to convince illiterates
that pencils might be useful.

Rense Lange

Hi there,
Fan (1998) argues that item and person statistics based on CTT and IRT are
very similar and the complexities of IRT are not worthwhile.
Has this been reacted to by the Rasch and IRT communities?
I’d be thankful for any sources that are brought to my attention.
Is replicating Fan’s research worthwhile?
This is my own view:
Item and person raw scores are the sufficient statistics in the Rasch model
and item and person measures are based on these and highly correlate with
raw scores. Therefore, one will always get Fan’s results. What Fan
apparently hasn’t noticed is the robustness of the person and item
measures in Rasch measurement to the idiosyncrasies of the test and
He’s arguing on the basis of high correlations (obtained from large and
homogeneous groups), which simply mean that the rank order of the items and
persons remains the same regardless of the testing theory. This can hardly be
Fan defines the high-ability group “as those whose scores fall within the
15th and 100th percentile ….low-ability group was defined as those whose
scores fall within the 0 to 85th percentile range…” (p. 363). Most of
the sample is therefore shared between the two groups. They have a lot in common.
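
The size of that overlap follows directly from the quoted percentile ranges. A
small sketch of the arithmetic (the function and variable names are mine, not
Fan's):

```python
from fractions import Fraction

def interval_overlap(a, b):
    """Length of the overlap between two (low, high) percentile ranges."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return max(0, high - low)

# Percentile ranges as quoted from Fan (1998, p. 363).
high_group = (15, 100)
low_group = (0, 85)

# Percentage points of the whole sample that sit in both groups.
shared_points = interval_overlap(high_group, low_group)

# Share of each group made up of examinees also in the other group.
group_width = high_group[1] - high_group[0]
shared_fraction = Fraction(shared_points, group_width)

print(shared_points)           # 70
print(float(shared_fraction))  # about 0.82
```

So 70% of the full sample belongs to both groups, and roughly 82% of each
"ability group" consists of the same examinees as the other.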
Ben Wright’s 1968 experiment (Memo#1) provides the acid test for the
comparison of the two models. 
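
The sufficiency claim made earlier in this message can also be checked
numerically. Under the Rasch model, once the raw score is known, the
probability of any particular response pattern no longer depends on the
person's ability. The sketch below uses three hypothetical item difficulties
and two arbitrary abilities; all of these values are assumptions for
illustration only.

```python
import math
from itertools import product

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_prob(pattern, theta, difficulties):
    """Probability of one 0/1 response pattern at ability theta."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        prob *= p if x == 1 else 1.0 - p
    return prob

def conditional_pattern_probs(theta, difficulties, raw_score):
    """P(pattern | raw score) for every pattern with the given raw score."""
    patterns = [p for p in product((0, 1), repeat=len(difficulties))
                if sum(p) == raw_score]
    joint = {p: pattern_prob(p, theta, difficulties) for p in patterns}
    total = sum(joint.values())
    return {p: v / total for p, v in joint.items()}

# Hypothetical item difficulties in logits (an assumption for illustration).
difficulties = [-1.0, 0.0, 1.5]

# Same raw score (2 of 3), very different abilities.
low_ability = conditional_pattern_probs(-2.0, difficulties, 2)
high_ability = conditional_pattern_probs(2.0, difficulties, 2)

for pattern in low_ability:
    print(pattern, round(low_ability[pattern], 6), round(high_ability[pattern], 6))
```

The two columns agree for every pattern: given the raw score, the response
pattern carries no further information about ability. That is what
"sufficient statistic" means here, and it is also why Rasch measures and raw
scores from the same test must correlate so highly.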

