pbarrett at hoganassessments.com
Tue Oct 30 09:05:13 EST 2007
From: rasch-bounces at acer.edu.au
[mailto:rasch-bounces at acer.edu.au] On Behalf Of renselange at earthlink.net
Sent: Tuesday, October 30, 2007 5:28 AM
To: Anthony James; rasch at acer.edu.au
Subject: Re: [Rasch] Fan
My take on this is "so what" as well. Getting "scores" is hardly
the main achievement of Rasch modeling. There simply is little
appreciable difference between raw score sums and the Rasch model, as
well as estimates obtained via one, two, or three-parameter logistic
There you have it in a nutshell, why many scientists (vs
psychometricians) concerned with explaining substantive phenomena,
sometimes using questionnaire data, are not concerned with itemmetric
Edumetrics is frankly "technician work"; the utility and benefits of
probabilistic modeling of item response data seem to be irrelevant to
explaining or predicting phenomenal outcomes more accurately. Whilst
other benefits might accrue to those primarily concerned with
examinations, licensing, or other such pragmatic ventures, as a
scientist, you are concerned with minimizing substantive error in any
predictions you might make on the basis of theory or even "dustbowl"
As you state, Rense, there is no substantive difference between
probabilistic (latent variable) or deterministic (raw score) scale
scores. Which, by default, means all of the Metametrics Lexile and
Quantile work could have been achieved using conventional
deterministic/actuarial score algorithms (now there's a sobering
Whether or not IRT is a more "efficient" method of going about these
activities seems to be the issue - which is an important one
But what use is "efficiency" when the magnitudes obtained are equivalent
to just summing up item responses? Person reliability can easily be
computed without any fuss using conventional item difficulties and rank
order response discrepancy (between item difficulty ranking and observed
item responses). You don't need a "data model" for this simple
heuristic. Cross-validation sorts out replicability.
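
One possible reading of that heuristic, as a minimal sketch only: take the
conventional item difficulty as the proportion of people answering each item
correctly, then count, for each person, the item pairs where the easier item
was failed but the harder one was passed (often called Guttman errors). The
function names and the toy response matrix are illustrative assumptions, and
counting Guttman errors is just one way to operationalize "rank order
response discrepancy".

```python
def item_difficulties(responses):
    """Proportion of persons answering each item correctly (0/1 data)."""
    n_persons = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_persons for j in range(n_items)]


def guttman_errors(row, p):
    """Count item pairs where the easier item is failed and the harder one passed.

    p[j] is the proportion correct for item j; a higher value means an easier item.
    Pairs of items with equal difficulty are skipped, since they have no rank order.
    """
    n = len(row)
    return sum(
        1
        for easy in range(n)
        for hard in range(n)
        if p[easy] > p[hard] and row[easy] == 0 and row[hard] == 1
    )


# Hypothetical toy data: rows are persons, columns are items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],  # fails the easy items but passes the hard ones
]

p = item_difficulties(responses)
discrepancies = [guttman_errors(row, p) for row in responses]
print(discrepancies)
```

A person whose count is high relative to the rest of the sample is responding
against the difficulty ordering; whether that ordering holds up in a fresh
sample is what the cross-validation step would check.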
I'm not so sure IRT is any more efficient at all than a sensible and
intelligent approach to construct assessment, except perhaps where
edumetrics is concerned - but even here, where are the advantages to be
seen, and by whom? Are educational standards improving because of IRT
methods being applied?
As far as I can see, the only reason IRT came into existence was because
Rasch and others were fascinated with test theory, statistical data
models, and fundamental measurement, instead of the actual phenomena of
interest, and capturing these directly without the need for bucketloads
of assumptions or practically incomprehensible algebra and math.
Indeed, what is apparent (to me at least) is that the major source of
explanatory/predictive error is not in the "measurement" metrics per se,
but in the concepts we invoke to explain phenomena, and how we actually
try to measure or assess "magnitudes" on them.
And, I haven't forgotten my reply to Andrew or Moritz(!) - but I'm flat
out on an interrater discrepancy/reliability program right now - for the
next 5 days or so - but I couldn't resist replying to this as I've seen
the same "even though there is no difference, this is still wonderful
stuff" statements in a recent paper comparing Ideal Point unfolding
models, dominance IRT, and simple raw score methods ...
Chernyshenko, O.S., Stark, S., Drasgow, F., & Roberts, B.W. (2007)
Constructing personality scales under the assumption of an Ideal Point
response process: toward increasing the flexibility of personality
measures. Psychological Assessment, 19, 1, 88-106.
See page 103, Col 2, 2nd para beginning "Note also that examination of
correlations .. " ... which contains the sentence ... "The use of ideal
point methods, however, is unlikely to yield increases in
and then read the 1st para of the discussion ..
Their argument, as far as I can see, is more about the
technicalities/differential pragmatic benefits of test construction
...and the consequences on particular measurement ranges within a scale.
But, if the criterion prediction remains the same - what is the achieved
real-world or scientific benefit?
I have the feeling they are discovering that the real problem is not
with the technology of assessment, but with the very nature of the
constructs we try to assess.
However, this is not a call to undo or dump IRT - as those like
Metametrics, ACER, and others who have generated impressive measurement
and assessment systems around IRT are testament to its functionality,
utility, and profitability.
This is merely a statement which says that it is just another way of
"modeling data" and constructing test scores which has proved to have no
tangible advantages over any other "intelligent" method of analyzing
data to make important decisions.
If you think this is unfair, consider the development of the
"perceptron" (The perceptron is a type of artificial neural network
invented in 1957 at the Cornell Aeronautical Laboratory by Frank
Rosenblatt .. Wikipedia). In 50 years look at what this simple
"algorithm" for a neural net has spawned .. an entire new area of
psychological science investigation "connectionist cognitive
psychology", a neurophysiological model for developmental brain networks
and brain function, artificial intelligence, computational biology, and
literally physical working models of the development of human
intelligence, stock market trading algorithms, seriously accurate market
research/brand analysis prediction models, security camera
person-feature detection, biometric sensors, handwriting, and speech
detection, etc, etc..
And what has IRT spawned in the same period? A few tests which are
almost indistinguishable from the old tests, hundreds of books and
thousands of papers on test theory and tiny one-shot applications for
which CTT would have done the same job just as well - and was probably
doing so, and a few major-league companies who sell the technology and
some of its results to others.
If I am to do any work as a psychological scientist - I want to be the
inventor of a "perceptron", and not the inventor of yet another "test
theory" - which is why methods which predict or measure no better than
simple "back of the matchbox" algorithms or heuristics are not given the
"time of day" by those like myself who want to make big strides in our
So, in response to: "The main problem here is that non-IRT folks often do
not (or cannot) even conceive of the possibilities afforded by test /
form equating, item / person fit and bias testing, Hence, the advantages
of all this appear rather "academic" ( read: useless). " ..
You now have a better idea why some like me really don't bother with IRT
at all anymore - it's virtually irrelevant to substantive progress in
psychological science. And, if you need to make decisions using test
scores, why bother when a simple sum score interpreted using
representative population norms and other relevant and somewhat more
obvious item and scale statistics does "the business" just as well as
scores and indices associated with the "latent variable" work which only
a "psychometric wizard" can seemingly convey!
What made the neural net so easy to "sell" was that it predicted
valuable and substantive outcomes for which no other model came close.
That's why it was grabbed with open arms by those "non-academics and
academics" alike who looked at the results and thought " I've gotta have
this ". Sheesh, even Shepard, Kruskal, Lingoes, and Guttman's non-metric
Multidimensional Scaling algorithms had a bigger impact in 10 years from
their inception than IRT.
Until IRT, unfolding, or any test theory model can produce that "wow"
effect - they will remain "niche" activities for a minority of
individuals in a tiny sub-domain of the social and human sciences.
Regards ... Paul
Paul Barrett, Ph.D.
2622 East 21st Street | Tulsa, OK 74114
Chief Research Scientist
Office | 918.749.0632 Fax | 918.749.0635
pbarrett at hoganassessments.com