[Rasch] Rasch analysis of interval data
Paul Barrett
pbarrett at hoganassessments.com
Wed Jul 23 00:15:32 EST 2008
________________________________
From: Mark Moulton [mailto:markhmoulton at gmail.com]
Sent: Monday, July 21, 2008 5:09 PM
To: Paul Barrett
Cc: rasch at acer.edu.au
Subject: Re: [Rasch] Rasch analysis of interval data
Paul,
....
I am left with a relativistic notion of psychometric spaces.
Each Rasch analysis erects a unique space. That space bears no
necessary relationship to any other Rasch space (except perhaps in some
topological, one-to-one, homeomorphic kind of way). However, objects
within that space are distributed in a way that is reproducible, hence
objective, with respect to other objects in that space. Two Rasch
spaces can be reconciled only by "anchoring" one space to the other via
common persons or items. This forces the two spaces to share the same
"distortions," and thus to become one space, and to preserve invariance
for all objects residing in that space.
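(The anchoring idea described above can be made concrete with "mean-mean" linking, one standard way of placing two Rasch calibrations on a common scale. A minimal sketch — the item names and difficulty values below are invented for illustration, not taken from any real calibration:)

```python
# Hypothetical difficulty estimates from two separate Rasch calibrations.
# Forms A and B share three anchor items; each scale's origin is arbitrary,
# because Rasch difficulties are identified only up to an additive constant.
form_a = {"i1": -1.20, "i2": 0.10, "i3": 0.95, "a_only": 1.40}
form_b = {"i1": -0.70, "i2": 0.60, "i3": 1.45, "b_only": -0.30}

common = sorted(set(form_a) & set(form_b))

# Mean-mean linking: shift form B by the average anchor-item difference,
# which absorbs the arbitrary difference in scale origins.
shift = sum(form_a[k] - form_b[k] for k in common) / len(common)
form_b_linked = {k: v + shift for k, v in form_b.items()}

print(form_b_linked)
# The anchors now coincide with form A (up to estimation error), and
# "b_only" is expressed on form A's scale -- the two spaces share one set
# of "distortions", as described above.
```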
Your point about the MetaMetrics lexile scale is well-taken.
All texts and readers are forced into a common space anchored on the
physical properties represented by word frequency and sentence length
(or log transformations thereof). This was facilitated by the fact that
MetaMetrics discovered and exploited a linear relationship between
textual empirical variables and item difficulties. But even without
that relationship, the two types of variables could have been forced (by
a method MetaMetrics did not use, or need to use) into a common space
through an anchoring procedure. This is still not a guarantee that the
lexile unit captures the "true" spacing, but fortunately it does not
seem to matter.
One space is as good as another, so long as they are internally
consistent. Am I reading this right?
Hello Mark
An excellent analysis and synthesis ... many thanks for your thinking
here.
After reading the thesis by Courville, T.G. (2004) An empirical
comparison of item response theory and classical test theory item/person
statistics. PhD Thesis, Texas A&M University, 1-129.
http://txspace.tamu.edu/handle/1969.1/1064
Abstract
In the theory of measurement, there are two competing measurement
frameworks, classical test theory and item response theory. The present
study empirically examined, using large scale norm-referenced data, how
the item and person statistics behaved under the two competing
measurement frameworks. The study focused on two central themes: (1) How
comparable are the item and person statistics derived from the item
response and classical test framework? (2) How invariant are the item
statistics from each measurement framework across examinee samples? The
findings indicate that, in a variety of conditions, the two measurement
frameworks produce similar item and person statistics. Furthermore,
although proponents of item response theory have centered their
arguments for its use on the property of invariance, classical test
theory statistics, for this sample, are just as invariant.
as well as the paper by Chernyshenko, O.S., Stark, S., Drasgow, F., &
Roberts, B.W. (2007) Constructing personality scales under the
assumption of an Ideal Point response process: toward increasing the
flexibility of personality measures. Psychological Assessment, 19, 1,
88-106.
Abstract
The main aim of this article is to explicate why a transition to ideal
point methods of scale construction is needed to advance the field of
personality assessment. The study empirically demonstrated the
substantive benefits of ideal point methodology as compared with the
dominance framework underlying traditional methods of scale
construction. Specifically, using a large, heterogeneous pool of order
items, the authors constructed scales using traditional classical test
theory, dominance item response theory (IRT), and ideal point IRT
methods. The merits of each method were examined in terms of item pool
utilization, model-data fit, measurement precision, and construct and
criterion-related validity. Results show that adoption of the ideal
point approach provided a more flexible platform for creating future
personality measures, and this transition did not adversely affect the
validity of personality test scores. {for this, read: there was no
difference in the rank orders of individuals, or in any external
validity criterion correlations, between the simple sum score and any of
the advanced measurement models}
I find myself wondering why so many have labored for so long with
advanced statistical methods which seem to yield results largely
indistinguishable from the use of simple sum scores (forget CTT and all
that "true-score" baloney - just think sum-scores and simple-fuzzy-order
classification).
For the "philosophical" position in my approach to data analysis these
days - I think the first section in Leo Breiman's paper is probably
that which has influenced me most ...
Breiman, L. (2001) Statistical Modeling: The Two Cultures. Statistical
Science, 16, 3, 199-231.
Abstract
There are two cultures in the use of statistical modeling to reach
conclusions from data. One assumes that the data are generated by a
given stochastic data model. The other uses algorithmic models and
treats the data mechanism as unknown. The statistical community has been
committed to the almost exclusive use of data models. This commitment
has led to irrelevant theory, questionable conclusions, and has kept
statisticians from working on a large range of interesting current
problems. Algorithmic modeling, both in theory and practice, has
developed rapidly in fields outside statistics. It can be used both on
large complex data sets and as a more accurate and informative
alternative to data modeling on smaller data sets. If our goal as a
field is to use data to solve problems, then we need to move away from
exclusive dependence on data models and adopt a more diverse set of
tools.
This latter paper does not mean a knee-jerk dive into
data-mining/machine learning algorithms - but merely the relinquishing
of statistical data models and "latent variables" unless there is a very
good reason to invoke such strong measurement models and such
hypothetical ("non-empirical") variables.
To be fair, when dealing with educational attainment data, I'm not sure
what relevance any of the above really has - I guess I am just musing
over why so many went hell-for-leather for IRT and Rasch - when the
cost-benefit ratio seems to have turned out to be more "cost" than
benefit. The arguments about "greater precision" of IRT seem to fade
into trivia when one realises that the precision is "constructed" rather
than "evidence-based" via empirical observation in many cases. When you
can no longer "score" an individual's test results without applying a
complex Maximum Likelihood algorithm to do so - and yet, as Courville
shows, you might as well have just added up the correct responses in the
first place, and, as Chernyshenko et al. showed, the score validity
remains the same either way - well, I do find it curious.
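(The Courville result is no accident for the Rasch model in particular: the raw sum score is a sufficient statistic for ability, so the ML estimate is simply a monotone relabelling of the number-correct score. A minimal sketch, using invented item difficulties and a hand-rolled Newton-Raphson solver rather than any particular IRT package:)

```python
import math

def rasch_ml_theta(score, b, tol=1e-10):
    """ML ability estimate for a raw score under the Rasch model
    (Newton-Raphson; defined only for non-extreme scores)."""
    theta = 0.0
    for _ in range(100):
        p = [1 / (1 + math.exp(-(theta - bi))) for bi in b]
        g = score - sum(p)                    # gradient of the log-likelihood
        h = -sum(pi * (1 - pi) for pi in p)   # second derivative (< 0)
        step = g / h
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Hypothetical item difficulties for a five-item test.
b = [-1.5, -0.5, 0.0, 0.5, 1.5]

# Because the sum score is sufficient for theta, every non-extreme raw
# score maps to exactly one theta, regardless of WHICH items were correct,
# and the mapping is strictly increasing: ML scoring preserves the
# sum-score rank order exactly.
thetas = [rasch_ml_theta(s, b) for s in range(1, len(b))]
print(thetas)  # strictly increasing
```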
For all Denny Borsboom's brave words (cf. Borsboom, D. (2006) The
attack of the psychometricians. Psychometrika, 71, 3, 425-440), I wonder
whether statistical latent variable theory is really the last hurrah for
quantitative psychologists before they quietly fade away, their efforts
rendered irrelevant by the twin steamrollers of Joel Michell and Michael
Maraun, and the new knowledge about how the brain might actually be
working from AI and cognitive neuroscience (and what this implies for
the "measurement" of anything "psychological").
I know, a long way from Rasch modeling ... can't help myself thinking
about these things ... and it is interesting working with data without
being a hostage to any particular data model. All you have in the
cupboard are replication/cross-validation, and predictive accuracy to
guide you.
But what Jack Stenner and MetaMetrics have achieved is a reminder of the
very real utility (in the financial as well as in other senses of the
word) of a data model when you get things "right"! So I'm not advocating
the throwing out of the baby with the bathwater - but maybe having a few
more baths on hand to move the baby about a bit on occasion!
Regards .. Paul
__________________________________________________
Paul Barrett                          918.749.0632 x 326
Chief Research Scientist              Skype: pbar088
Hogan Assessment Systems Inc.
2622 East 21st St., Tulsa, OK 74114