[Rasch] counts to scale conversion

Here is a draft write-up.

The difference between .96 and .94 translates to about one whole unit of the SD/error ratio (separation, G), so perhaps Rasch-scaled money is not just "a little" worse at predicting, but substantially worse.

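Since G = sqrt(R / (1 - R)), where R is the reliability, a value of .94 gives a G of about 4 and .96 gives about 5. In terms of separation, the difference between .94 and .96 is therefore the same as the differences between .50 and .80, .80 and .90, .90 and .94, .96 and .973, .973 and .98, .98 and .985, .985 and .988, etc.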
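The conversion between reliability and separation is easy to check. Here is a minimal Python sketch, assuming the standard Rasch relation G = sqrt(R / (1 - R)) and its inverse R = G^2 / (1 + G^2); the function names are hypothetical.

```python
import math


def separation_from_reliability(r: float) -> float:
    """Separation G = sqrt(R / (1 - R)): true SD divided by RMS measurement error."""
    if not 0.0 <= r < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(r / (1.0 - r))


def reliability_from_separation(g: float) -> float:
    """Inverse relation: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


if __name__ == "__main__":
    # The two values compared in the thread
    for r in (0.94, 0.96):
        print(f"R = {r:.3f}  ->  G = {separation_from_reliability(r):.2f}")
    # Reliability required for each whole unit of separation
    for g in range(1, 10):
        print(f"G = {g}  ->  R = {reliability_from_separation(g):.3f}")
```

For G = 1 through 9 the reliabilities come out as .500, .800, .900, .941, .962, .973, .980, .985, and .988, so near the top of the scale a gap of .02 or less already corresponds to a full unit of separation.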

On the other hand, is log money over-predicting in a sample-dependent way? Does the Rasch method work more consistently across samples? If not, what is being lost in the Rasch estimation process? Do the results stay constant across estimation methods (JMLE, unconditional, Monte Carlo, PROX, etc.)? If not, do one or more of them better approximate log money?

I was predicting the amount of money for which a counselor would be sued after a bad outcome, based on the hierarchical complexity of the informed consent procedure. Log money did the best. Right behind it, just a little worse (.94 versus .96 for log money), was Rasch-scaled money. It has been argued for years that money values need to be logged.

What do you make of that?

Regarding the model as an ideal disconnected from reality overlooks the possibility that the data may come from questions that are irrelevant, poorly formulated, or off-construct for some other reason, or that some respondents may not belong to the intended population.

Why try to describe data that are not reproducible and replicable? How well do we understand a construct when the only data we can produce are not theoretically tractable, and so remain tied to particular questions and respondents? It seems pretty cynical to me to do research with the sole aim of applying fancy statistics to data, publishing articles, and advancing one's own career, while deliberately limiting one's ability to generalize results beyond local samples of persons and items by choosing models and methods that do not push toward the highest possible level of generality.

I vote for 3. Strong construct theory is not automatically implied by strong measurement theory. Being able to predict item difficulties when the items have been previously calibrated is great, but the real goal is to be able to predict their calibrations on the basis of their theoretical properties, in the manner of Lexiles or Commons' stage scoring system.

When we have this, then we're getting somewhere. Just imagine how different our economic lives would be if rulers, weight scales, thermometers, clocks, voltmeters, and the resistance properties of every meter of every type of electrical cable all had to be calibrated individually on data, instead of en masse, by theory. Theoretical predictability is the mark of a real science, one in which we understand a variable well enough to recognize it for what it is, in any amount, whenever we see it.

After all, don't we say that a basic mark of knowing what we're talking about is being able to put it in our own words? Shouldn't any valid articulation of a construct be a viable medium for measuring in a universally uniform reference standard metric?

Jack Stenner has recently done some work describing more than three stages of this kind in the development of measurable constructs. Maybe we can get him to weigh in.

Hi Tim... It would be nice to have more votes for the three options and see how well this example fits the opinions of people on this listserv... or how well our opinions fit a model...

Thank you.

Agustin

Tim Pelton wrote:

What a great example - and starting point for a discussion...

My vote is for option 2.

I think that the phrase in option 3, "...telling us (and the crickets too)...", demonstrates quite nicely the limitations of blindly applying an 'ideal' model. Is it reasonable to favor an elegant theoretical model whose predictions deviate substantially from the observed data when our lack of understanding of related factors means that we cannot effectively explain such deviations? Is it not more appropriate to choose a pragmatic model (balancing simplicity and accuracy) as an intermediate model to help us establish a control or baseline, which may then be used to support our search for other factors?

Tim

>===== Original Message From Agustin Tristan =====
>Hi! I'm trying to follow this topic concerning crickets and scales.
> In the abstract: Which is better?
> 1) The simple linear model, even if it doesn't fit (the linear model for the crickets' case).
> 2) Any model that permits us to fit the data (the exp(something) looks to be like that).
> 3) A theoretical model telling us (and the crickets too) how the frequency of the noise they produce varies with temperature... especially if this theoretical model is exp(something), because it looks more interesting or impressive.
>
> I like Nature and its relationship with math, and for me it was interesting to know that crickets may use the exponential (even if they don't care about the mathematical formulation), as well as the seeds in sunflowers grow exponentially from their center, or snails grow their shells, or ivy plants grow in a helical 3D curve, or soil slopes (in soil mechanics) become unstable and fail along a logarithmic spiral, or the growth of populations follows a logistic model, and so forth... I can also recognize that I prefer objective items that behave according to the Rasch model, but... I cannot decide in all those cases which is better among (1), (2), and (3)...
> Regards
> Agustin Tristan
>
>Rense wrote:
>
>All this illustrates that if we want to stay in business as test gurus then
>we'd better forget about meaningful item hierarchies, sample independence,
>additive measures, and other such niceties. Rather, using methods whose
>results need to be recalibrated for boys, girls, old, and young, ..,
>whatever, ... - and adding a few things like "log(exp(something ...))" to
>our tech reports - should greatly help with job security. :)
>
>Rense Lange
>
```