# [Rasch] Empirical order & Theoretical order

Agustin Tristan ici_kalt at yahoo.com
Mon Nov 1 11:35:18 EST 2010

Hello Anthony,
Concerning your first question: what can we say about the test? Many things indeed.

This is an old discussion. Theoreticians try to produce a set of hierarchical categories, ordered according to a specific theoretical criterion, but sometimes without empirical proof. The measurement approach tries to confirm their hierarchical ordering, and it is quite common to see that the theory is not confirmed by measurement.
This topic was discussed during the last ICOM because, in learning English as a second language, the theoreticians provide an ordered set of categories but the item difficulties follow a different order. You may also find a comment about this in Daftarifard, P. and Lange, R. (2009), Rasch Measurement Transactions, 23(2), Autumn.

The first approach concerns theory. I discussed this topic with Dr. Massof during the ICOM. He holds the same view that many of us reach directly from measurement: if measurement cannot confirm the theory, it is probably a good moment to define a new theory or to modify the existing one.
Following the experimental approach, you may look at the SOLO taxonomy (SOLO means Structure of the Observed Learning Outcome) by Biggs & Collis (1982) and their many followers (Biggs, J.B., & Collis, K.F. (1982) Evaluating the quality of learning: The SOLO taxonomy. New York: Academic Press). You may find more SOLO studies, including some that use the Rasch approach, by searching the ERIC database (http://www.eric.ed.gov/). This approach focuses on categories that can be measured and ordered in a sequence from easy to hard. The categories can be defined by partitioning the scale at reasonable distances, so that there is a real difference or distinction among them. If you have a test with difficulties from -3 to +3 logits, you may anchor your items in 6 intervals, each 1 logit wide (-3 to -2, -1.99 to -1, -0.99 to 0, 0.01 to 1, 1.01 to 2, 2.01 to 3), and have some judges or experts help you define the 6 constructs or categories you are measuring from -3 to +3. This method is similar to the bookmark method, but it was proposed by Beaton and Allen (see Beaton and Allen, 1992, Interpreting scales through scale anchoring, Journal of Educational Statistics, 17, 191-204); there are more papers by Beaton and others. This approach has been used in projects like the NAEP.
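The partitioning step can be sketched in a few lines of Python. This is only an illustration of grouping items into 1-logit intervals, not the full Beaton and Allen procedure; the function name `anchor_items` and the item difficulties are hypothetical. Each interval includes its upper bound, to match the list above (-2 falls in the first interval, -1.99 in the second).

```python
import math

def anchor_items(difficulties, low=-3.0, high=3.0, width=1.0):
    """Group items into consecutive intervals of `width` logits.

    Each interval includes its upper bound; the first also includes `low`.
    Values outside [low, high] are placed in the nearest end interval.
    """
    n_bins = int(round((high - low) / width))
    groups = [[] for _ in range(n_bins)]
    for item, d in difficulties.items():
        idx = math.ceil((d - low) / width) - 1
        idx = min(max(idx, 0), n_bins - 1)  # clamp to the valid range
        groups[idx].append(item)
    return [
        (low + i * width, low + (i + 1) * width, sorted(groups[i]))
        for i in range(n_bins)
    ]

# Hypothetical item difficulties in logits (not from any real test).
difficulties = {"q1": -2.5, "q2": -2.0, "q3": -1.5, "q4": 0.0, "q5": 0.5, "q6": 2.75}
intervals = anchor_items(difficulties)
for lower, upper, items in intervals:
    print(f"{lower:+.0f} to {upper:+.0f}: {items}")
```

The judges would then look at the items listed in each interval and write the description of the category that those items have in common.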

The second approach concerns another methodological concept. I have another idea based on an old approach by Guilford (Guilford, J.P. (1959) Three faces of intellect. American Psychologist, 14, 469-479). The idea builds on Guilford's work, though it is not presented this way in the original paper: a 3D model in which we have a content (mathematics or history competencies), related to a taxonomy that classifies the complexity levels of the mental operations (or abilities or competencies...) needed to learn and develop the content, and measured using a test with items of different difficulty. With this idea it should be clear that difficulty is a different thing from taxonomical complexity. Difficulty has to be measured, while the complexity categories are defined theoretically, according to a theory that explains some aspects of mental abilities (or learning, or competencies).
Let's look at the Bloom taxonomy. It defines 6 categories from simple to complex mental performances; Bloom et al. didn't say from easy to hard, but from simple to complex. To be clear: you may begin with the simplest category, knowledge (or memory competencies), and measure memory using a set of items from easy to hard. The last category, evaluation, can likewise be measured with items from easy to hard. Once you have measured, you may identify persons with low or high memory, and also persons with low or high decision-making ability. In fact, memory is not always easy and decision making is not always hard: persons able to make decisions do not necessarily have a good memory, and persons with a poor memory may make good decisions. A person can be good in one of Bloom's categories, poor in another and average in a third.

Suppose a test of memory of the capital cities of the world. Items may concern France, England, Peru, Somalia, Honduras, Burkina Faso, East Timor, Borneo, Mali; those items may go from easy to hard. In all cases it is just Geography (the content), memory is what we're looking for (one of Bloom's categories), and the test has a set of items from easy to hard according to the number of correct responses of the persons (difficulty).

Complexity concerns the mental operations involved in a task (from simple to complex). It is defined a priori, according to a theory, by a set of judges or theoreticians; it cannot necessarily be measured, and there is no objective criterion to define the hierarchy. Difficulty concerns the frequency of incorrect responses to an item among the persons in the focal group. It is defined by a measure a posteriori, i.e. after the test has been administered, and the measure can be calculated following a procedure or formula.
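To make the "a posteriori" point concrete, here is a minimal sketch of how a difficulty can be computed from scored responses. It uses the proportion of incorrect answers and its log-odds as a crude first approximation; it is not a Rasch estimation, which would also take person abilities into account. The function name `item_difficulty` and the response data are hypothetical.

```python
import math

def item_difficulty(responses):
    """Return {item: (proportion incorrect, log-odds of an incorrect answer)}.

    `responses` maps each item to a list of 0/1 scores (1 = correct).
    The log-odds is None when everybody or nobody answered correctly,
    because it is undefined in those cases.
    """
    result = {}
    for item, scores in responses.items():
        p_wrong = scores.count(0) / len(scores)
        if 0 < p_wrong < 1:
            logit = math.log(p_wrong / (1 - p_wrong))
        else:
            logit = None
        result[item] = (p_wrong, logit)
    return result

# Hypothetical scored answers of four persons to three capital-city items.
responses = {
    "France": [1, 1, 1, 0],
    "Peru": [1, 1, 0, 0],
    "Burkina Faso": [1, 0, 0, 0],
}
difficulty = item_difficulty(responses)
# Items ordered from easy to hard by proportion of incorrect answers.
easy_to_hard = sorted(difficulty, key=lambda item: difficulty[item][0])
print(easy_to_hard)
```

Nothing in this calculation refers to the complexity category of the item; the order comes only from the responses of the group.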

If the definitions of difficulty and complexity are different, why not treat them as two variables instead of considering them as a single one?
Once we accept that they are two variables, we can define a cubic model (similar to Guilford's) with three dimensions: content, complexity and difficulty.

In any case, an item of the test measures one thing: one complexity level of a specific content. I am not thinking of a single item measuring two different contents (English and Math) or two different complexity levels (that makes no sense from the theory), and of course an item has only one measure, or difficulty, for the focal group of persons.
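One way to picture the cube is as a record with three fields per item. The sketch below is only an illustration of that idea, assuming a hypothetical `Item` class and made-up difficulties; it is not taken from the paper mentioned in this message. Each item carries exactly one content, one complexity level (assigned a priori) and one difficulty (measured a posteriori).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    name: str
    content: str       # e.g. "Geography"
    complexity: int    # a priori taxonomy level, 1 = simplest
    difficulty: float  # a posteriori measure in logits

# Hypothetical items: two at the simplest level, two at the most complex one.
items = [
    Item("capital_of_france", "Geography", 1, -2.0),
    Item("capital_of_burkina_faso", "Geography", 1, 1.8),
    Item("evaluate_simple_claim", "Geography", 6, -1.0),
    Item("evaluate_subtle_claim", "Geography", 6, 2.2),
]

def difficulty_range(items, content, complexity):
    """Lowest and highest difficulty within one content/complexity cell."""
    values = [i.difficulty for i in items
              if i.content == content and i.complexity == complexity]
    return (min(values), max(values)) if values else None

by_complexity = [i.name for i in sorted(items, key=lambda i: i.complexity)]
by_difficulty = [i.name for i in sorted(items, key=lambda i: i.difficulty)]
print(difficulty_range(items, "Geography", 1))
print(by_complexity != by_difficulty)
```

With these made-up values, each complexity level contains both easy and hard items, so sorting by complexity and sorting by difficulty give different orders without any contradiction.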

Does this invalidate the additivity of the scores, even if we have perfect fit?
No. Additivity has to do with the measurement dimension, not with content and not with complexity. The problem is ours when we try to collapse the 3 dimensions of this cube into a single one.
I hope my paper on this 3D model will soon be accepted for publication.

I hope this helps.
Agustin

FAMILIA DE PROGRAMAS KALT.
Ave. Cordillera Occidental  No. 635
Colonia  Lomas 4ª San Luis Potosí, San Luis Potosí.
C.P.  78216   MEXICO
(52) (444) 8 25 50 76
(52) (444) 8 25 50 77
(52) (444) 8 25 50 78
web page (in Spanish): http://www.ieia.com.mx
web page (in English) http://www.ieesa-kalt.com/English/Frames_sp_pro.html

--- On Sun, 10/31/10, Anthony James <luckyantonio2003 at yahoo.com> wrote:

From: Anthony James <luckyantonio2003 at yahoo.com>
Subject: [Rasch] Empirical order & Theoretical order
To: rasch at acer.edu.au
Date: Sunday, October 31, 2010, 6:44 AM

Dear colleagues,
If the empirical difficulty order of items does not correspond with the theoretical difficulty in terms of cognitive demand of the items, what can we conclude about the test?
Does this invalidate the additivity of the scores, even if we have perfect fit?

Cheers
Anthony
