[Rasch] Empirical order & Theoretical order

Andrew Kyngdon AKyngdon at lexile.com
Tue Nov 9 10:50:43 EST 2010

Dear Parisa,

Those who teach psychometrics have to strike a balance between breadth of material, the mental demand made upon students and limited time. Hence most courses focus upon one or two of the simpler members of a family of models so as not to overwhelm students. Moreover, “student demand” is usually only for these simple models and nothing more.

I would not base the judgment of any descriptive, psychological theory upon model fit alone. Choices between pairs of simple lotteries will fit the paired comparison Rasch model quite well. However, this model is a descriptively false account of the utility of gains and losses under conditions of risk or uncertainty. For example, it treats preference reversals as “error”. Such reversals are not error at all, but are experimentally robust phenomena caused by the way people weigh the risk of uncertain events (Allais, 1953; Kahneman & Tversky, 1979).

Remember that some of psychology’s most influential thinkers – Pavlov, Skinner and Freud, for example – did not use statistical data analysis models at all.

I would like to respond to a couple of your criticisms of the Lexile Framework.

Firstly, the Framework is not limited to narrative continuous prose text, either empirically or theoretically. The text of Australian High Court and US Supreme Court decisions is typically around 1700L in difficulty. These decisions are examples of persuasive text, not narrative. Furthermore, polemical, political monographs such as “The Female Eunuch” (1350L) by Germaine Greer and “Animal Liberation” (1360L) by Peter Singer are also non-narrative texts, yet their difficulty is putatively measurable in Lexiles. Informative text, such as newspaper and magazine articles, can also be assessed within the Lexile Framework. The Sydney Morning Herald and The Australian newspapers, for example, are around 1400L.

Secondly, the Lexile Framework is not based on simplistic, atheoretical text readability formulae. It is concerned with those text features which are viable proxies for the demands made upon the cognitive processes that cause individual differences in reading ability. These processes are verbal working memory (VWM) capacity and vocabulary. In short, people with larger vocabularies and verbal working memories are more able readers.

Your criticism that text is only one side of the story is simplistic. It ignores the substantial research literature in cognitive psychology that has found that text features are vitally important in the cognitive psychology of reading. Let me mention some of it briefly.

Cognition research has found that VWM capacity is critical in the comprehension of prose text (e.g. Daneman & Carpenter, 1980; Daneman & Merikle, 1996; Jincho, Namiki & Mazuka, 2008; Just & Carpenter, 1992; Kintsch & van Dijk, 1978; McDonald, Just & Carpenter, 1992). Working memory is both a storage buffer with limited capacity (Miller, 1956) and a processing unit (Baddeley, 1986). Reading makes processing as well as storage demands upon VWM (Daneman & Merikle, 1996). People with greater VWM capacity have greater facility with syntactic complexity, as they can hold in VWM multiple interpretations of sentences containing syntactical and lexical ambiguities (McDonald, Just & Carpenter, 1992; Miyake, Just & Carpenter, 1994) and are better able to comprehend sentences with centre-embedded relative clauses (King & Just, 1991). Longer sentences, in addition to making a greater storage demand, can contain more, and more elaborate, syntactic complexities than shorter sentences, and these place greater processing demands on VWM. Sentence length is thus hypothesised to be a good proxy variable for the demands prose text places upon VWM capacity (Crain & Shankweiler, 1988; Klare, 1963; Liberman, Mann, Shankweiler & Werfelman, 1982; Shankweiler & Crain, 1986).

In the Lexile Framework, the demand continuous prose text places upon a reader’s VWM capacity is proxied by log mean sentence length. This is simply the common logarithm of a ratio of two counts - the ratio of the number of words in a text passage to the number of sentence endings.
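
As a rough illustration of the calculation just described, here is a minimal Python sketch. The function name, the whitespace tokenisation and the rule for spotting sentence endings are my own assumptions for the example; they are not the rules MetaMetrics actually uses.

```python
import math
import re


def log_mean_sentence_length(passage: str) -> float:
    """Common logarithm of (number of words / number of sentence endings)."""
    # Assumption: a word is any whitespace-separated token.
    words = passage.split()
    # Assumption: a sentence ending is a run of '.', '!' or '?'.
    # This naive rule will miscount abbreviations and decimal numbers.
    endings = re.findall(r"[.!?]+", passage)
    if not words or not endings:
        raise ValueError("passage needs at least one word and one sentence ending")
    return math.log10(len(words) / len(endings))


sample = "The cat sat on the mat. It purred. Then it fell asleep in the sun."
lmsl = log_mean_sentence_length(sample)  # 15 words over 3 sentence endings
```

The sample passage has 15 words and 3 sentence endings, so the function returns the common logarithm of 5.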

Word frequency is highly predictive of eye fixation times and total gaze duration in eye movement studies of reading (Rayner, 1998). Readers fixate and gaze at low frequency words for several hundred milliseconds longer than for high frequency words in passages of continuous prose text (Just & Carpenter, 1980). This phenomenon is not attenuated when word length is controlled for (Inhoff & Rayner, 1986; Rayner, Ashby, Pollatsek & Reichle, 2004), nor by the presence of other variables, such as number of letters, subjective word familiarity or age of word acquisition (Juhasz & Rayner, 2003). Low frequency words are also skipped less than high frequency words when words consist of six letters or fewer (O’Regan, 1979; Rayner, Sereno & Raney, 1996). Rarer words require greater lexical processing as new information from prose text is obtained only when the eyes fixate (Rayner, 1998).

Lexical decision task experiments have found significantly greater word recognition reaction times for low frequency words than for high frequency words (Balota & Chumbley, 1985; Hudson & Bergman, 1985; Jastrzembski, 1981). Word recognition involves vocabulary (Lewellen, Goldinger, Pisoni, & Greene, 1993; Kitzan, Ferraro, Petros & Ludorf, 1999) and vocabulary plays a role independent of VWM in prose text comprehension (Baddeley, Logie, Nimmo-Smith & Brereton, 1985; Dixon, LeFevre & Twilley, 1988; Engle, Nations & Cantor, 1990; Jincho, Namiki & Mazuka, 2008; Shiotsu & Weir, 2007). However, many different vocabulary variables may cause individual differences in test performance. Stenner et al. (1983) investigated 50 vocabulary variables including parts of speech, number of letters, number of syllables, content classification, modal grade at which words appeared in school books, word frequency and various algebraic transformations of these variables. Consistent with the eye movement and lexical decision task literature, they found that word frequency was the most predictive of the difficulty of vocabulary items on Forms L and M of the Peabody Picture Vocabulary Test – Revised (Dunn & Dunn, 1981).

In the Lexile Framework, the demand placed by text upon a reader’s long term memory word store (vocabulary) is proxied by mean log word frequency. The frequency with which each word appears in written expression is calculated using a text corpus. MetaMetrics uses a 500 million word corpus. The common logarithm of each word’s frequency is obtained and an arithmetic mean calculated.
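
The calculation just described can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the toy corpus counts stand in for a real corpus, frequencies are taken as raw counts, the tokenisation rule is a simple regular expression, and words missing from the corpus are skipped. How MetaMetrics handles any of these details is not stated above.

```python
import math
import re
from collections import Counter


def mean_log_word_frequency(passage: str, corpus_counts: Counter) -> float:
    """Arithmetic mean of log10(corpus frequency) over the words of a passage."""
    # Assumption: words are lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", passage.lower())
    # Assumption: words that never occur in the corpus are left out,
    # because log10(0) is undefined.
    logs = [math.log10(corpus_counts[t]) for t in tokens if corpus_counts[t] > 0]
    if not logs:
        raise ValueError("no word of the passage occurs in the corpus")
    return sum(logs) / len(logs)


# Hypothetical toy corpus counts, invented for the example.
toy_corpus = Counter({"the": 1000, "cat": 100, "sat": 10})
mlwf = mean_log_word_frequency("The cat sat.", toy_corpus)  # mean of 3, 2 and 1
```

With these invented counts the three logarithms are 3, 2 and 1, so the mean is 2. A passage made up of rarer words gives a lower mean and hence, on this proxy, a greater vocabulary demand.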

Thirdly, those who created the Lexile Framework do not believe that reading ability is static. The award-winning work of Dr Gary Williamson found that whilst students grew in reading ability across the whole of their school lives, at the end of Grade 12 their reading ability was approximately 200L beneath the mean difficulty level of texts used in introductory courses in tertiary education programs. A non-technical white paper on this work can be found at <http://lexile.com/m/uploads/whitepapers/What_is_Expected_Growth.pdf>. So it would seem that reading ability is dynamic rather than static. Moreover, there is the well-known phenomenon of “summer loss”, where students’ reading abilities decline slightly over the long summer holidays due to lack of exposure to texts.

The Lexile Framework for Reading has been successfully equated to around 40 different reading tests, most of which have been created with different ideas of what constitutes reading ability. If the Lexile Framework were not a general theory of individual differences in the ability to read continuous prose text, I cannot see why it should have been so successfully equated with so many reading tests.

To my knowledge only one reading test failed to equate with the Lexile Framework. This test of reading ability utilises an item type where students “cloze” out passages of text using words they produce themselves (i.e., they do not choose the correct words from a list as in multiple choice item types). A list of acceptable word choices that students can use appears in the marking manual for each test. A “desktop linking” study found that the Lexile Framework did not equate well to this test. Why? Because the item type used by the test assesses not only a reader’s semantic and syntactic “receptive store”, but their “productive store”. That is, it assesses both a test taker’s reading and writing abilities. To use a commonly abused parlance, the test is “multidimensional”. Rather predictably, the organisation that created this test blamed the Lexile Framework rather than their own assessment.

Anyway, I hope you find this somewhat lengthy response to be useful.



From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Parisa Daftari Fard
Sent: Saturday, 6 November 2010 10:52 AM
To: rasch list
Subject: Re: [Rasch] Empirical order & Theoretical order

Dear Andrew,

Thank you for your comments. I am sorry for being late in responding.
You can shape a Rasch model to be more in accordance with theory. Take the Lexile Framework for Reading, for example. This psychometric framework uses a modified version of the simple dichotomous item Rasch model. When person reading ability and item difficulty are equal, the item response probability is .75.

I guess you are right. As I mentioned before, my major is not statistics and I have learned to work with Rasch through conferences, classes, books and this great list.
Concerning Lexile, I think, based on my understanding, that it does not take a skill-based approach into account and that the texts are mostly narratives. It approaches readability from text-based factors, whereas text is only one side of the story, although some believe that reading narratives and knowing words will bring other skills automatically (Grabe, 2008, personal communication).
When items show good fit in Rasch but the order is not as expected, I believe that we have taken a wrong approach in our theory of the cognitive ability, say, reading comprehension.

In a recent paper I examined item patterning through dynamic assessment. Although the differences were not statistically significant, the results showed that after mediation I had a better and more explainable item ordering in terms of cognitive difficulty. I approached reading dynamically rather than statically.


Parisa Daftarifard
PhD student of TEFL
--- On Fri, 11/5/10, Andrew Kyngdon <AKyngdon at lexile.com> wrote:

From: Andrew Kyngdon <AKyngdon at lexile.com>
Subject: Re: [Rasch] Empirical order & Theoretical order
To: "rasch list" <rasch at acer.edu.au>
Date: Friday, November 5, 2010, 8:14 AM


I think your Point 1 is valid (as are 2 and 3, of course). What is meant casually by “Rasch model” is the simple dichotomous item case, when in fact there exists a whole family of Rasch models (e.g., paired comparison, Poisson Counts, extended frame of reference). It is quite plausible that one member of this family is more suitable than another for a given empirical / theoretical situation.

You can shape a Rasch model to be more in accordance with theory. Take the Lexile Framework for Reading, for example. This psychometric framework uses a modified version of the simple dichotomous item Rasch model. When person reading ability and item difficulty are equal, the item response probability is .75. This was done because it would be unlikely that a reader has genuinely understood the content of an “embedded sentence cloze” reading item if the response probability was the conventional .5. This modification is not to the detriment of raw score sufficiency or invariant comparisons.
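
One way to picture such a modification is sketched below in Python. It assumes the change is a constant shift added to the logit of the dichotomous Rasch model, with the shift set to ln(3) so that the probability is exactly .75 when ability equals difficulty. That parameterisation, the function name and the variable names are assumptions for illustration, and ability and difficulty are in logits here rather than Lexiles.

```python
import math

# Assumption: the modification is a constant added to the logit.
# ln(3) is the value that gives P = .75 when theta == b.
OFFSET = math.log(3)


def rasch_probability(theta: float, b: float, offset: float = 0.0) -> float:
    """P(correct) under a dichotomous Rasch model with an optional logit shift."""
    return 1.0 / (1.0 + math.exp(-(theta - b + offset)))


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


conventional = rasch_probability(1.0, 1.0)     # ability equals difficulty
shifted = rasch_probability(1.0, 1.0, OFFSET)  # same case, shifted model

# Compare two persons (theta = 2.0 and theta = 0.5) on one item (b = 1.0).
gap_plain = logit(rasch_probability(2.0, 1.0)) - logit(rasch_probability(0.5, 1.0))
gap_shift = (logit(rasch_probability(2.0, 1.0, OFFSET))
             - logit(rasch_probability(0.5, 1.0, OFFSET)))
```

Because the shift is the same for every person and every item, it cancels when two persons are compared on the same item: both gaps above equal the difference in abilities, 1.5 logits. This is in keeping with the remark that the modification does not harm invariant comparisons.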

There is more scope for stimulus – theory – model interplay than is commonly perceived in psychometrics. Look at utility theory, for example.



From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf Of Parisa Daftari Fard
Sent: Wednesday, 3 November 2010 9:53 AM
To: rasch list
Subject: Re: [Rasch] Empirical order & Theoretical order

The topic sounds interesting. I believe that when we do not have agreement between the Rasch model and the theoretical model, there are three possibilities:

1.  Rasch model requires revision

2.  Theory requires revision

3.  Items require revision

One of 1, 2 or 3 should be revised.



--- On Tue, 11/2/10, Trevor Bond <trevor.bond at jcu.edu.au> wrote:

From: Trevor Bond <trevor.bond at jcu.edu.au>
Subject: Re: [Rasch] Empirical order & Theoretical order
To: "Raschlist" <rasch at acer.edu.au>
Date: Tuesday, November 2, 2010, 11:18 AM

In a nutshell

you might use Rasch info to reject items

but who wrote the items?

Unlikely, it was the theorist...and even if it was
is the theorist a good item writer (not equivalent skills)?

over to you...

Prof Trevor G BOND
Adjunct Professor
School of Education
James Cook University
mob: +61 416 82 70 83
Rasch mailing list
Rasch at acer.edu.au<http://us.mc1200.mail.yahoo.com/mc/compose?to=Rasch@acer.edu.au>
Unsubscribe: https://mailinglist.acer.edu.au/mailman/options/rasch/pdaftaryfard%40yahoo.com
