[Rasch] formative v. reflective measures in assessment research

Agustin Tristan
Mon Oct 28 11:05:22 EST 2013

Hello. Any test needs reliability, just as it first needs validity and objectivity. But as I understand this discussion, you are working in a learning process with steps such as these:
1) You apply a "diagnostic test" before the instruction; students are expected to obtain low measures, and you want to improve their competencies or knowledge.
2) You propose some activities and a learning intervention, hoping to increase the students' competencies.
3) Once the activities are finished, you administer a "formative test" to measure whether the learning intervention worked.
4) You analyze the results:
4.1) If the results of (3) are low, your intervention did not work as expected. You should then try a new approach and repeat steps 2 and 3 until your students obtain better results.
4.2) If the results of (3) are satisfactory, you move on to a new topic (or subject or goal) and repeat steps 1 to 4.

If this is the case, it is evident that:
Case 4.1 implies repeatability, because students get low measures both before and after your intervention. But it makes no sense to design an intervention hoping for repeatable measures between the pretest and the posttest; in that case it is better to try something else.
BUT case 4.2 requires that you do not have reliability in the sense of repeatability: for the learning intervention to have worked, students must get low measures before it and high measures after it.

Nevertheless, you will need internal consistency in both cases; that is, internal consistency must be acceptable in the pretest and in the posttest, separately.
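
As a rough illustration of checking internal consistency separately on each occasion, here is a minimal Python sketch. It uses Cronbach's alpha as the index, which is my assumption (the message does not name a statistic), and the response matrices and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of persons, each a list of item scores."""
    k = len(scores[0])
    # Variance of each item across persons.
    item_vars = [pvariance([person[i] for person in scores]) for i in range(k)]
    # Variance of the total (sum) score across persons.
    total_var = pvariance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical dichotomous responses (rows = students, columns = items).
pretest = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
posttest = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

# Each occasion is evaluated on its own; the two values are not pooled.
alpha_pre = cronbach_alpha(pretest)
alpha_post = cronbach_alpha(posttest)
print(f"pretest alpha = {alpha_pre:.2f}, posttest alpha = {alpha_post:.2f}")
```

The point of the sketch is only that the index is computed twice, once per test; a change in mean score between the two occasions does not enter either computation.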

Did I get the point?

Donald, I don’t believe this …
My associate dean has argued that we don’t need reliability on a test that we’re working on because we’re using formative measurement
If you claim to make measurement, any kind of measurement, you have no choice but to evaluate its reliability (as repeatability).
No psychometrics test-theory, assumption-laden hoopla needed, just a straightforward answer to a very simple question: 
“If I make a measure X of this person again, will I obtain the same magnitude, order, or class-category?” If you can’t answer that question, quantifying the accuracy/outcome discrepancy, your ‘measure’ is immediately rendered of ‘uncertain’ utility.
And yes, time matters (duration between assessment occasions); you may not be able to gain such a repeatability measure because the attribute you seek to measure changes in relation to influences beyond your control and/or memory of prior responses becomes a significant factor. So, ‘reliability’ under these conditions then has to be sought indirectly via a process of gaining criterion validity (which is tricky as you are now also testing your understanding of the meaning of a phenomenon/attribute in addition to the measurement device itself).
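
The repeatability question above (same magnitude, order, or class-category on a second occasion?) can be quantified in a few lines. This is only a sketch under my own assumptions: the three indices, the function names, the cut score of 0.0 and the measures are all hypothetical choices, not something proposed in the thread.

```python
from itertools import combinations


def mean_abs_difference(first, second):
    """Magnitude: average absolute change in each person's measure."""
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


def order_agreement(first, second):
    """Order: share of person pairs ranked the same way on both occasions."""
    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(first)), 2))
    same = sum(
        sign(first[i] - first[j]) == sign(second[i] - second[j])
        for i, j in pairs
    )
    return same / len(pairs)


def category_agreement(first, second, cut):
    """Class-category: share of persons on the same side of a cut score."""
    same = sum((a >= cut) == (b >= cut) for a, b in zip(first, second))
    return same / len(first)


# Hypothetical measures for five persons on two occasions.
occasion_1 = [-1.2, -0.4, 0.1, 0.9, 1.5]
occasion_2 = [-1.0, -0.5, 0.3, 0.7, 1.6]

print(mean_abs_difference(occasion_1, occasion_2))
print(order_agreement(occasion_1, occasion_2))
print(category_agreement(occasion_1, occasion_2, cut=0.0))
```

None of these indices needs a test-theory model; each is a direct answer to one of the three forms of the question.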
Paul
Thanks all for your thoughtful responses.  I found the Edwards (2011) cite that Thomas Salzberger mentioned to be particularly helpful. (Edwards: The fallacy of formative measurement, Organizational Research Methods 2011 vol. 14 no. 2 370-38).  For those interested, Edwards offers several examples.
My associate dean has argued that we don’t need reliability on a test that we’re working on because we’re using formative measurement.  Now I’ve got a great article to send him.  I don’t think that he will also argue that multiple-choice items have no measurement error, or that a census of all possible items will be more convenient than a sample.
Thanks again – this group is a great help!
Dear all, one of the problems I see in most papers that talk about "cause" in this context is the absence of time. For instance, in all the figures in the article mentioned by Thomas ("The fallacy of...") the arrows go in one direction, and one may think that the "something cause" precedes the items somebody is using to measure it, or the other way around. I think that in most cases the arrow should go in both directions unless there is empirical evidence of the precedence (in time) of the cause.
In the article "Cause in Rasch models" there is a clear "intervention" that precedes (in time) the change observed in Rasch measures. This is what is being done in RCTs, mostly in psychiatry and rehabilitation.
Please forgive my English; I am quite good in Spanish.
Very good comments.
Luigi Tesio

Don & Luis,
>I fear Jack et al.'s paper on causal Rasch models is not exactly what Don looks for. The idea of a Causal RM is that one should identify the mechanism that drives measurement and experimentally manipulate item difficulty and person ability.
>Formative "measurement", or "formative indicators",  is something else. But Jack has also commented on this (http://www.rasch.org/rmt/rmt221d.htm).
>It does not surprise me that many Rasch advocates know very little or nothing about "formative measurement". And for a good reason. 
>In my mind, formative measurement is a misconception, a fallacy (see Edwards: The fallacy of formative measurement, Organizational Research Methods 2011 vol. 14 no. 2 370-38), and seriously misleading.
>Borsboom (2003 The Theoretical Status of Latent Variables. Denny Borsboom, Gideon J. Mellenbergh, and Jaap van Heerden in Psychological Review; 2005 book Measuring the mind) has convincingly demonstrated that the idea of a latent variable (and thus, a unidimensional construct) and formative "measurement" are incompatible provided you are in the realist camp. 
>If you critically question Jarvis et al.'s rules for deciding whether something has to be measured with reflective indicators (-> factor analysis in CTT, IRT, Rasch) or formative indicators (-> ? what "model" would you actually use?), then you notice that the criteria follow from what you assume (reflective or formative) in the first place. Yes, there sometimes is indeed a misspecification, but it is not a misspecification in terms of reflective measurement vs formative measurement. Rather, it is a misspecification with respect to measurement vs the summary of measurements. The latter essentially is a structural model.
>The reason why many people have no problems accepting formative measurement as measurement is probably due to Stevens' definition of measurement. If assigning numerals and interpreting them as numbers is measurement, then of course formative indicators also give you measurement. But then anything goes.
>I was contacted by my agent to make changes in my policy
>I was contacted by my agent to sell me more life insurance
>I was contacted by my agent to describe new insurance offerings
>I was contacted by my agent to keep my policy in place
>This does not make any sense to me. If you are interested in the intensity of contact, why not just ask whether one was contacted?
>You may add reasons and that gives you additional information. But these items do not form a scale, and the sum score cannot lead to measurement.
>If I wanted to measure "subjective warmth", I would not use the following items, would I:
>I feel warm because it is hot outside.
>I feel warm because the heating is on.
>I feel warm because I have a fever.
>There is a recent article in Frontiers in Psychology, August 2013, volume 4, by Stenner, Fisher, Stone, and Burdick: "Causal Rasch models". That could help you.
>Hi all --
>To offer more explanation on my reflective v. formative question, in reflective measurement models, the construct/latent trait causes the measures/indicators/items, whereas in formative models, the measures cause the latent trait.  Jarvis, MacKenzie and Podsakoff (J Consumer Research, 2003) offer a thorough description of this difference.  They cite Bollen and Lennox (Psych Bull, 1991) as identifying the same distinction, except that Bollen and Lennox call reflective indicators ‘effects indicators’ in a principal component model, and formative indicators ‘causal indicators’ in a composite latent construct model.
>Jarvis et al. cite a formative example from Crosby and Stephens (1987) that I will paraphrase here.  In measuring the construct “personal contact with life insurance agents”, suppose the following measures are used:
>I was contacted by my agent to make changes in my policy
>I was contacted by my agent to sell me more life insurance
>I was contacted by my agent to describe new insurance offerings
>I was contacted by my agent to keep my policy in place
>In these items, the agent would not suggest both keeping the policy and changing the policy, so the inter-item correlations here should be low or perhaps even negative, yet all of these statements indicate personal contact.  Internal consistency is not necessary for formative measures; to assess the quality of formative measures we need to look at criterion-related validity.  Thus, formative models are a bit more like regression models, where the many independent variables are assumed to have no error and may have low intercorrelations but the one dependent variable does have error.
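
The contrast described above can be written compactly. The notation is an assumption on my part, in the general style of Bollen and Lennox (1991): $x_i$ is indicator $i$, $\eta$ the construct, $\lambda_i$ and $\gamma_i$ are weights, and $\varepsilon_i$ and $\zeta$ are error terms.

```latex
% Reflective (effect indicators): the construct causes each indicator,
% and each indicator carries its own error term.
x_i = \lambda_i \eta + \varepsilon_i , \qquad i = 1, \dots, q

% Formative (causal indicators): the indicators jointly cause the construct;
% the only error term sits at the construct level, as in a regression.
\eta = \sum_{i=1}^{q} \gamma_i x_i + \zeta
```

In the first form a shared $\eta$ implies correlated indicators, which is why internal consistency is expected; in the second form nothing constrains the correlations among the $x_i$.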
>Jarvis et al. go on to report how measurement models have often been misspecified, even in top marketing journals, as formative measures have been treated as reflective and vice versa.  They show how such misspecification can bias structural path estimates.  
>Getting back to Rasch measurement, it seems to me that Rasch assumes a reflective/effects/principal components model.  However, Rasch may occasionally be misapplied to formative indicators.  In the latter case, model fit and internal consistency would probably be low, but this is due mainly to fundamental model misspecification.  Or can Rasch be used with formative indicators?
>Has this type of model misspecification been discussed in the Rasch literature?  More specifically, are all tests of ability or knowledge generally reflective models?  Is there a good cite that someone can point me to?
>Thanks for any insights you may be able to provide.
>Hi all –
>The Rasch model assumes that each measure in a scale reflects the same underlying trait, and so it seems that a reflective measurement model is appropriate, and internal consistency is a desirable quality.  But what about the case of a long comprehensive exam, such as one we might use for assessment?  In my experience, these exams often behave as if they were close to unidimensional, even though many different learning outcomes are captured.  If well designed, these tests often exhibit high internal consistency.  Because the models fit well, I’ve always thought of the measures as reflective, but perhaps my theory is wrong even though the fit is good; maybe the measures are formative.  Is there any way to use the Rasch model with formative indicators?
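
To make "each measure reflects the same underlying trait" concrete, here is a minimal Python sketch of the standard dichotomous Rasch model. The function name, parameter names and item difficulties are hypothetical, chosen only for illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical item difficulties (in logits) for a short exam.
item_difficulties = [-1.0, 0.0, 1.5]

# One person parameter drives the response to every item, which is
# what makes the model reflective in the sense discussed here.
ability = 0.5
expected = [rasch_probability(ability, b) for b in item_difficulties]
print([round(p, 3) for p in expected])
```

Because a single ability value enters every item's probability, the model has no place for indicators that are meant to vary independently of one another.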
>Thanks for any insights you might have –
Thomas.Salzberger
