[Rasch] PISA critique in TES

Margaret Wu wu at edmeasurement.com.au
Sun Sep 28 12:08:07 EST 2014


There appears to be some misunderstanding about imputation.
Imputation never creates new data. Of course we cannot invent data. Here is
a simple example about imputation. Suppose we have a data set with people’s
height and weight measures, with some data records missing either height or
weight measures. We carry out a regression with only complete data records
(respondents with both height and weight measures). We obtain estimates of
regression coefficients from this analysis (Analysis A). For the data
records with missing responses, we can impute a value in the following way.
Suppose person n has a weight of 65 kg, but his/her height measure is
missing. We look up the regression model from Analysis A, and look at the
distribution of heights of people with a weight of 65 kg. This (conditional)
distribution represents the likely heights of people with a weight of 65
kg. We randomly draw an observation from this conditional distribution and
produce an imputed height for person n. If we now re-analyse the data with
the imputed height included, we first observe that the regression estimates
should not change, since we imputed from the regression model obtained in
Analysis A. However, the standard errors for the regression coefficients
will now be smaller than those from Analysis A, because our second analysis
assumes the imputed values are actually observed (so we have more data than
actually observed). To make sure that we don’t have increased precision
when imputed data are added, we make multiple imputations. For each imputed
data set, we carry out a regression analysis. The results from these
regressions will vary, because the imputed values are not the same each
time, since we impute from a distribution. (Note that we do not use the
mean of the conditional distribution as our imputed value. Instead we do a
random draw). The variations between the multiple regression runs will
reflect the uncertainty introduced by the imputations. We then have a
formula (Rubin’s rules) for combining the multiple regression runs to add
the uncertainty back into our regression parameters, so that the data sets
with imputed values will produce essentially the same estimates and
standard errors as Analysis A.
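
The height and weight example can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the function names (ols, impute_and_pool) and all numeric settings are hypothetical, and the sketch redraws the regression parameters before each imputation, a common refinement that is not described in the message. The pooling step uses Rubin’s rules (total variance = within-imputation variance + (1 + 1/m) x between-imputation variance).

```python
import numpy as np

def ols(x, y):
    """OLS of y on x: coefficients, (X'X)^-1, residual variance, residual df."""
    X = np.column_stack([np.ones(len(x)), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    df = len(y) - 2
    sigma2 = np.sum((y - X @ beta) ** 2) / df
    return beta, xtx_inv, sigma2, df

def impute_and_pool(weight, height, m=20, seed=0):
    """Multiple imputation of missing heights, pooled with Rubin's rules."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(height)
    # Analysis A: regression on complete records only.
    beta_a, xtx_inv, s2_a, df = ols(weight[~missing], height[~missing])
    se_a = np.sqrt(s2_a * np.diag(xtx_inv))
    betas, variances = [], []
    for _ in range(m):
        # Assumption (not in the message): redraw the model parameters so the
        # imputations also reflect the uncertainty in Analysis A itself.
        s2 = s2_a * df / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_a, s2 * xtx_inv)
        completed = height.copy()
        # Random draw from the conditional distribution of height given weight
        # (not the conditional mean).
        completed[missing] = (beta[0] + beta[1] * weight[missing]
                              + rng.normal(0.0, np.sqrt(s2), missing.sum()))
        b, inv_full, s2_full, _ = ols(weight, completed)
        betas.append(b)
        variances.append(s2_full * np.diag(inv_full))
    betas, variances = np.array(betas), np.array(variances)
    within = variances.mean(axis=0)       # treats imputed values as observed
    between = betas.var(axis=0, ddof=1)   # variation across imputed data sets
    total = within + (1 + 1 / m) * between
    return {"beta_a": beta_a, "se_a": se_a, "beta_mi": betas.mean(axis=0),
            "se_naive": np.sqrt(within), "se_mi": np.sqrt(total)}

# Hypothetical simulated data: 500 people, about 30% of heights missing at random.
rng = np.random.default_rng(42)
weight = rng.normal(70.0, 10.0, 500)
height = 100.0 + 1.0 * weight + rng.normal(0.0, 6.0, 500)
height[rng.random(500) < 0.3] = np.nan

result = impute_and_pool(weight, height)
for key, value in result.items():
    print(key, np.round(value, 3))
```

In a run like this, se_naive (imputed values treated as real observations) should come out smaller than se_a, while se_mi should be close to se_a once the between-imputation variance is added back.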



So you may ask why bother to do imputation if the imputed data sets
produce the same results as the analysis of complete records only. Sometimes we
have many variables of interest. If we have lots of missing values among
different variables, and we do list-wise deletion of records, we may throw
away a lot of records. Using imputation, we can have complete data sets for
carrying out many statistical analyses, using all the data we have
collected, and at the same time take into account that some data are
missing.
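
A small sketch of how quickly list-wise deletion discards records. It assumes, purely for illustration, that each variable is missing independently with the same probability; the function name and the numbers are hypothetical. Under that assumption the expected share of complete records is (1 - p)^k for k variables.

```python
import numpy as np

def complete_case_fraction(n_records, n_vars, p_missing, seed=0):
    """Share of records that survive list-wise deletion in simulated data."""
    rng = np.random.default_rng(seed)
    # True where a value is missing; each cell is missing independently.
    missing = rng.random((n_records, n_vars)) < p_missing
    # A record is kept only if none of its variables is missing.
    return float((~missing.any(axis=1)).mean())

simulated = complete_case_fraction(100_000, n_vars=10, p_missing=0.10)
expected = (1 - 0.10) ** 10
print(f"simulated: {simulated:.3f}, expected: {expected:.3f}")
```

With ten variables each missing 10% of the time, only about a third of the records remain complete, even though 90% of all individual values were observed.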



There has been a lot of literature on imputation. See Rubin, Little,
Graham, Schafer,…



Using plausible values (PV) is one method of imputation. I should mention
that the methodology of PV was not invented by PISA. The work of Bock,
Mislevy, among others, in the 1980s has greatly contributed to large-scale
assessment methodologies. I think one misconception of people who are not
familiar with Bayesian IRT is that you need to estimate individuals well
before you can measure the population parameters well. Actually, in
Bayesian IRT (or the MML method), individual person abilities are not
parameters in the model to be estimated. Bayesian IRT models overcome a lot
of issues relating to forming population estimates from individual ability
estimates which contain measurement errors. A lot of work has been done in
showing how the (so-called) PISA model works well for large-scale surveys.
Of course, for PISA, there are always issues with real-life applications of
mathematical models, but these issues are not the ones raised in the
articles recently mentioned in this thread. It appears that
there are “suspicions”, but there is a lack of a full understanding of the
actual models, so some explanations about the methodologies may help.
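
To illustrate the point that population parameters can be recovered without good individual estimates, here is a minimal sketch of drawing plausible values under a Rasch model. It is not the PISA procedure: it assumes the item difficulties and the population mean and SD are known (in practice they are estimated, e.g. by MML), it uses no conditioning variables (so students with no responses are drawn from the population distribution alone), and it approximates each posterior on a grid. All names and settings are hypothetical.

```python
import numpy as np

def plausible_values(responses, difficulties, mu=0.0, sigma=1.0, n_pv=5, seed=0):
    """Draw plausible values from each student's posterior ability distribution.

    responses: (n_students, n_items) array of 0/1, np.nan where not administered.
    Returns (pv, eap): an (n_students, n_pv) array of draws and posterior means.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 241)
    log_prior = -0.5 * ((grid - mu) / sigma) ** 2
    # Rasch probability of a correct response at each grid point for each item.
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - difficulties[None, :])))
    answered = ~np.isnan(responses)
    right = np.where(answered, responses, 0.0)
    wrong = np.where(answered, 1.0 - responses, 0.0)
    # Log posterior on the grid; unanswered items contribute nothing.
    log_post = right @ np.log(p).T + wrong @ np.log(1.0 - p).T + log_prior
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    eap = post @ grid
    # Inverse-CDF sampling: a random draw from the posterior, not its mean.
    cdf = np.cumsum(post, axis=1)
    u = rng.random((responses.shape[0], n_pv))
    idx = (cdf[:, None, :] < u[:, :, None]).sum(axis=2)
    idx = np.minimum(idx, grid.size - 1)
    return grid[idx], eap

# Hypothetical simulation: 4000 students, 10 items, half answer no items at all.
rng = np.random.default_rng(1)
n = 4000
theta = rng.normal(0.0, 1.0, n)
difficulties = np.linspace(-1.5, 1.5, 10)
p_true = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
responses = (rng.random((n, 10)) < p_true).astype(float)
responses[n // 2:] = np.nan

pv, eap = plausible_values(responses, difficulties)
print("true population variance:       1.0")
print("variance of first plausible value:", round(float(pv[:, 0].var()), 3))
print("variance of posterior means (EAP):", round(float(eap.var()), 3))
```

In this simulation the plausible values should reproduce the population variance closely, whereas the individual point estimates (posterior means) should understate it substantially. This illustrates why individual estimates are not used to form population estimates.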



Margaret



*From:* rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] *On
Behalf Of *Paul Barrett
*Sent:* Sunday, 28 September 2014 7:46 AM
*To:* rasch at acer.edu.au
*Subject:* Re: [Rasch] PISA critique in TES



I came across this paragraph in an ‘explanatory’ article in:

http://understandinguncertainty.org/about

*What is this site?*

*This site is produced by the Winton programme for the public understanding
of risk based in the Statistical Laboratory in the University of Cambridge*.




The article is at:

http://understandinguncertainty.org/pisa-statistical-methods-more-detailed-comments



The paragraph which caught my interest was:

A simple Rasch model (PISA Technical Report, Chapter 9) is assumed, and
five values for each student are generated at random from the 'posterior'
distribution given the information available on that student. *So for the
half of students in 2006 who did not answer any reading questions*, five
'plausible' reading scores are generated on the basis of their responses on
other subjects.



Look at that last sentence, and the bit I’ve underlined. Ordinarily, as a
scientist rather than statistician, I’d burst out laughing at such an
idiotic research design which ended up with this state of affairs as a
purposeful ‘design’ feature.



But maybe the “imputation” prediction model really does work as claimed on
such data? My laughing is foolish after all. I’m not interested in simple
Monte Carlo expositions, but in what happens with real data, messy
sampling, and items which don’t all fit the Rasch model.



And, indeed, empirical evidence using such PISA data appears to exist, detailing
the accuracy of the plausible value procedure to correctly estimate the
population scores for students who answer no items at all on reading
ability, from scores on other variables (Svend Kreiner). The result seems
to indicate that it is substantively *inaccurate*.



The validity or otherwise of such ‘plausible values’ claims is a matter of
empirically determined, quantified predictive accuracy, where actual
observational data are used, doing exactly what PISA does on, say, several
groups of students who have undertaken several ‘same-item’ tests on two or
more attributes, then re-estimating the population parameters for each
group based upon half of that group not answering any questions for a
particular attribute. This is not rocket science.  I’m assuming it has been
done and published, and replicated by independent research groups? (Anyone
have a reference or two?)



If this is so, it is puzzling how Kreiner’s analyses could have revealed
contrasting results.



Regards .. Paul



*Chief Research Scientist*

*Cognadev.com*

*__________________________________________________________________________________*

*W*: www.cognadev.com <http://www.pbarrett.net/>

*W*: www.pbarrett.net

*E*: paul at pbarrett.net

*M*: +64-(0)21-415625



*From:* rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] *On
Behalf Of *Adams, Ray
*Sent:* Saturday, September 27, 2014 2:08 PM
*To:* rasch
*Subject:* Re: [Rasch] PISA critique in TES



Mike,



We have always shown things like range of possible ranks, standard errors
and so on. We've also reported the effects of item selection and all data
collected is publicly accessible for others to scrutinise.



Morrison dismisses all latent variable models, Kreiner says throw away
everything that doesn't fit Rasch perfectly, and Goldstein says we throw
away too much.



Oh, and the comments about plausible values are just statistical naïveté;
believe them and NAEP would have to be scrapped, as would any statistical
methods that use Monte Carlo estimation, the theory of which was regarded as
sound last time I looked.



I love this criticism; it shows PISA is important. Putting energy into
criticising it is good; I just wish genuine problems were uncovered and
addressed. Goldstein does best on that front. I too would love longitudinal
components and finer-grained analyses of subsets of items.



Ray




On 27 Sep 2014, at 11:13 am, Mike Linacre <mike at winsteps.com> wrote:

Thanks, T.

That article, and the comments following it, suggest to me that PISA
results should be reported as box-and-whisker plots, not rankings. Then
every country could choose to be at the top of its own whisker ....

Or perhaps PISA already do this??

Mike Linacre

On 9/27/2014 10:18 AM, Bond, Trevor wrote:

http://www.tes.co.uk/article.aspx?storycode=6344672

Collegially

TGB


