# [Rasch] PISA critique in TES

Larry Ludlow larry.ludlow at bc.edu
Sun Sep 28 12:56:24 EST 2014

```
nice explanation Margaret.
Larry

On Sat, Sep 27, 2014 at 10:08 PM, Margaret Wu <wu at edmeasurement.com.au>
wrote:

> I think there appears to be some misunderstanding about imputation.
> Imputation never creates new data. Of course we cannot invent data. Here is
> a simple example about imputation. Suppose we have a data set with people’s
> height and weight measures, with some data records missing either height or
> weight measures. We carry out a regression with only complete data records
> (respondents with both height and weight measures). We obtain estimates of
> regression coefficients from this analysis (Analysis A). For the data
> records with missing responses, we can impute a value in the following way.
> Suppose person n has a weight of 65kg, but his/her height measure is
> missing. We look up the regression model from Analysis A, and look at the
> distribution of heights of people with a weight of 65kg. This (conditional)
> distribution represents the likely heights of people with a weight of 65
> kg. We randomly draw an observation from this conditional distribution and
> produce an imputed height for person n. If we now re-analyse the data with
> the imputed height included, we first observe that the regression estimates
> should not change, since we imputed from the regression model obtained in
> Analysis A. However, the standard errors for the regression coefficients
> will now be smaller than those from Analysis A, because our second analysis
> assumes the imputed values are actually observed (so we have more data than
> actually observed). To make sure that we don’t have increased precision
> when imputed data are added, we make multiple imputations. For each imputed
> data set, we carry out a regression analysis. The results from these
> regressions will vary, because the imputed values are not the same each
> time, since we impute from a distribution. (Note that we do not use the
> mean of the conditional distribution as our imputed value. Instead we do a
> random draw). The variations between the multiple regression runs will
> reflect the uncertainty introduced by the imputations. We then have a
> formula for combining the multiple regression runs to add the uncertainty
> back into our regression parameters, so that the data sets with imputed
> values will produce just the same estimates and standard errors as for
> Analysis A.
>
>
>
> So you may ask why bother to do imputation if the results of the imputed
> data sets produce the same results as the complete data set. Sometimes we
> have many variables of interest. If we have lots of missing values among
> different variables, and we do list-wise deletion of records, we may throw
> away a lot of records. Using imputation, we can have complete data sets for
> carrying out many statistical analyses, using all the data we have
> collected, and at the same time take into account that some data are
> missing.
>
>
>
> There has been a lot of literature on imputation. See Rubin, Little,
> Graham, Schafer,…
>
>
>
> Using plausible values (PV) is one method of imputation. I should mention
> that the methodology of PV was not invented by PISA. The work of Bock,
> Mislevy, among others, in the 1980s has greatly contributed to large-scale
> assessment methodologies. I think one misconception of people who are not
> familiar with Bayesian IRT is that you need to estimate individuals well
> before you can measure the population parameters well. Actually, in
> Bayesian IRT (or the MML method), individual person abilities are not
> parameters in the model to be estimated. Bayesian IRT models overcome a lot
> of issues relating to forming population estimates from individual ability
> estimates which contain measurement errors. A lot of work has been done in
> showing how the (so-called) PISA model works well for large-scale surveys.
> Of course, for PISA, there are always issues with real-life applications of
> mathematical models, but these issues are not those raised in the articles
> recently mentioned such as in this thread of discussion. It appears that
> there are “suspicions”, but there is a lack of a full understanding of the
> actual models, so some explanations about the methodologies may help.
>
>
>
> Margaret
>
>
>
> *From:* rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] *On
> Behalf Of *Paul Barrett
> *Sent:* Sunday, 28 September 2014 7:46 AM
> *To:* rasch at acer.edu.au
> *Subject:* Re: [Rasch] PISA critique in TES
>
>
>
> I came across this paragraph in an ‘explanatory’ article in:
>
>
> *What is this site?*
>
> *This site is produced by the Winton programme for the public
> understanding of risk based in the Statistical Laboratory in the University
> of Cambridge*.
>
>
>
> The article is at:
>
>
>
>
>
> The paragraph which caught my interest was:
>
> A simple Rasch model (PISA Technical Report, Chapter 9) is assumed, and
> five values for each student are generated at random from the 'posterior'
> distribution given the information available on that student. *So for the
> half of students in 2006 who did not answer any reading questions*, five
> 'plausible' reading scores are generated on the basis of their responses on
> other subjects.
>
>
>
> Look at that last sentence, and the bit I’ve underlined. Ordinarily, as a
> scientist rather than statistician, I’d burst out laughing at such an
> idiotic research design which ended up with this state of affairs as a
> purposeful ‘design’ feature.
>
>
>
> But maybe the “imputation” prediction model really does work as claimed on
> such data? My laughing is foolish after all. I’m not interested in simple
> Monte Carlo expositions, but in what happens with real data, messy
> sampling, and items which don’t all fit the Rasch model.
>
>
>
> And, indeed, empirical evidence using such PISA data appears to exist,
> detailing the accuracy of the plausible value procedure to correctly
> estimate the population scores for students who answer no items at all on
> reading ability, from scores on other variables (Svend Kreiner). The result
> seems to indicate that it is substantively *inaccurate*.
>
>
>
> The validity or otherwise of such ‘plausible values’ claims is a matter
> for empirically-determined quantified predictive accuracy, where actual
> observational data are used, doing exactly what PISA does on say several
> groups of students who have undertaken several ‘same-item’ tests on two or
> more attributes, then re-estimating the population parameters for each
> group based upon half of that group not answering any questions for a
> particular attribute. This is not rocket science. I’m assuming it has been
> done and published, and replicated by independent research groups? (Anyone
> have a reference or two?)
>
>
>
> If this is so, it is puzzling how Kreiner’s analyses could have revealed
> contrasting results.
>
>
>
> Regards .. Paul
>
>
>
> *Chief Research Scientist*
>
>
>
> *__________________________________________________________________________________*
>
>
> *W*: www.pbarrett.net
>
> *E*: paul at pbarrett.net
>
> *M*: +64-(0)21-415625
>
>
>
> *From:* rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au
> <rasch-bounces at acer.edu.au>] *On Behalf Of *Adams, Ray
> *Sent:* Saturday, September 27, 2014 2:08 PM
> *To:* rasch
> *Subject:* Re: [Rasch] PISA critique in TES
>
>
>
> Mike,
>
>
>
> We have always shown things like range of possible ranks, standard errors
> and so on. We've also reported the effects of item selection and all data
> collected is publicly accessible for others to scrutinise.
>
>
>
> Morrison dismisses all latent variable models, Kreiner says throw away
> everything that doesn't fit Rasch perfectly, and Goldstein says we throw
> away too much.
>
>
>
> Oh, and the comments about plausible values are just statistical naïveté;
> believe them and NAEP would have to be scrapped, as would any statistical
> methods that use Monte Carlo estimation, the theory of which was regarded as
> sound last time I looked.
>
>
>
> I love this criticism; it shows PISA is important. Putting energy into
> criticising it is good; I just wish genuine problems were uncovered and
> addressed. Goldstein does best on that front. I too would love longitudinal
> components and finer-grained analyses of subsets of items.
>
>
>
> Ray
>
>
> Sent from my iPhone
>
>
> On 27 Sep 2014, at 11:13 am, Mike Linacre <mike at winsteps.com> wrote:
>
> Thanks, T.
>
> That article, and the comments following it, suggest to me that PISA
> results should be reported as box-and-whisker plots, not rankings. Then
> every country could choose to be at the top of its own whisker ....
>
> Or perhaps PISA already do this??
>
> Mike Linacre
>
> On 9/27/2014 10:18 AM, Bond, Trevor wrote:
>
> http://www.tes.co.uk/article.aspx?storycode=6344672
>
> Collegially
>
> TGB
>
>
>
> ________________________________________
> Rasch mailing list
> email: Rasch at acer.edu.au
> web:
>
>
>

--
Larry Ludlow, PhD
Professor and Department Chair
Educational Research, Measurement and Evaluation Department
Lynch School of Education
Boston College
140 Commonwealth Avenue
Campion Hall 336C
Chestnut Hill, MA 02467
617-552-4221
```
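
The height and weight example in Margaret's message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the numbers (sample size, missing rate, regression coefficients) and the names `fit_line` and `run_demo` are invented here, and the between/within combination step uses Rubin's rules as the "formula for combining" the runs. As in the message, each imputed height is a random draw from the conditional distribution given by Analysis A with the coefficients held fixed; a full multiple-imputation procedure would also draw the coefficients themselves, so the pooled standard error from this simplified sketch will tend to sit somewhat below that of Analysis A rather than match it exactly.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least squares for y = a + b*x; returns (a, b, residual_sd, se_of_b)."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sd = math.sqrt(rss / (n - 2))
    return a, b, sd, sd / math.sqrt(sxx)


def run_demo(seed=0, n=200, missing_rate=0.3, m=20):
    """Simulate data, hide some heights, impute m times, pool the slopes."""
    rng = random.Random(seed)
    # Simulated population (assumed values, not from PISA or the thread).
    weights = [rng.gauss(70, 10) for _ in range(n)]
    heights = [100 + 1.0 * w + rng.gauss(0, 6) for w in weights]
    # Heights go missing completely at random.
    observed = [rng.random() >= missing_rate for _ in range(n)]

    # Analysis A: regression of height on weight, complete records only.
    cw = [w for w, o in zip(weights, observed) if o]
    ch = [h for h, o in zip(heights, observed) if o]
    a, b, sd, se_a = fit_line(cw, ch)

    slopes, variances = [], []
    for _ in range(m):
        # Random draw from the conditional distribution, not its mean.
        completed = [h if o else a + b * w + rng.gauss(0, sd)
                     for w, h, o in zip(weights, heights, observed)]
        _, b_m, _, se_m = fit_line(weights, completed)
        slopes.append(b_m)
        variances.append(se_m ** 2)

    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    within = statistics.fmean(variances)
    between = statistics.variance(slopes)
    total = within + (1 + 1 / m) * between
    return {
        "slope_analysis_a": b,
        "se_analysis_a": se_a,
        "pooled_slope": statistics.fmean(slopes),
        "naive_se": math.sqrt(within),   # treats imputed values as observed
        "pooled_se": math.sqrt(total),   # uncertainty added back in
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4f}")
```

Comparing `naive_se` with `se_analysis_a` shows the spurious gain in precision from treating imputed heights as observed, and comparing `naive_se` with `pooled_se` shows how the variation between imputed data sets puts uncertainty back.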