I came across this paragraph in an ‘explanatory’ article in:

The article is at:

http://understandinguncertainty.org/pisa-statistical-methods-more-detailed-comments

The paragraph which caught my interest was:

A simple Rasch model (PISA Technical Report , Chapter 9) is assumed, and
five values for each student are generated at random from the 'posterior'
distribution given the information available on that student. So for the
half of students in 2006 who did not answer any reading questions, five
'plausible' reading scores are generated on the basis of their responses on
other subjects.

Look at that last sentence, and the bit I’ve underlined. Ordinarily, as a
scientist rather than statistician, I’d burst out laughing at such an
idiotic research design which ended up with this state of affairs as a
purposeful ‘design’ feature.

But maybe the “imputation” prediction model really does work as claimed on
such data, and my laughing is foolish after all? I’m not interested in simple
Monte Carlo expositions, but in what happens with real data, messy
sampling, and items which don’t all fit the Rasch model.
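
For reference, the mechanics described in the quoted paragraph can be
sketched roughly as below. This is only an illustration of how five draws
come out of a posterior under a Rasch model; it says nothing about how well
the procedure behaves on real data. The function names, the grid
approximation, and the normal prior (standing in for whatever is predicted
from a student's responses on other subjects) are all assumptions made for
the sketch, not PISA's actual implementation.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def draw_plausible_values(responses, difficulties, prior_mean=0.0,
                          prior_sd=1.0, n_draws=5, rng=None):
    """Draw plausible values from a grid approximation to the posterior.

    responses    -- list of 0/1 item scores (may be empty)
    difficulties -- item difficulties, one per response
    prior_mean, prior_sd -- ASSUMED normal prior, e.g. a prediction from
                            the student's scores on other subjects
    """
    rng = rng or random.Random()
    grid = [-6.0 + 0.01 * i for i in range(1201)]  # ability from -6 to 6
    weights = []
    for theta in grid:
        # Log prior (normal density up to a constant).
        log_w = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        # Add the Rasch log-likelihood of the observed responses, if any.
        for x, b in zip(responses, difficulties):
            p = rasch_prob(theta, b)
            log_w += math.log(p) if x == 1 else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    # Sample abilities in proportion to their posterior weight.
    return rng.choices(grid, weights=weights, k=n_draws)
```

With an empty `responses` list the likelihood contributes nothing, so the
five values are drawn from the prior alone. That is the situation of the
students who answered no reading questions.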

And, indeed, empirical evidence using such PISA data appears to exist,
detailing the accuracy of the plausible value procedure in correctly
estimating the population scores for students who answer no items at all on
reading ability, from scores on other variables (Svend Kreiner). The result
seems to indicate that it is substantively inaccurate.

The validity or otherwise of such ‘plausible values’ claims is a matter for
empirically-determined quantified predictive accuracy, where actual
observational data are used, doing exactly what PISA does on, say, several
groups of students who have undertaken several ‘same-item’ tests on two or
more attributes, then re-estimating the population parameters for each group
based upon half of that group not answering any questions for a particular
attribute. This is not rocket science. I’m assuming it has been done and
published, and replicated by independent research groups? (Anyone have a
reference or two?)
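
A minimal sketch of that masking check is given below. It assumes the
input is real paired scores for students who took both tests. The function
name `masked_imputation_check` and the plain linear-regression imputation
are assumptions chosen to keep the example short; they are a stand-in for,
not a copy of, PISA's conditioning model.

```python
import random
import statistics


def masked_imputation_check(pairs, n_draws=5, seed=0):
    """Compare a group's true mean with its mean after masking and imputing.

    pairs -- list of (other_score, reading_score) for students who
             actually answered both tests.
    Returns (true_mean, mean_of_imputed_estimates).
    """
    rng = random.Random(seed)
    idx = list(range(len(pairs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    masked = set(idx[:half])            # pretend these answered no reading items
    observed = [pairs[i] for i in idx[half:]]

    # Fit reading ~ other on the students whose reading score is kept.
    xs = [x for x, _ in observed]
    ys = [y for _, y in observed]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in observed) / sxx
    intercept = my - slope * mx
    resid_sd = statistics.pstdev(
        [y - (intercept + slope * x) for x, y in observed]
    )

    # Impute the masked half n_draws times and re-estimate the group mean.
    estimates = []
    for _ in range(n_draws):
        completed = [
            intercept + slope * x + rng.gauss(0.0, resid_sd) if i in masked else y
            for i, (x, y) in enumerate(pairs)
        ]
        estimates.append(statistics.fmean(completed))

    true_mean = statistics.fmean(y for _, y in pairs)
    return true_mean, statistics.fmean(estimates)
```

Run on each group separately, the gap between the two returned values is the
quantity of interest. The same comparison could be repeated for the spread
or for group rankings.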

If this is so, it is puzzling how Kreiner’s analyses could have revealed
contrasting results.

Mike,

We have always shown things like range of possible ranks, standard errors
and so on. We've also reported the effects of item selection and all data
collected is publicly accessible for others to scrutinise.

Morrison dismisses all latent variable models, Kreiner says throw away
everything that doesn't fit Rasch perfectly, and Goldstein says we throw away
too much.

Believe them and NAEP would have to be scrapped, as would any statistical
methods that use Monte Carlo estimation, the theory of which was regarded as
sound last time I looked.

I love this criticism; it shows PISA is important. Putting energy into
criticising it is good. I just wish genuine problems were uncovered and
addressed. Goldstein does best on that front. I too would love longitudinal
components and finer-grained analyses of subsets of items.

On 27 Sep 2014, at 11:13 am, Mike Linacre <mike at winsteps.com> wrote:

Thanks, T.

That article, and the comments following it, suggest to me that PISA results
should be reported as box-and-whisker plots, not rankings. Then every
country could choose to be at the top of its own whisker ....

Or perhaps PISA already do this??

http://www.tes.co.uk/article.aspx?storycode=6344672
