[Rasch] PISA critique in TES - a competition?

Svend Kreiner svkr at sund.ku.dk
Mon Sep 29 19:04:10 EST 2014

Hi All,

So it is back to PISA. Again.

It is correct that Karl Bang Christensen and I show that the Rasch model does not fit PISA’s reading items, but the problem as we see it does not have anything to do with multidimensionality. The fundamental problem is that PISA disregards DIF relative to Country and disregards the fact that the testlet structure in PISA creates local dependence among items. The DIF problem means that country means are confounded, and the local dependence means that the estimates of the variance of the person parameter are inflated, and both of these problems influence the ranks of the countries. Whether or not this is important is another matter. We therefore tried to assess the robustness of the ranks in several ways, including comparison of PISA’s ranks with the ranks from a model that takes DIF and local dependence into account. Our reading of the results is that the model errors do matter. I suggest that you read our paper and draw your own conclusions. If you have other ideas about how to test the robustness, we will be very happy to hear them and promise to try them out and send you the results.
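To illustrate why DIF relative to Country confounds country means, here is a minimal sketch of the dichotomous Rasch model with a country-specific shift in item difficulty. The item difficulties, the size of the shift, and the function names are hypothetical, chosen only for illustration; this is not the model fitted in the paper or by PISA.

```python
import math

def rasch_prob(theta, beta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def expected_score(theta, betas, dif=None):
    """Expected total score for ability theta.

    dif holds hypothetical country-specific shifts added to each item
    difficulty; all zeros means no DIF.
    """
    dif = dif or [0.0] * len(betas)
    return sum(rasch_prob(theta, b + d) for b, d in zip(betas, dif))

# Illustrative values only: three items, two students of identical ability.
betas = [-1.0, 0.0, 1.0]
no_dif = expected_score(0.0, betas)
with_dif = expected_score(0.0, betas, dif=[0.5, 0.0, 0.0])  # item 1 is harder in this country
```

Both students have the same ability, yet the expected score is lower where the first item is harder. A scaling model that assumes common item difficulties can only attribute that gap to ability, which is how ignored DIF ends up in the country means.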

Up to last summer, PISA had never admitted in print that there is DIF relative to Country. The most interesting thing that came out of the TES discussion last year is therefore the letter by Andreas Schleicher that you can find at http://www.tes.co.uk/article.aspx?storycode=6345213 where he admits that “it is nonsense” to assume “that there should be no variability in performance on individual question between students in different countries”. That is, that it is nonsense to assume that there is no DIF relative to Country.
You can read my response at http://www.tes.co.uk/article.aspx?storycode=6360708. Schleicher’s statement is nothing less than remarkable, because it is the first time that PISA admits that PISA has a DIF problem. It is also remarkable, because it follows that he must regard PISA’s Rasch model  as a nonsense model. Let me quote PISA’s technical reports to put his remarks in perspective. In these reports, you can read statements like “The interpretation of a scale can be severely biased by unstable item characteristics from one country to the next” (page 86 in the report from 2006) and “consistency of item parameter estimates across countries was of particular interest” (page 147 in the same report). These and similar statements can be found in all the technical reports, so we have to assume that they mean it. Schleicher’s remarks can therefore only mean one of the following things. He either disagrees with the way ACER has analysed the data, he has not read the technical reports, or he does not understand what he has read. Whatever the reason, the point he makes is important. To use his own words, it requires little consideration to realise that the idea of no DIF relative to Country is nonsense.
He then goes on to claim that “PISA has convincingly and conclusively shown that the design of the tests and the scaling model used to score them lead to robust measures of country performance that are not affected by the composition of the item pool” and that “the results of these analyses are documented in the PISA technical reports”. This, of course, is a complete falsehood and typical of the way PISA argues when they have to defend themselves. There is nothing whatsoever about assessment of robustness of PISA ranks in the technical reports. Please check for yourself if you do not believe me. The only thing I know of from their hands is a report on science items in PISA 2006. Karl and I have commented on it in our Psychometrika paper. The report (Adams R, Berezner A. & Jakubowski, M. Analysis of PISA 2006 preferred items ranking using the percent-correct method) can be downloaded at http://www.oecd.org/pisa/pisaproducts/pisa2006/44919855.pdf. It was published after our discussion back in March 2010. During this discussion, Ray Adams did not refer to this or similar reports and I am sure that no such reports existed at that time, because he could have closed our discussion back then before it started to get serious.
Anyway, you do not have to believe me. Download the things yourself and see what you can find. And remember to ask PISA for the precise references every time they claim to have reported something.

And finally, we do appreciate the benefit of the doubt, but actually want more than this. Please read our Psychometrika paper and let us know if you find any errors. We will be happy to acknowledge and correct them. It is perhaps too much to ask, but it would also be nice if somebody would read PISA's technical reports. Until now, I know of nobody besides Karl and me who has done so, but it would be much more difficult for PISA to avoid the discussion if people systematically checked their claims. And the only way to do it is by reading their reports.



From: rasch-bounces at acer.edu.au [rasch-bounces at acer.edu.au] on behalf of Borsboom, Denny [D.Borsboom at uva.nl]
Sent: Monday, September 29, 2014 9:48 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] PISA critique in TES - a competition?

Hi all

isn't it somewhat strange that one should buy a commercial book to assess the details on dimensionality assessment of one of the largest psychometric attacks on the tax payer's wallet in the history of mankind?

In general, the lack of publicly available data and documentation on the psychometric procedures followed is worrying. I'd suggest that if PISA wants to gain back some street credibility in scientific circles, they simply have to make public the data, analysis code, and decision procedures used to arrive at their results. By this I don't mean the woolly stuff that's in the tech reports I've seen, but simply datasets with analysis code that reproduce the analyses used to assess e.g. dimensionality and DIF.

As long as PISA cannot offer such documentation, we should give Kreiner the benefit of the doubt, simply because, in contrast to PISA's, his work is in fact sufficiently detailed to pass the minimum scientific thresholds of reproducibility and transparency.


From: rasch-bounces at acer.edu.au [rasch-bounces at acer.edu.au] on behalf of Brady Michael Jack [bradyjack at gmail.com]
Sent: Monday, September 29, 2014 3:23 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] PISA critique in TES - a competition?

Hi Steve,

In the article "Linking PISA competencies over three cycles - results from Germany", Claus H. Carstensen (p. 204) explains how the uni-dimensionality of PISA items is assessed. His description may be of value to the conversation.

Carstensen, C. H. (2013). Linking PISA competencies over three cycles - results from Germany. In M. Prenzel, M. Kobarg, K. Schöps & S. Rönnebeck (Eds.), Research on PISA: research outcomes of the PISA research conference 2009 (pp. 199-214): Springer.


On Mon, Sep 29, 2014 at 8:51 AM, Steve Kramer <skramer1958 at verizon.net> wrote:
There is a lot of noise in the TES article, but one potentially legitimate
problem: Kreiner claims that the PISA questions violate the
Rasch uni-dimensionality assumption to such a degree that a Rasch model
can't be used, or at least can't be used with sufficient precision to rank
countries meaningfully.  He says he tested this by trying out legitimate
subsets of questions, and investigating whether the differing subsets
predicted rankings that were similar to one another, and they didn't.
Specifically, "Canada could have finished anywhere between second and 25th
and Japan between eighth and 40th."
Ray Adams responded by saying that PISA accounts for this problem, noting,
"We have always shown things like range of possible ranks, standard errors
and so on. We've also reported the effects of item selection."

In fact the 2012 country report on Japan looks nothing like "between 8th and
40th".  In 2012 Japan was ranked between 1st and 3rd in all subjects.
Kreiner may have been using data from a different year, but I can't imagine
that the RANGE of possible ranks shrank from 32 (40-8) down to 2 (3-1).  I
see only two possibilities:  either Kreiner's methodology was absolutely
lousy, or else the PISA "range of ranks" was computed ASSUMING
uni-dimensionality and did not adequately CHECK for uni-dimensionality.
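
As a rough sketch of the sub-setting idea discussed above, the following ranks countries on every item subset of a given size and records the best and worst rank each country obtains. It uses plain mean proportion correct rather than Rasch scaling, and the country labels and scores are made up for illustration; it is not Kreiner's procedure or PISA's.

```python
from itertools import combinations

def rank_ranges(scores, subset_size):
    """Return {country: (best_rank, worst_rank)} over all item subsets.

    scores maps each country to a list of per-item proportions correct.
    Ties are broken by dictionary order, which is an arbitrary choice.
    """
    n_items = len(next(iter(scores.values())))
    ranges = {c: (len(scores), 1) for c in scores}
    for subset in combinations(range(n_items), subset_size):
        means = {c: sum(v[i] for i in subset) / subset_size
                 for c, v in scores.items()}
        ordered = sorted(means, key=means.get, reverse=True)
        for rank, c in enumerate(ordered, start=1):
            best, worst = ranges[c]
            ranges[c] = (min(best, rank), max(worst, rank))
    return ranges

# Hypothetical data: three countries, four items.
scores = {
    "A": [0.90, 0.50, 0.70, 0.60],
    "B": [0.60, 0.85, 0.65, 0.70],
    "C": [0.50, 0.55, 0.40, 0.45],
}
ranges = rank_ranges(scores, subset_size=2)
```

With these made-up numbers, countries A and B swap first and second place depending on which two items are used, while C is always last. A wide range for a country signals that its rank depends on the item selection.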

Ray, do you have any technical articles you can reference explaining how
PISA either designed test items for uni-dimensionality or else checked that
uni-dimensionality was an adequate model for creating country ranks? Also,
are there any tech reports on how PISA determined each country's potential
range of ranks?   Finally, I'd appreciate any tech reports in which PISA
investigated how choosing differing subsets of items affected country
ranking, or else articles (not necessarily PISA) explaining why a procedure
like Kreiner's sub-setting is not a legitimate test.    I suspect that
Kreiner's claims are simply based on invalid methodology, but I'd like to be
able to verify that suspicion.

Steve Kramer
The 21st Century Partnership for STEM Education

-----Original Message-----
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On Behalf
Of Mike Linacre
Sent: Sunday, September 28, 2014 6:13 PM
To: rasch at acer.edu.au
Subject: Re: [Rasch] PISA critique in TES - a competition?

Yes, Jason.

Are PISA, TIMSS and similar studies really intended to advance education
worldwide or to advance political agendas? If the answer is "political
agendas" then the most politically-acceptable statistical methodologies are
the ones to choose. If the answer is "advance education" then everyone,
including the politicians, should be working towards discovering and using
the most effective statistical methodologies.

According to www.rasch.org/software.htm there are now seven Rasch-related R
modules. They are free. Wonderful! But there are 5,889 R modules. We will
need more Rasch R modules before we make a noticeable impact.

Mike L.

On 9/28/2014 21:21 PM, liasonas wrote:
> Mike, this is a great idea. But can the policy makers and the
> politicians allow us (the academics) to spoil their new toys ( the
> international studies)? The politicians use us (the academics) to
> produce data and reports which then the politicians use to carry out
> their little in-fightings and political debates.
> We cannot afford to anger them, because we need their money and
> support. Maybe we need to train them on how to use our data most
> appropriately and sensibly. PISA and TIMSS tables, for example, can be
> useful, but they are not the equivalent of The Bible.
> Having said all that, we need to thank Margaret and the other
> researchers for providing the methodological tools and packages (have
> you all had a glance at the TAM package on the R platform?). But we
> also need to thank Paul for sowing the seeds of doubt, because this is
> the only way for science to prosper.
> Jason

Rasch mailing list
email: Rasch at acer.edu.au


Brady Michael Jack, Ph.D.
Postdoctoral Research Fellow
Center for General Education
National Sun Yat-sen University
70 Lien Hai Road,
Kaohsiung 804 Taiwan
Phone: +886-7-525-2000
Website: http://www.researchgate.net/profile/Brady_Jack/
