[Rasch] PISA critique in TES - a competition?

Margaret Wu wu at edmeasurement.com.au
Mon Sep 29 21:26:58 EST 2014


To present both sides of the argument, you should be aware of the
following document:

www.oecd.org/pisa/47681954.pdf

(I wasn't involved in writing the above document. I only found it today).


My first impression after reading the Kreiner and Christensen paper in
Psychometrika was that the authors didn't fully understand the scaling
model used in PISA, since many of the issues raised in the paper concern
well-known methodologies established long before PISA came along; these
methodologies were published as scholarly papers and have been
implemented in NAEP and TIMSS for several decades now.

Item DIF certainly exists in PISA (and TIMSS), and in many national
assessments. I didn't feel OECD tried to hide this. I did a working paper
for OECD comparing PISA and TIMSS (published by OECD in 2010). Essentially
the working paper found that different countries had different strengths
and weaknesses in mathematics sub-strands, and that different content
balance in the test would lead to different rankings of countries. OECD
published this and it is downloadable. If you search the literature, you
will find many papers discussing DIF in PISA, including:
Monseur, C., & Berezner, A. (2007). The computation of equating errors in
international surveys in education. Journal of Applied Measurement, 8(3),
323-335.
In the above paper, the authors used PISA reading items and showed that
different sets of link items could lead to quite different results in
PISA. Monseur worked in the ACER PISA team for the first two cycles of
PISA and has been a long-serving member of the PISA technical advisory
group. I certainly don't share the view that there are blatant attempts
by the OECD to hide issues with DIF.
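
For anyone who wants to see what a basic country-DIF check looks like,
here is a minimal sketch in base R. It uses the classic Mantel-Haenszel
approach rather than the specific procedures used in PISA's scaling or
in my working paper, and the data, countries and DIF effect are all
simulated for illustration:

## Mantel-Haenszel DIF check for one item between two countries.
## Everything below is simulated; this is not PISA's actual procedure.
set.seed(42)
n       <- 1000
country <- rep(c("A", "B"), each = n / 2)
theta   <- rnorm(n)                          # latent ability
b       <- rnorm(19)                         # difficulties of the other items
rest    <- rowSums(sapply(b, function(d) rbinom(n, 1, plogis(theta - d))))
dif     <- ifelse(country == "B", 0.7, 0)    # item 0.7 logits harder in B
item    <- rbinom(n, 1, plogis(theta - 0.2 - dif))
strata  <- cut(rest, quantile(rest, 0:5 / 5), include.lowest = TRUE)
tab     <- table(country, item, strata)      # 2 x 2 x K table
mantelhaen.test(tab)                         # a small p-value flags DIF

With the built-in 0.7-logit effect the test should reject comfortably;
with dif set to zero it should not, which is the whole point of matching
on the rest-score before comparing countries.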

Margaret

-----Original Message-----
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On
Behalf Of Svend Kreiner
Sent: Monday, 29 September 2014 7:04 PM
To: rasch at acer.edu.au
Subject: Re: [Rasch] PISA critique in TES - a competition?

Hi All,

So it is back to PISA. Again.

It is correct that Karl Bang Christensen and I show that the Rasch model
does not fit PISA's reading items, but the problem as we see it does not
have anything to do with multidimensionality. The fundamental problem is
that PISA disregards DIF relative to Country and disregards the fact
that the testlet structure in PISA creates local dependence among items.
The DIF problem means that country means are confounded, and the local
dependence means that the estimates of the variance of the person
parameter are inflated; both of these problems influence the ranks of
the countries. Whether or not this is important is another matter. We
therefore tried to assess the robustness of the ranks in several ways,
including comparison of PISA's ranks with the ranks from a model that
takes DIF and local dependence into account. Our reading of the results
is that the model errors do matter. I suggest that you read our paper
and draw your own conclusions. If you have other ideas about how to test
the robustness, we will be very happy to hear them and promise to try
them out and send you the results.
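
To make the local dependence point concrete, here is a small simulation
sketch in R. The testlet layout and effect size are invented, and for
brevity the residuals are computed from the generating parameters
instead of estimates, which one obviously cannot do with real data:

## Rasch data with a testlet effect: items within a testlet end up with
## positively correlated residuals (the idea behind Yen's Q3 statistic).
## All numbers are arbitrary simulation settings.
set.seed(1)
n.person <- 2000; n.item <- 20
testlet  <- rep(1:5, each = 4)               # 5 testlets of 4 items
b        <- seq(-1.5, 1.5, length.out = n.item)
theta    <- rnorm(n.person)
u        <- matrix(rnorm(n.person * 5, sd = 0.8), n.person, 5)  # testlet effects
eta      <- outer(theta, b, "-") + u[, testlet]
X        <- matrix(rbinom(n.person * n.item, 1, plogis(eta)),
                   n.person, n.item)
P  <- plogis(outer(theta, b, "-"))           # plain Rasch probabilities
R  <- (X - P) / sqrt(P * (1 - P))            # standardised residuals
Q3 <- cor(R)
same <- outer(testlet, testlet, "==")
mean(Q3[same & upper.tri(Q3)])               # clearly positive
mean(Q3[!same & upper.tri(Q3)])              # near zero

A model that ignores this dependence treats the shared testlet variance
as person variance, which is exactly why the variance estimates are
inflated.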

Up to last summer, PISA had never admitted in print that there is DIF
relative to Country. The most interesting thing that came out of the TES
discussion last year is therefore the letter by Andreas Schleicher that
you can find at http://www.tes.co.uk/article.aspx?storycode=6345213 where
he admits that “it is nonsense” to assume “that there should be no
variability in performance on individual question between students in
different countries”. That is, that it is nonsense to assume that there is
no DIF relative to Country.
You can read my response at
http://www.tes.co.uk/article.aspx?storycode=6360708. Schleicher’s
statement is nothing less than remarkable, because it is the first time
that PISA admits that PISA has a DIF problem. It is also remarkable,
because it follows that he must regard PISA's Rasch model as a nonsense
model. Let me quote PISA’s technical reports to put his remarks in
perspective. In these reports, you can read statements like “The
interpretation of a scale can be severely biased by unstable item
characteristics from one country to the next” (page 86 in the report from
2006) and “consistency of item parameter estimates across countries was of
particular interest” (page 147 in the same report). These and similar
statements can be found in all the technical reports, so we have to assume
that they mean it. Schleicher's remarks can therefore only mean one of
the following: either he disagrees with the way ACER has analysed the
data, or he has not read the technical reports, or he does not
understand what he has read. Whatever the reason, the point he makes is
important. To
use his own words, it requires little consideration to realise that the
idea of no DIF relative to Country is nonsense.
He then goes on to claim that “PISA has convincingly and conclusively
shown that the design of the tests and the scaling model used to score
them lead to robust measures of country performance that are not affected
by the composition of the item pool” and that “the results of these
analyses are documented in the PISA technical reports”. This, of course,
is a complete falsehood and typical of the way PISA argues when they have
to defend themselves. There is nothing whatsoever about assessment of
robustness of PISA ranks in the technical reports. Please check for
yourself if you do not believe me. The only thing I know of from their
hands is a
report on science items in PISA 2006. Karl and I have commented on it in
our Psychometrika paper. The report (Adams, R., Berezner, A., &
Jakubowski, M., Analysis of PISA 2006 preferred items ranking using the
percent-correct method) can be downloaded at
http://www.oecd.org/pisa/pisaproducts/pisa2006/44919855.pdf. It was
published after our discussion back in March 2010. During this
discussion, Ray Adams did not refer to this or any similar report, and I
am sure that no such reports existed at that time, because otherwise he
could have used them to close our discussion before it started to get
serious.
Anyway, you do not have to believe me. Download the things yourself and
see what you can find. And remember to ask PISA for the precise
references every time they claim to have analysed and reported something.

And finally, we do appreciate the benefit of the doubt, but we actually
want more than this. Please read our Psychometrika paper and let us know
if you find any errors. We will be happy to acknowledge and correct
them. It is perhaps too much to ask, but it would also be nice if
somebody would read PISA's technical reports. So far, I know of nobody
besides Karl and me who has done so, but it would be much more difficult
for PISA to avoid the discussion if people systematically checked their
claims. And the only way to do that is by reading their reports.

Best

Svend

________________________________________
From: rasch-bounces at acer.edu.au [rasch-bounces at acer.edu.au] on behalf of
Borsboom, Denny [D.Borsboom at uva.nl]
Sent: Monday, September 29, 2014 9:48 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] PISA critique in TES - a competition?

Hi all,

Isn't it somewhat strange that one should buy a commercial book to
assess the details of the dimensionality assessment of one of the
largest psychometric attacks on the taxpayer's wallet in the history of
mankind?

In general, the lack of publicly available data and documentation on the
psychometric procedures followed is worrying. I'd suggest that if PISA
wants to gain back some street credibility in scientific circles, they
simply have to make public the data, analysis code, and decision
procedures used to arrive at their results. By this I don't mean the
woolly stuff that's in the tech reports I've seen, but simply datasets
with analysis code that reproduce the analyses used to assess e.g.
dimensionality and DIF.

As long as PISA cannot offer such documentation, we should give Kreiner
the benefit of the doubt, simply because, in contrast to PISA's work,
his is in fact sufficiently detailed to meet the minimum scientific
thresholds of reproducibility and transparency.

Best
Denny

________________________________
From: rasch-bounces at acer.edu.au [rasch-bounces at acer.edu.au] on behalf of
Brady Michael Jack [bradyjack at gmail.com]
Sent: Monday, September 29, 2014 3:23 AM
To: rasch at acer.edu.au
Subject: Re: [Rasch] PISA critique in TES - a competition?

Hi Steve,

In the article "Linking PISA competencies over three cycles - results from
Germany", Claus H. Carstensen (p. 204) explains how the uni-dimensionality
of PISA items is assessed. His description may be of value to the
conversation.

Carstensen, C. H. (2013). Linking PISA competencies over three cycles -
results from Germany. In M. Prenzel, M. Kobarg, K. Schöps & S. Rönnebeck
(Eds.), Research on PISA: research outcomes of the PISA research
conference 2009 (pp. 199-214). Springer.

Respectfully,
Brady

On Mon, Sep 29, 2014 at 8:51 AM, Steve Kramer
<skramer1958 at verizon.net> wrote:
There is a lot of noise in the TES article, but one potentially
legitimate problem: Kreiner claims that the PISA questions violate the
Rasch uni-dimensionality assumption to such a degree that a Rasch model
can't be used, or at least can't be used with sufficient precision to
rank countries meaningfully. He says he tested this by trying out
legitimate subsets of questions and investigating whether the differing
subsets predicted rankings that were similar to one another, and they
didn't. Specifically, "Canada could have finished anywhere between
second and 25th and Japan between eighth and 40th."
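
To make the kind of check I mean concrete, here is a minimal sketch in R
of ranking countries by simple percent correct on random item subsets;
the country-by-item matrix of proportions correct is entirely invented:

## Rank countries on random halves of the item pool and record the
## spread of each country's rank. Data are fabricated for illustration.
set.seed(7)
n.country <- 30; n.item <- 40
pv <- matrix(runif(n.country * n.item, 0.3, 0.9), n.country, n.item,
             dimnames = list(paste0("C", 1:n.country), NULL))
ranks <- replicate(1000, {
  items <- sample(n.item, 20)              # one "legitimate subset"
  rank(-rowMeans(pv[, items]))             # rank by percent correct
})
t(apply(ranks, 1, range))                  # attainable rank range per country

With purely random data these ranges are wide by construction; the
substantive question is how wide they are for the real item pool, which
is what Kreiner's sub-setting and any PISA robustness report would have
to address.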
Ray Adams responded by saying that PISA accounts for this problem, noting,
"We have always shown things like range of possible ranks, standard errors
and so on. We've also reported the effects of item selection."

In fact the 2012 country report on Japan looks nothing like "between 8th
and 40th".  In 2012 Japan was ranked between 1st and 3rd in all subjects.
Kreiner may have been using data from a different year, but I can't
imagine that the RANGE of possible ranks shrank from 32 (40-8) down to 2
(3-1).  I see only two possibilities:  either Kreiner's methodology was
absolutely lousy, or else the PISA "range of ranks" was computed ASSUMING
uni-dimensionality and did not adequately CHECK for uni-dimensionality.

Ray, do you have any technical articles you can reference explaining how
PISA either designed test items for uni-dimensionality or else checked
that uni-dimensionality was an adequate model for creating country ranks?
Also, are there any tech reports on how PISA determined each country's
potential range of ranks? Finally, I'd appreciate any tech reports in
which PISA investigated how choosing differing subsets of items affected
country ranking, or else articles (not necessarily from PISA) explaining
why a procedure like Kreiner's sub-setting is not a legitimate test. I
suspect that Kreiner's claims are simply based on invalid methodology,
but I'd like to be able to verify that suspicion.

Steve Kramer
The 21st Century Partnership for STEM Education

-----Original Message-----
From: rasch-bounces at acer.edu.au [mailto:rasch-bounces at acer.edu.au] On
Behalf Of Mike Linacre
Sent: Sunday, September 28, 2014 6:13 PM
To: rasch at acer.edu.au
Subject: Re: [Rasch] PISA critique in TES - a competition?

Yes, Jason.

Are PISA, TIMSS and similar studies really intended to advance education
worldwide or to advance political agendas? If the answer is "political
agendas" then the most politically-acceptable statistical methodologies
are the ones to choose. If the answer is "advance education" then
everyone, including the politicians, should be working towards discovering
and using the most effective statistical methodologies.

According to www.rasch.org/software.htm
there are now seven Rasch-related R modules. They are free. Wonderful! But
there are 5,889 R modules. We will need more Rasch R modules before we
make a noticeable impact.

Mike L.


On 9/28/2014 21:21, liasonas wrote:
> Mike, this is a great idea. But can the policy makers and the
> politicians allow us (the academics) to spoil their new toys (the
> international studies)? The politicians use us (the academics) to
> produce data and reports, which the politicians then use to carry out
> their little in-fightings and political debates.
> We cannot afford to anger them, because we need their money and
> support. Maybe we need to train them in how to use our data most
> appropriately and sensibly. PISA and TIMSS tables, for example, can be
> useful, but they are not the equivalent of the Bible.
> Having said all that, we need to thank Margaret and the other
> researchers for providing the methodological tools and packages (have
> you all had a glance at the TAM package on the R platform?). But we
> also need to thank Paul for sowing the seeds of doubt, because this is
> the only way for science to prosper.
> Jason





--
Brady Michael Jack, Ph.D.
Postdoctoral Research Fellow
Center for General Education
National Sun Yat-sen University
70 Lien Hai Road,
Kaohsiung 804 Taiwan
Phone: +886-7-525-2000
Website: http://www.researchgate.net/profile/Brady_Jack/