I suspect that cheating (obtaining a higher score than one's ability warrants, without really possessing the quality being measured) is possible in any assessment, just as it is possible to turn in purchased work or commit some other form of dishonesty.  That's why certification and other high-stakes decisions that protect the public from harm are well served by a certain amount of performance that is observed, realistic, and repeatable on demand.  Performance testing makes cheating more difficult and more detectable, even if the tasks are openly available.

It's difficult to "fake" knowledge incorporated into a skill if you are being observed or the process is scored, for example, if you are performing an experiment in a laboratory or solving a problem using algebra while your work is recorded.  In many cases, there is simply no way to memorize a response if I say, "Use the materials provided, solve this problem using algebra, and tell me what you are thinking and doing as you solve it."  You can't guess the answer.  You have to reveal your knowledge of the content and the process.  You may even have to defend or discuss your response if faking is suspected.

Standardizing the task and its scoring rubric and creating a pool of calibrated performance items is quite an effort!  In some cases there would be multiple forms of some problem types, and the pool of items would be clearly organized by a taxonomy or framework that makes sense.  I think that the difficulty of cheating with performance testing is a plus; having multiple forms of items is another.  Performance items would likely be PART of an open assessment library useful for high-stakes decisions.

The downsides of performance items lie in two areas: 1.) performance items are more costly and time-consuming to administer, and 2.) the system must monitor a complex scoring process.

If you consider the number of errors in the college admissions process, it seems that investing in some performance indicators may be worthwhile compared to the resources wasted after the fact.  The expected tasks and the scoring criteria can be open to the public.

It's my guess that machine-scored paper tests often hide item dysfunction (I'll let others debate Rasch vs. 3PL here) and ignore person misfit, including some forms of "cheating".  Most folks simply get a score and assume it is valid because of some "item alignment" with a set of standards.  Since the items are often secured, no one knows whether the results are meaningful for a given purpose.  Most importantly, person fit is simply not reported.  Performance items combined with person-fit statistics yield scores that are diagnostic in ways I find quite useful, and cheating is only one small area of the conclusions possible.
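To make "reporting person fit" concrete, here is a minimal sketch assuming the dichotomous Rasch model, item difficulties that are already calibrated, and a person ability that is already estimated (this code does not estimate either). The function names `rasch_p` and `person_fit` and the five-item example are hypothetical. Outfit is the unweighted mean of squared standardized residuals; infit is the information-weighted version. Values near 1.0 are consistent with the model; values well above 1.0 flag unexpected response strings.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Model probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_fit(responses, theta, difficulties):
    """Return (outfit_msq, infit_msq) for one person's 0/1 response string.

    theta (ability) and difficulties are in logits and are assumed to be
    already estimated; this sketch does not estimate them.
    """
    z2_sum = 0.0      # sum of squared standardized residuals -> outfit
    resid2_sum = 0.0  # sum of squared raw residuals -> infit numerator
    var_sum = 0.0     # sum of model variances -> infit denominator
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)
        resid = x - p
        z2_sum += resid * resid / var
        resid2_sum += resid * resid
        var_sum += var
    outfit = z2_sum / len(responses)   # unweighted: sensitive to outlying responses
    infit = resid2_sum / var_sum       # weighted by model variance (information)
    return outfit, infit

# Hypothetical five-item test, person of average ability (theta = 0).
items = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(person_fit([1, 1, 1, 0, 0], 0.0, items))  # passes easy items, misses hard ones
print(person_fit([0, 0, 1, 1, 1], 0.0, items))  # misses easy items, passes hard ones
```

Both strings have the same raw score of 3, yet the second gives an outfit mean-square above 4, the kind of unexpected pattern (easy items missed, hard items passed) that a total score alone never reveals. The first string's very low mean-squares indicate a response pattern more predictable than the model expects.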

Judy Wilkerson and I often say that, "Rasch puts the people back in testing."

Steve Lang
University of South Florida St. Petersburg

Is it not possible for someone to memorise answers without understanding
what they (the questions and the answers) mean? For example, a primary
school student may memorise enough answers to items on Quantum Physics
and, without having the foggiest idea of what it is all about, pass a
test. What would such a result mean (other than a demonstration of a good

John J Barnard
Executive Director: EPEC Pty Ltd

We have used an open item bank (we call it an assessment task library)
in Florida.  With the performance tasks required for teacher certification,
most performances and products must be rated on a rubric at
a "minimally acceptable" level.  Most allow the teacher candidate
multiple tries to pass.  The standards-based items are known
to those who are to be assessed.

The initial item analysis, calibration, and subsequent reporting are
Rasch-based.  In Florida we have approximately 45 "tasks" and 350
"items" (criteria) that must be completed.  Think of it like this: to
be a pilot, you must demonstrate that you can land the plane.  The items
are the criteria that provide evidence that you can land the plane
successfully.
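For readers new to Rasch, the dichotomous model behind this kind of calibration can be written as below. Treating each rubric criterion as pass/fail (met or not met at the "minimally acceptable" level) is a simplifying assumption for illustration; multi-level rubrics would call for a rating-scale or partial-credit model. Here B_n is the ability of candidate n and D_i is the difficulty of criterion i, both in logits.

```latex
P(X_{ni} = 1) = \frac{\exp(B_n - D_i)}{1 + \exp(B_n - D_i)},
\qquad
\ln\!\left(\frac{P(X_{ni} = 1)}{P(X_{ni} = 0)}\right) = B_n - D_i
```

The log-odds of meeting a criterion depend only on the difference between candidate ability and criterion difficulty, which is what lets criteria and candidates be reported on one common scale.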

In another example, think of a set of skill-based items like
the famous Knox Cube data.  Can a test-taker "cheat"?

The idea is to deal with this difficulty by avoiding knowledge-based
items in favor of performance-based items.  It's more difficult and
expensive to administer a large number of items over a long period
and to manage professional raters, but from a validity standpoint it
makes sense because the tasks are job-related.  You can already imagine
why rater-error diagnosis (FACETS) is useful.
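For context on rater-error diagnosis: FACETS implements the many-facet Rasch model, which adds a rater term to the person and item terms so that rater severity is estimated on the same logit scale as candidate ability and item difficulty. One common rating-scale form is shown below, where P_nijk is the probability that candidate n is rated in category k (rather than k-1) on item i by rater j, B_n is candidate ability, D_i item difficulty, C_j rater severity, and F_k the difficulty of the step from category k-1 to k.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Because severity C_j is modeled explicitly, a candidate's measure can be adjusted for having drawn a harsh or lenient rater, and erratic raters can be flagged with the same kind of fit statistics used for persons and items.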

We have found some interesting results comparing the Rasch difficulties
from candidates' first try with those from the second or third try.  If
items become "easy", it says something about the mentoring/training in a
program and the "teachability" of the items.  I hope that makes sense.
"Lesson planning" seems to improve rapidly with training, but "classroom
management" improves more slowly (some never get it).  Some items are
relatively easy but necessary for teachers, like "communication with
parents".  Others are difficult but also necessary, like "assessment
accommodations for special students".  We wrote an initial report for
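A rough sketch of the first-try versus later-try comparison, assuming only the proportion of candidates meeting each criterion on each occasion is available. It uses centered log-odds as a crude stand-in for a proper Rasch calibration (a real analysis would use Winsteps, FACETS, or similar software with standard errors). The function names and the proportions in the usage example are hypothetical and invented for illustration; they are not the Florida results.

```python
import math

def logit_difficulty(p_met: float) -> float:
    """Crude difficulty in logits: log-odds of NOT meeting the criterion."""
    if not 0.0 < p_met < 1.0:
        raise ValueError("proportions of 0 or 1 have no finite logit")
    return math.log((1.0 - p_met) / p_met)

def centered_difficulties(p_by_item: dict) -> dict:
    """Convert proportions meeting each criterion to logits centered at 0."""
    raw = {item: logit_difficulty(p) for item, p in p_by_item.items()}
    mean = sum(raw.values()) / len(raw)
    return {item: d - mean for item, d in raw.items()}

def difficulty_shift(first_try: dict, later_try: dict) -> dict:
    """Later-minus-first difficulty per item.

    Both occasions are centered, so shifts are relative to the average item:
    negative means the item became relatively easier ("teachable").
    """
    d1 = centered_difficulties(first_try)
    d2 = centered_difficulties(later_try)
    return {item: d2[item] - d1[item] for item in d1}

if __name__ == "__main__":
    # Illustrative proportions only; not real certification data.
    first = {"lesson planning": 0.5, "classroom management": 0.5,
             "communication with parents": 0.8}
    later = {"lesson planning": 0.9, "classroom management": 0.6,
             "communication with parents": 0.9}
    for item, shift in difficulty_shift(first, later).items():
        print(f"{item}: {shift:+.2f} logits")
```

Centering each occasion keeps the comparison about which items become easier relative to the others, which is the "teachability" question; it cannot show a uniform change across all items, and retake candidates are a self-selected group, so a shift mixes training effects with who chose to retake.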

In a book to be released next month, "Assessing Teacher Competency:
Five Standards-Based Steps to Valid Measurements Using the CAATS Model",
we (Judy Wilkerson and I) describe Rasch and the reasons for
using Rasch to a general audience, but we expect the performance items
typically to be known to all the users.

Steve Lang
University of South Florida

Thank you, everybody, for offering me so much help on this issue. The
truth is that I received tens of messages from all over the world, but I
would like others to contribute as well, if they have not done so already.
It seems that there is much interest, and the trends are the following:

* excellent idea to have an open item bank accessible by everybody
for high-stakes exams because it makes security issues nonexistent,
the quality of the items can be monitored, the candidates know what they
will face, etc.


* item calibrations make no sense since item difficulties will drift
* a backwash effect is expected and teachers will probably teach to the
item bank
* the results/scores of the candidates will be inflated (they will have
the chance to memorize responses)
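Whether drift has actually occurred can be checked rather than assumed. One common approach, sketched here under the assumption that both calibrations have been placed on a common scale (for example by anchoring or by centering the item difficulties), standardizes the difference between an item's two difficulty estimates by their standard errors. P_ni is the model probability that person n succeeds on item i, and the standard-error expression is the usual large-sample approximation for a dichotomous Rasch item.

```latex
t_i = \frac{\hat{D}_i^{(2)} - \hat{D}_i^{(1)}}
           {\sqrt{\left(SE_i^{(1)}\right)^2 + \left(SE_i^{(2)}\right)^2}},
\qquad
SE_i \approx \frac{1}{\sqrt{\sum_n P_{ni}\,(1 - P_{ni})}}
```

Values of |t_i| beyond about 2 suggest a shift larger than measurement error alone would explain, which turns "difficulties will drift" into something that can be monitored item by item.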

Have I missed anything? Please remind us.

I will also try to find out whether anyone else is using such a thing in
practice, and I'll do a literature review.

I think this is progressing toward a journal article. Does anyone think
that this is a good idea? Does anyone think that I should not spend my
time writing such an article because there may not be enough interest
from the audience or the editorial boards of journals? Which journals
would be most likely to publish such a thing?

Please help me to 'interview' experts in the fields - please suggest

Thanks everybody,
you have been really helpful. I hope you will help me more.

P.S. Apologies for hijacking the Rasch list. If not appropriate, please
let me know.

        Another thing, Iasonas:
        Once you have an "open" item bank, is it useful to calibrate the
items? The difficulty of the items will be reduced; eventually most of
them will be solved by almost all the candidates, and the later
calibration will have nothing to do with the original calibration.
        This is another important topic to discuss.
        Agustin Tristan
Dr. Iasonas Lamprianou
