[Rasch] Rasch and Factor Analysis questions
Donald Van Metre
devium at mindspring.com
Fri Mar 6 11:14:31 EST 2009
First, I apologize for cross-posting this on the Rasch and LTEST-L boards, but I thought it would be appropriate because my questions are both Rasch and non-Rasch related.
I'm preparing to do a master's dissertation in Language Testing at Lancaster University. I'm a newcomer to analyzing test data, and have some basic questions that I'm hoping some of you may be able / willing to answer.
I have access to student response data for a set of 8 simulated iBTs. I'd like to look at the data for the listening sections. The items are all coded by skill, for example, Detail, Inference, Vocabulary, and so on.
I'd like to look at whether the items are actually measuring different skills. My instinct tells me that there is probably a great deal of overlap among the item types (skills), and that while there may be 10 item types, they are actually measuring a smaller set of subskills.
I'd like to find out to what degree the skills overlap, where the redundancies are, and how the skills might be ranked in terms of difficulty.
My initial thought was to use Rasch or an IRT analysis to get difficulty estimates for the items, and then compare the means for each skill code, yielding a hierarchy of skills by difficulty. (This could have implications for teaching listening skills, allowing resources to be allocated rationally.)
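To make that concrete, here is a minimal sketch of the comparison I have in mind. The item labels, skill codes, and logit values below are all invented placeholders standing in for difficulty estimates exported from a Rasch run; only the grouping-and-averaging step is the point.

```python
# Sketch: given item difficulty estimates in logits (hypothetical values
# here, not real iBT data) and each item's skill code, rank the skills
# by mean item difficulty.
from collections import defaultdict

# Hypothetical (item, skill_code, difficulty_logit) triples
items = [
    ("L01", "Detail", -0.8), ("L02", "Detail", -0.3),
    ("L03", "Inference", 0.9), ("L04", "Inference", 1.2),
    ("L05", "Vocabulary", -0.1), ("L06", "Vocabulary", 0.4),
]

by_skill = defaultdict(list)
for _, skill, b in items:
    by_skill[skill].append(b)

# Skills ordered from easiest to hardest by mean logit difficulty
hierarchy = sorted(
    ((skill, sum(bs) / len(bs)) for skill, bs in by_skill.items()),
    key=lambda pair: pair[1],
)
for skill, mean_b in hierarchy:
    print(f"{skill:<10} mean difficulty = {mean_b:+.2f} logits")
```

With these made-up numbers the ranking comes out Detail, Vocabulary, Inference, easiest to hardest; the open question is whether such a ranking licenses claims about the skills rather than just these particular items.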
If Rasch / IRT indicates that items with skill code X are more difficult on average than items of code Y, can I infer that the resulting logit pattern (difficulty estimates) arrays the skills (and not just the items) along a difficulty continuum?
If the items are supposedly testing discrete skills, does this imply multi-dimensionality of the entire listening data set taken together? If so, does this violate unidimensionality assumptions for Rasch / IRT? In other words, if I can show that the items test different skills (latent traits), can I still use Rasch / IRT on the aggregate data set?
I'm thinking a more appropriate approach would be factor analysis. I'm expecting that the items will load highly on a general factor - let's call it "general listening ability" - and then load differentially on a small number of factors, representing the individual skills. I expect to see most of the variance accounted for by a small number of factors - smaller than the number of skill codes. I expect several of the item types (skills) to show similar patterns of factor loading.
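As a rough illustration of the "fewer factors than skill codes" pattern I am expecting, the sketch below runs a principal-components decomposition of an invented inter-item correlation matrix (a stand-in for a full factor analysis, not my actual data). In the fabricated matrix, items 1-2 and items 3-4 correlate strongly within pairs, so fewer components than item types carry most of the variance.

```python
# Sketch with a hypothetical 4-item correlation matrix: a PCA-style
# eigendecomposition as a quick proxy for factor analysis.
import numpy as np

R = np.array([
    [1.00, 0.70, 0.30, 0.25],
    [0.70, 1.00, 0.28, 0.30],
    [0.30, 0.28, 1.00, 0.65],
    [0.25, 0.30, 0.65, 1.00],
])

# eigh is for symmetric matrices; sort eigenvalues largest-first
eigvals = np.sort(np.linalg.eigh(R)[0])[::-1]
explained = eigvals / eigvals.sum()

print("proportion of variance per component:", np.round(explained, 3))
print("components needed for 80% of variance:",
      int(np.searchsorted(np.cumsum(explained), 0.80) + 1))
```

With this made-up matrix, two components account for over 80% of the variance among four items - the kind of redundancy I suspect exists among the ten listening skill codes.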
I guess my questions come down to:
What information can Rasch / IRT provide beyond estimates of the difficulty of the items?
Is factor analysis appropriate in this situation?
If factor analysis shows two skills to load similarly on the factors, can this be taken as evidence that they are likely testing the same thing? What statistic would I use to indicate the strength of the relationship between two (or more) skills?
If an item is shown to load on a general factor, but also to load heavily on additional factors, does this imply that the item is multi-dimensional? If so, what implications does this have under Rasch or IRT?
What else should I be considering?
Thank you very much for whatever assistance you can provide.
-Donald Van Metre-