[Rasch] Rasch-inspired open-source software now available for highly multidimensional data

Mark Moulton markhmoulton at gmail.com
Sat Mar 24 16:00:18 EST 2012


Hey Patrick, great to hear from you!  I like following you on LinkedIn.

Examples that can produce highly multidimensional data sets:

   - Movie ratings, music ratings, book ratings, restaurant ratings, etc.
   - Personality profiles, facial profiles, criminal profiles, HR hiring,
   dating profiles
   - Risk assessment for banks making loans, business characteristics
   - Sports statistics, physical fitness/health, physical performance,
   mental performance
   - Image recognition, image cleaning, audio compression
   - Group characteristics, social networks
   - Internet traffic, website popularity
   - Words and language (latent semantic analysis) -- English words have
   been found to reduce to somewhere around 300 dimensions
   - DNA -- the human genome
   - Educational data -- analyzed at the distractor level rather than the
   scored level (typically more than 1 dimension and fewer than 10)

And so on.  Some of these analyses I've already done.  Most I desperately
want to do if I can find time.

The key, as Mark Wilson has pointed out, is distinguishing between
"within-item" multidimensionality and "between-item" multidimensionality.
Say you have a test (Test A) composed solely of word problems.  Each item
requires some math and some language, in differing proportions.  This is
within-item multidimensionality.  Now say you have a test (Test B) that is
strictly math problems for half the test and strictly language items for
the other half.  This is between-item multidimensionality.
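
To make the distinction concrete, here is a small Python sketch.  The
loadings are made-up numbers, and classify() is a hypothetical helper written
for this illustration; neither comes from Damon.  Each item is described by
how much it draws on (math, language).

```python
# Hypothetical loadings: one tuple per item, as (math, language).
test_a = [(0.7, 0.3), (0.5, 0.5), (0.2, 0.8), (0.6, 0.4)]  # word problems
test_b = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]  # math half, language half


def classify(loadings, tol=1e-9):
    """Label a test by how its items load on the dimensions.

    A test with any item that loads on more than one dimension is
    reported as "within-item".
    """
    # Number of dimensions each item draws on.
    dims_per_item = [sum(1 for w in item if abs(w) > tol) for item in loadings]
    # Set of dimensions used anywhere in the test.
    dims_used = {d for item in loadings for d, w in enumerate(item) if abs(w) > tol}
    if len(dims_used) <= 1:
        return "unidimensional"
    if all(n == 1 for n in dims_per_item):
        return "between-item"
    return "within-item"


print(classify(test_a))  # within-item
print(classify(test_b))  # between-item
```

In practice the loadings are not known in advance, so a table like this is
something you would estimate, not something you would start from.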

Generally, Damon prefers Test A-style "within-item" multidimensionality,
where all items participate in the same 2-dimensional space.  To analyze
Test B you would either analyze the two halves of the test separately (as
you would with Rasch) or use a special method within Damon called
objectify(), which offers strategies for dealing with "ragged" spaces.

Most of the time, there is no way to know the dimensionality ahead of time.
Once Damon finds it, you perform the usual analysis of fit to make sure
items are participating in a common space.
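
For a rough idea of what "finding the dimensionality" can look like, here is
a minimal NumPy sketch.  It is not Damon's code or Damon's objectivity
criterion.  It assumes a plain alternating least squares decomposition and
uses prediction error on held-out cells as a stand-in criterion.  The names
als_fit and pick_dimensionality, the ridge term, and the simulated data are
all assumptions made for this illustration.

```python
import numpy as np


def als_fit(X, mask, k, n_iter=50, ridge=1e-6, seed=0):
    """Estimate X ~ R @ C.T from cells where mask is True, using ALS."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    R = rng.normal(size=(n_rows, k))   # row (person) coordinates
    C = rng.normal(size=(n_cols, k))   # column (item) coordinates
    eye = ridge * np.eye(k)            # small ridge keeps the solves stable
    for _ in range(n_iter):
        # Solve for each row's coordinates, holding column coordinates fixed.
        for i in range(n_rows):
            obs = mask[i]
            A = C[obs]
            R[i] = np.linalg.solve(A.T @ A + eye, A.T @ X[i, obs])
        # Solve for each column's coordinates, holding row coordinates fixed.
        for j in range(n_cols):
            obs = mask[:, j]
            A = R[obs]
            C[j] = np.linalg.solve(A.T @ A + eye, A.T @ X[obs, j])
    return R @ C.T                     # estimates for every cell


def pick_dimensionality(X, observed, dims, holdout=0.2, seed=0):
    """Return the k with the lowest RMSE on held-out observed cells."""
    rng = np.random.default_rng(seed)
    held = observed & (rng.random(X.shape) < holdout)  # cells hidden from the fit
    train = observed & ~held
    errors = {}
    for k in dims:
        est = als_fit(X, train, k)
        errors[k] = float(np.sqrt(np.mean((est[held] - X[held]) ** 2)))
    best = min(errors, key=errors.get)
    return best, errors


# Simulated 2-dimensional data with noise and about 20% missing cells.
rng = np.random.default_rng(1)
true_R = rng.normal(size=(60, 2))
true_C = rng.normal(size=(40, 2))
X = true_R @ true_C.T + 0.1 * rng.normal(size=(60, 40))
observed = rng.random(X.shape) < 0.8

best, errors = pick_dimensionality(X, observed, dims=range(1, 6))
print("held-out RMSE by dimensionality:", errors)
print("selected dimensionality:", best)
```

The point of the sketch is only that too few dimensions predict the hidden
cells badly, while extra dimensions start fitting noise.  The analysis of fit
still has to be done afterward.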

Thanks for writing.  Hope to see you in Vancouver!

Mark





On Fri, Mar 23, 2012 at 8:39 PM, Patrick B. Fisher <
pfisher at sportsmeasures.com> wrote:

>  Mark,
>
> There are ever increasing levels of Rasch knowledge and ability on this
> list. Would you please explain what you mean by highly multidimensional?
> Can you give other examples similar to the Netflix dataset?
>
> Thanks,
>
> Patrick Fisher
> President
> SportsMeasures
> Sports Ranking Services
> and
> Character Assessment
>
> On 3/23/2012 10:06 PM, Mark Moulton wrote:
>
> *Dear Rasch Colleagues,*
>
>  I want to let you know about *Damon*, a Python-based software package
> that I'm releasing free to the Rasch psychometric community and general
> public.  The result of some 20 years of hard labor (initiated by Ben Wright
> to his surprise), it has been in private commercial use for the last five
> years.  It was designed specifically to apply Rasch's objectivity criterion
> to *highly multidimensional* datasets, such as the infamous Netflix movie
> ratings dataset.
>
>  It is documented and available for download through
> www.pythiasconsulting.com.  If there is any interest, I will hold a *Damon
> workshop in Vancouver* at my hotel on the morning of *Friday, April 13,*
> after *IOMW*.
>
>  Contact me if you have any questions.
>
>  Thanks!
>
>  Mark H. Moulton, Ph.D.
> Pythias Consulting
>
>  markhmoulton at gmail.com
> 408-307-2794
>
>
>  *Features/Bugs (depending on point of view):*
>
>    - *Multidimensionality.*  Easily handles data with 1, 5, 10, 50, 100+
>    dimensions, including items that are negatively correlated.
>    - *Data types.*  Handles interval, ordinal, ratio, dichotomous,
>    polytomous, and nominal data, including within the same dataset.
>    - *Missing Data.*  Handles pretty much any proportion of missing data,
>    random or non-random.
>    - *Labels.*  Datasets support row labels and column labels of any
>    depth.  Queries are label-based.
>    - *Objectivity.*  Damon offers clear criteria for determining and
>    optimizing the objectivity of all reported statistics, including "best
>    dimensionality".  These include Rasch's parameter invariance requirement.
>    - *Measures.*  Person measures consist of cell estimates, in either
>    the original or a linear (logit) metric, averaged across one or more items
>    within a dataset.  They answer the question:  *How able is Person A on
>    the construct embodied by a defined subset of items?*
>    - *Predictions.*  Predictions are of the form:  *How would Person A
>    have performed on Item 1 if he had taken the item?*
>    - *Equating.*  Parameters from one dataset are transferrable as
>    anchors to a comparable dataset.
>    - *Analysis of Fit.*  Damon reports fit statistics, standard errors,
>    etc., for determining the degree to which the observed data fits into the
>    objective space.
>    - *Rasch.*  Damon includes a Rasch module based on the Winsteps JMLE
>    implementation.
>
> *Algorithm*
>
>    - *ALS/Rasch.*  The algorithm is classified as a multidimensional
>    alternating least squares matrix decomposition subjected to a strict
>    Rasch-based objectivity optimization criterion.
>
>  *Usability*
>
>    - *Python.*  Damon is built on top of the Python scripting language
>    and the NumPy numerical package (both free).  It is accessed through a
>    command-line shell or run from scripts.  If you have experience with the
>    statistical language *R* or with *MATLAB*, you will feel at home.
>    Although Damon is written in Python, it requires very little expertise in
>    Python programming.  The website includes a tutorial.  Damon's inline help
>    resources are extensive.
>
> *A Sample Damon script*
>
> >>> import damon1.core as dmn
> >>> Data = dmn.DamonObj('California_May2012.csv','TextFile',nHeaders4Rows=5,nHeaders4Cols=1,ValidChars=['All',[0,1]])
> >>> Data.standardize()
> >>> Data.coord([range(1,11)],RunSpecs=[0.0001,20])
> >>> Data.baseEst()
> >>> Data.finEst()
> >>> Data.summStat()
> >>> Data.export(['summStat_out','baseEst_out','finEst_out'],'TextFile')
>
>  This short program imports and formats a text file, standardizes the
> data, looks for the optimal (most objective) dimensionality among 1 to
> 10 dimensions, computes coordinates (multidimensional abilities and
> difficulties) for each person and item, computes an array of cell estimates
> (for both missing and non-missing cells), computes another array of cell
> (0,1) predictions, computes person measures, and exports three of the
> outputs as text files.
>
>  Various other statistics, functions, and methods are available in Damon,
> plus the massive libraries in NumPy and related Python packages.
>
>
>
>
>
> _______________________________________________
> Rasch mailing list
> Rasch at acer.edu.au
> Unsubscribe: https://mailinglist.acer.edu.au/mailman/options/rasch/pfisher%40sportsmeasures.com