[ml] KDD cup submission status

Fri Jun 4 01:17:16 UTC 2010

With a few basic tricks, I catapulted the submission into the 50th
percentile.  That is without even running any ML algorithm.

I'm planning on running the NaiveBayesUpdateable classifier
(http://weka.wikispaces.com/Classifying+large+datasets) over
discretized IQ/IQ strength/Chance/Chance strength from the command
line to evaluate performance.  Another option would be to load all the
data into memory (<3GB, even for the full Bridge Train set) and run
LIBSVM over it.
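
To make the "discretize, then train incrementally" idea concrete, here
is a minimal Python sketch.  It is not Weka's NaiveBayesUpdateable,
just a count-based categorical naive Bayes with Laplace smoothing that
is updated one row at a time.  The equal-width binning, the bin count,
and the value range passed to discretize() are all assumptions on my
part, not something taken from the actual data.

```python
from collections import defaultdict
import math


def discretize(value, lo, hi, bins):
    """Map a real value to an equal-width bin index in [0, bins - 1].

    lo/hi are an assumed value range; out-of-range values are clamped.
    """
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


class IncrementalNaiveBayes:
    """Categorical naive Bayes trained one row at a time (Laplace smoothing)."""

    def __init__(self, bins):
        self.bins = bins
        self.class_counts = defaultdict(int)
        # feature_counts[(feature position, bin index, label)] -> count
        self.feature_counts = defaultdict(int)

    def update(self, features, label):
        """Fold a single (already discretized) row into the counts."""
        self.class_counts[label] += 1
        for pos, b in enumerate(features):
            self.feature_counts[(pos, b, label)] += 1

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for pos, b in enumerate(features):
                count = self.feature_counts.get((pos, b, label), 0)
                score += math.log((count + 1) / (n + self.bins))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because update() only touches counters, rows can be streamed from disk
without holding the full training set in memory, which is the point of
an updateable classifier.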

If someone wants to try MOA
(http://www.cs.waikato.ac.nz/~abifet/MOA/index.html), that would also
be helpful in the long run (at least a tutorial on how to set it up and
run it).

The reduced datasets plus the IQ values are linked on the wiki.  The
features are:
   row INT,
   studentid VARCHAR(30),
   problemhierarchy TEXT,
   problemname TEXT,
   problemview INT,
   problemstepname TEXT,
   cfa INT,
   iq REAL
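
For anyone who wants to poke at the data from Python, here is a small
sketch that recreates a table with the columns listed above in SQLite.
The table name "steps" and the helper names are made up for
illustration, and I'm assuming cfa is a 0/1 correct-first-attempt flag.

```python
import sqlite3

# Hypothetical table name "steps"; the columns follow the feature list above.
# "row" is quoted defensively because ROW is also an SQL keyword.
CREATE_SQL = """
CREATE TABLE steps (
    "row" INT,
    studentid VARCHAR(30),
    problemhierarchy TEXT,
    problemname TEXT,
    problemview INT,
    problemstepname TEXT,
    cfa INT,
    iq REAL
)
"""


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) steps table."""
    conn = sqlite3.connect(path)
    conn.execute(CREATE_SQL)
    return conn


def mean_cfa_by_student(conn):
    """Return (studentid, average cfa) pairs, sorted by student id."""
    cur = conn.execute(
        "SELECT studentid, AVG(cfa) FROM steps "
        "GROUP BY studentid ORDER BY studentid"
    )
    return cur.fetchall()
```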

IQ strength (number of attempts per student) should be available soon.
(Perhaps additional features will become available as well.)
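
Until then, a rough stand-in can be computed locally.  This sketch
assumes that every row in the dataset is one attempt, so IQ strength is
simply the number of rows per studentid; the function name is made up.

```python
from collections import Counter


def iq_strength(student_ids):
    """Count rows per student, assuming each row represents one attempt."""
    return Counter(student_ids)
```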

I'm still hoping somebody could cluster Erin's normalized skills data
and provide a row -> cluster id mapping for algebra and bridge train
and test sets (I don't have the data any more).

Andy