[ml] KDD cup submission status

Sat Jun 5 20:23:26 UTC 2010

Awesome.  I'm going to be looking at getting MOA working today, and will
upload a how-to and code once I have it set up.  Mike, thanks for setting up
the repository!  Andreas, if you have datasets with IQ/IQ strength available,
I'd love to make use of them.  (Question, though: what is IQ strength as
compared to IQ?)  I'm also curious what you used for the submission, as I am
(happily) surprised at the good performance!

Go team!
-Thomas

On Sat, Jun 5, 2010 at 11:40 AM, Andreas von Hessling <vonhessling at gmail.com> wrote:

> Sweet, Mike.  Please note that we need the row -> clusterid mapping
> for both training AND testing sets.  Otherwise it will not help the ML
> algorithms.
> If I understand correctly, your input is the orthogonalized skills.
> So far, the girls only provided these orthogonalizations for the
> training files.  I'm computing them for the test sets so you can use
> them.  If I don't understand this assumption correctly, please let me
> know so I can use my CPU's cycles for other tasks.
>
> Ideally you can provide these cluster mappings by about Sunday, which
> is when I want to start running classifiers.  I will need some time to
> actually run the ML algorithms.
>
> I now have IQ and IQ strength feature values for all datasets and
> hope time permits computing chance and chance strength values for
> rows.
> Computing # of skills required should not be difficult and I will add
> this feature as well.  I plan on sharing my datasets as new versions
> become available.
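A minimal sketch of the "# of skills required" feature mentioned above. It assumes each row carries its skills as one delimited string; the `~~` separator, the `skills` key, and the sample values are illustrative assumptions, not details taken from this thread.

```python
def count_skills(skill_field, sep="~~"):
    """Return how many non-empty skills are listed in one row's skill field."""
    if skill_field is None:
        return 0
    parts = [p.strip() for p in skill_field.split(sep)]
    return sum(1 for p in parts if p)


# Hypothetical rows; real rows would come from the training/test files.
rows = [
    {"row": 1, "skills": "Identify units~~Enter given"},
    {"row": 2, "skills": "Enter given"},
    {"row": 3, "skills": ""},
]

# row id -> number of skills required
num_skills = {r["row"]: count_skills(r["skills"]) for r in rows}
```

The same function can be applied to training and test rows alike, so the feature is available for both.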
>
> Andy
>
>
>
>
> On Fri, Jun 4, 2010 at 1:42 PM, Mike Schachter <mike at mindmech.com> wrote:
> > So it's taking about 9 hours to create a graph from a 4.4GB file.  I'm
> > going to work on improving the code to make it a bit faster, and am also
> > investigating a MapReduce solution.
> >
> > Basically the clustering process can be broken down into two stages:
> >
> > 1) Construct the graph, apply the clustering algorithm to break graph
> >    into clusters
> > 2) Apply the clustered graph to the data again to classify each skill set
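The two stages above can be sketched roughly as follows. The thread does not name the clustering algorithm, so this sketch swaps in connected components over a skill co-occurrence graph as a stand-in, and it assigns each row by majority vote over its skills. The function names, the positional row numbering, and the sample skills are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_graph(skill_sets):
    """Stage 1a: link every pair of skills that occur together in a row."""
    graph = {}
    for skills in skill_sets:
        unique = sorted(set(skills))
        for s in unique:
            graph.setdefault(s, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cluster_graph(graph):
    """Stage 1b: label each connected component with an integer cluster id."""
    cluster_of = {}
    next_id = 0
    for start in sorted(graph):
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour not in cluster_of:
                    cluster_of[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return cluster_of


def assign_rows(skill_sets, cluster_of):
    """Stage 2: map row number -> cluster id (None if no skill is known)."""
    mapping = {}
    for row, skills in enumerate(skill_sets, start=1):
        votes = Counter(cluster_of[s] for s in skills if s in cluster_of)
        mapping[row] = votes.most_common(1)[0][0] if votes else None
    return mapping


train = [["add", "subtract"], ["subtract", "negate"],
         ["plot point"], ["plot point", "read axis"]]
test = [["negate"], ["read axis", "plot point"], ["unseen skill"]]

clusters = cluster_graph(build_graph(train))
train_map = assign_rows(train, clusters)
test_map = assign_rows(test, clusters)
```

Building the clusters from the training rows and then applying the same `clusters` dictionary to the test rows gives a row -> cluster id mapping for both sets.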
> >
> > I'll keep working on it and let everyone know how things are going.  As
> > I mentioned in another email, the source code is in our new SourceForge
> > project's git repository.
> >
> >  mike
> >
> >
> >
> >
> > On Thu, Jun 3, 2010 at 7:48 PM, Mike Schachter <mike at mindmech.com> wrote:
> >>
> >> Sounds like you're making great progress! I'll be working on the
> >> graph clustering algorithm for the skill set tonight and will keep
> >> you posted on how things are going.
> >>
> >>   mike
> >>
> >>
> >>
> >>
> >> On Thu, Jun 3, 2010 at 6:17 PM, Andreas von Hessling
> >> <vonhessling at gmail.com> wrote:
> >>>
> >>> Doing a few basic tricks, I catapulted the submission into the 50th
> >>> percentile.  That is without even running any ML algorithm.
> >>>
> >>> I'm planning on running the NaiveBayesUpdateable classifier
> >>> (http://weka.wikispaces.com/Classifying+large+datasets) over
> >>> discretized IQ/IQ strength/Chance/Chance strength from the command
> >>> line to evaluate performance.  Another attempt would be to load all
> >>> data into memory (<3GB, even for full Bridge Train) and run SVMlib
> >>> over it.
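To illustrate the idea of an updateable Naive Bayes classifier over discretized features, here is a small pure-Python sketch. It is not Weka's NaiveBayesUpdateable, only a stand-in for the same technique: counts are updated one row at a time, so the full dataset never has to sit in memory. Equal-width binning, feature values scaled to [0, 1], the class name, and the sample stream are all assumptions made for the example.

```python
import math
from collections import defaultdict


def discretize(value, lo=0.0, hi=1.0, bins=5):
    """Equal-width binning: map a real value to a bin index in [0, bins - 1]."""
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)


class StreamingNaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing, trained incrementally."""

    def __init__(self, n_bins=5, alpha=1.0):
        self.n_bins = n_bins
        self.alpha = alpha
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (label, feature index, bin)

    def update(self, bins, label):
        """Fold one training row into the counts."""
        self.class_counts[label] += 1
        for i, b in enumerate(bins):
            self.feature_counts[(label, i, b)] += 1

    def predict_proba(self, bins):
        """Return {label: probability}; empty if nothing has been seen yet."""
        if not self.class_counts:
            return {}
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, b in enumerate(bins):
                num = self.feature_counts.get((label, i, b), 0) + self.alpha
                score += math.log(num / (n + self.alpha * self.n_bins))
            log_scores[label] = score
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: w / norm for k, w in weights.items()}


# Hypothetical stream of ((feature values), cfa) pairs.
stream = [((0.9, 0.8), 1), ((0.85, 0.7), 1), ((0.2, 0.1), 0), ((0.1, 0.3), 0)]
model = StreamingNaiveBayes(n_bins=5)
for values, cfa in stream:
    model.update([discretize(v) for v in values], cfa)

probs = model.predict_proba([discretize(0.95), discretize(0.75)])
```

In a real run the stream would be read line by line from the feature files, with one `update` call per row.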
> >>>
> >>> If someone wants to try MOA
> >>> (http://www.cs.waikato.ac.nz/~abifet/MOA/index.html), this would also
> >>> be helpful in the long run (at least a tutorial on how to set it up
> >>> and run it).
> >>>
> >>> The reduced datasets plus the IQ values are linked on the wiki.
> >>> Features are:
> >>>   ...> row INT,
> >>>   ...> studentid VARCHAR(30),
> >>>   ...> problemhierarchy TEXT,
> >>>   ...> problemname TEXT,
> >>>   ...> problemview INT,
> >>>   ...> problemstepname TEXT,
> >>>   ...> cfa INT,
> >>>   ...> iq REAL
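The column list above looks like it was pasted from a SQL shell session. As a sketch, the same schema can be created and queried from Python with the standard `sqlite3` module. The table name `features`, the in-memory database, and the sample rows are hypothetical; `row` is quoted defensively because it is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would be used for real data
conn.execute(
    """
    CREATE TABLE features (
        "row" INT,
        studentid VARCHAR(30),
        problemhierarchy TEXT,
        problemname TEXT,
        problemview INT,
        problemstepname TEXT,
        cfa INT,
        iq REAL
    )
    """
)

# Made-up rows, only to show the column order.
sample = [
    (1, "stu_a", "Unit 1, Section 1", "PROB-1", 1, "step 1", 1, 0.8),
    (2, "stu_a", "Unit 1, Section 1", "PROB-1", 1, "step 2", 0, 0.8),
    (3, "stu_b", "Unit 1, Section 2", "PROB-7", 2, "step 1", 1, 0.6),
]
conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample)

# Pull the rows for one student, in row order.
stu_a_rows = conn.execute(
    'SELECT "row", cfa, iq FROM features WHERE studentid = ? ORDER BY "row"',
    ("stu_a",),
).fetchall()
```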
> >>>
> >>> IQ strength (number of attempts per student) should be available soon.
> >>>  (perhaps add'l features will become available as well)
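Going by the description above (number of attempts per student), IQ strength could be computed roughly like this. The sketch assumes that every row counts as one attempt; the function name and the sample student ids are hypothetical.

```python
from collections import Counter


def iq_strength(student_ids):
    """Return, for each row, how many rows (attempts) its student has in total."""
    counts = Counter(student_ids)
    return [counts[s] for s in student_ids]


# One student id per row, in row order (made-up values).
students = ["stu_a", "stu_b", "stu_a", "stu_a", "stu_c"]
strength = iq_strength(students)
```

The result is aligned with the input rows, so it can be appended as one more feature column.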
> >>>
> >>> I'm still hoping somebody could cluster Erin's normalized skills data
> >>> and provide a row -> cluster id mapping for algebra and bridge train
> >>> and test sets (I don't have the data any more).
> >>>
> >>> Andy
> >>> _______________________________________________
> >>> ml mailing list
> >>> ml at lists.noisebridge.net
> >>> https://www.noisebridge.net/mailman/listinfo/ml
> >>
> >
> >