So it's taking about 9 hours to create a graph from a 4.4GB file. I'm<br>going to work on making the code faster, and I'm also<br>investigating a MapReduce solution.<br><br>Basically, the clustering process can be broken down into two stages:<br>
<br>1) Construct the graph and apply the clustering algorithm to break it into clusters.<br>2) Apply the clustered graph to the data again to classify each skill set.<br><br>I'll keep working on it and let everyone know how it goes. Also,<br>
as I mentioned in another email, the source code is in our new sourceforge<br>project's git repository.<br><br> mike<br><br><br><br><br><div class="gmail_quote">On Thu, Jun 3, 2010 at 7:48 PM, Mike Schachter <span dir="ltr"><<a href="mailto:mike@mindmech.com">mike@mindmech.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>Sounds like you're making great progress! I'll be working on the<br>graph clustering algorithm for the skill set tonight and will keep<br>
you posted on how things are going.<br><font color="#888888"><br> mike</font><div><div></div><div class="h5"><br><br><br><br><br><div class="gmail_quote">
On Thu, Jun 3, 2010 at 6:17 PM, Andreas von Hessling <span dir="ltr"><<a href="mailto:vonhessling@gmail.com" target="_blank">vonhessling@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Using a few basic tricks, I catapulted the submission into the 50th<br>
percentile, and that's without even running an ML algorithm.<br>
<br>
I'm planning on running the NaiveBayesUpdateable classifier<br>
(<a href="http://weka.wikispaces.com/Classifying+large+datasets" target="_blank">http://weka.wikispaces.com/Classifying+large+datasets</a>) over<br>
discretized IQ/IQ strength/Chance/Chance strength from the command<br>
line to evaluate performance. Another option would be to load all<br>
data into memory (under 3GB, even for the full Bridge Train set) and run LIBSVM<br>
over it.<br>
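A minimal sketch, in Python rather than Weka, of what an updateable (incremental) naive Bayes does with discretized features: it keeps only counts and absorbs one instance at a time, so the full training set never has to sit in memory. This is not Weka's NaiveBayesUpdateable; the `discretize` helper, the `UpdateableNaiveBayes` class, the bin edges, and the add-one smoothing are all assumptions for illustration.<br>

```python
import math
from collections import defaultdict


def discretize(value, edges):
    """Map a continuous value to a bin index.

    `edges` are sorted interior cut points, so len(edges) + 1 bins exist.
    """
    return sum(1 for edge in edges if value >= edge)


class UpdateableNaiveBayes:
    """Naive Bayes over discrete features, trained one instance at a time.

    Illustration only (not Weka's NaiveBayesUpdateable). Uses add-one
    (Laplace) smoothing for the prior and the per-feature likelihoods.
    """

    def __init__(self, n_bins_per_feature):
        self.n_bins = list(n_bins_per_feature)    # number of bins per feature
        self.class_counts = defaultdict(int)      # label -> instances seen
        # (feature index, label) -> {bin index -> count}
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def update(self, features, label):
        """Absorb one training instance; only counts are kept."""
        self.class_counts[label] += 1
        for i, b in enumerate(features):
            self.feature_counts[(i, label)][b] += 1

    def log_scores(self, features):
        """Unnormalized log-posterior for every label seen so far."""
        total = sum(self.class_counts.values())
        n_classes = len(self.class_counts)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log((count + 1) / (total + n_classes))
            for i, b in enumerate(features):
                seen = self.feature_counts[(i, label)].get(b, 0)
                score += math.log((seen + 1) / (count + self.n_bins[i]))
            scores[label] = score
        return scores

    def predict(self, features):
        """Return the label with the highest log-posterior."""
        scores = self.log_scores(features)
        return max(scores, key=scores.get)
```

With four bins per feature (cut points 0.25, 0.5, 0.75 for values in [0, 1], an arbitrary choice), each row would be passed to `update([discretize(iq, edges), discretize(chance, edges)], cfa)` as it is read, and `predict` returns the more likely CFA label.<br>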
<br>
If someone wants to try MOA<br>
(<a href="http://www.cs.waikato.ac.nz/%7Eabifet/MOA/index.html" target="_blank">http://www.cs.waikato.ac.nz/~abifet/MOA/index.html</a>), that would also be<br>
helpful in the long run (at the very least, a tutorial on how to set it up and<br>
run it).<br>
<br>
The reduced datasets plus the IQ values are linked on the wiki. Features are:<br>
row INT,<br>
studentid VARCHAR(30),<br>
problemhierarchy TEXT,<br>
problemname TEXT,<br>
problemview INT,<br>
problemstepname TEXT,<br>
cfa INT,<br>
iq REAL<br>
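A minimal sketch of holding these features in SQLite from Python, using the column list above. The table name `steps`, the helper functions, and the example query are hypothetical; `row` is quoted defensively because ROW is an SQL keyword.<br>

```python
import sqlite3

# Column list as described above; the table name "steps" is hypothetical.
# "row" is quoted defensively because ROW is an SQL keyword.
SCHEMA = """
CREATE TABLE IF NOT EXISTS steps (
    "row" INT,
    studentid VARCHAR(30),
    problemhierarchy TEXT,
    problemname TEXT,
    problemview INT,
    problemstepname TEXT,
    cfa INT,
    iq REAL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def load_rows(conn, rows):
    """Insert 8-tuples given in the column order above."""
    conn.executemany("INSERT INTO steps VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def cfa_rate_by_student(conn):
    """Example query: mean cfa (share of rows with cfa = 1) per student."""
    cur = conn.execute(
        "SELECT studentid, AVG(cfa) FROM steps GROUP BY studentid ORDER BY studentid"
    )
    return dict(cur.fetchall())
```

Passing a file path to `open_db` instead of the default `:memory:` keeps the data on disk, which may matter for the full Bridge Train set.<br>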
<br>
IQ strength (the number of attempts per student) should be available soon,<br>
and perhaps additional features as well.<br>
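Assuming each row counts as one attempt, IQ strength as described (the number of attempts per student) reduces to a per-student row count that can be appended to every row. The helper names are hypothetical, and `student_col=1` assumes the column order listed above (row, studentid, ...).<br>

```python
from collections import Counter


def iq_strength(rows, student_col=1):
    """Number of rows per student, treating each row as one attempt.

    student_col=1 assumes the column order listed above (row, studentid, ...).
    """
    return Counter(r[student_col] for r in rows)


def add_iq_strength(rows, student_col=1):
    """Return new rows with the student's attempt count appended.

    `rows` must be a list (it is traversed twice).
    """
    counts = iq_strength(rows, student_col)
    return [tuple(r) + (counts[r[student_col]],) for r in rows]
```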
<br>
I'm still hoping somebody could cluster Erin's normalized skills data<br>
and provide a row -> cluster id mapping for the Algebra and Bridge train<br>
and test sets (I no longer have the data).<br>
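Mike's message at the top splits the clustering into two stages: build and cluster the graph, then apply the clusters back to the data. A minimal Python sketch of both stages, ending in the row -> cluster id mapping requested here, could look like this. It assumes each row's normalized skills arrive as a list of skill names, and it uses connected components of a thresholded skill co-occurrence graph only as a stand-in, since the thread does not say which clustering algorithm is actually used; the function names and the `min_count` threshold are hypothetical.<br>

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(skill_sets, min_count=1):
    """Stage 1a: edges between skills that share at least min_count rows."""
    weights = Counter()
    for skills in skill_sets:
        for a, b in combinations(sorted(set(skills)), 2):
            weights[(a, b)] += 1
    return [edge for edge, w in weights.items() if w >= min_count]


def cluster_skills(skill_sets, min_count=1):
    """Stage 1b: connected components via union-find -> {skill: cluster id}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for skills in skill_sets:
        for s in skills:
            find(s)  # register every skill, including isolated ones
    for a, b in cooccurrence_edges(skill_sets, min_count):
        parent[find(a)] = find(b)

    # Renumber components as 0, 1, 2, ... in sorted skill order.
    ids = {}
    return {s: ids.setdefault(find(s), len(ids)) for s in sorted(parent)}


def assign_rows(rows, skill_to_cluster):
    """Stage 2: {row id: cluster id} for (row id, skills) pairs.

    Takes the most common cluster among a row's skills (ties go to the
    smaller id); rows without any known skill map to None.
    """
    mapping = {}
    for row_id, skills in rows:
        votes = Counter(skill_to_cluster[s] for s in skills if s in skill_to_cluster)
        mapping[row_id] = min(votes, key=lambda c: (-votes[c], c)) if votes else None
    return mapping
```

Counting skill pairs row by row is the kind of step that could be split across a MapReduce job (emit each pair, sum the counts per pair) if the single-machine graph build stays slow.<br>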
<br>
Andy<br>
_______________________________________________<br>
ml mailing list<br>
<a href="mailto:ml@lists.noisebridge.net" target="_blank">ml@lists.noisebridge.net</a><br>
<a href="https://www.noisebridge.net/mailman/listinfo/ml" target="_blank">https://www.noisebridge.net/mailman/listinfo/ml</a><br>
</blockquote></div><br>
</div></div></blockquote></div><br>