All,<br><br>I've been trying to use MOA to generate a classifier. I seem to be able to do that, but I'm having trouble getting it to output classifications for new examples, so I thought I'd share my current status and see if anyone can help.<br>

<br>You can download the stream test and train files from <a href="http://thomaslotze.com/kdd/streams.tgz">http://thomaslotze.com/kdd/streams.tgz</a><br>You can also download the jarfiles needed for MOA at <a href="http://thomaslotze.com/kdd/jarfiles.tgz">http://thomaslotze.com/kdd/jarfiles.tgz</a><br>

<br>Unpack these all into the same directory.  Then, in that directory, using the following command, you can create a MOA classifier:<br>java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "LearnModel -l MajorityClass -s (ArffFileStream -f atrain.arff -c -1) -O amodel.moa"<br>

<br>You can also summarize the test arff file using the following command:<br>java -cp .:moa.jar:weka.jar weka.core.Instances atest.arff<br><br>But I cannot find a MOA command that takes the amodel.moa model as input and generates predicted classes for atest.arff.  The closest I've come is the following, which runs amodel.moa on atest.arff and must be predicting classes and comparing them against the true labels, because it reports how many it got correct:<br>

java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluateModel -m file:amodel.moa -s (ArffFileStream -f atest.arff -c -1)"<br><br>So if anyone can figure it out (I've been using <a href="http://www.cs.waikato.ac.nz/~abifet/MOA/Manual.pdf">http://www.cs.waikato.ac.nz/~abifet/MOA/Manual.pdf</a> as a guide), I could certainly use some help with this step.<br>
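<br>For reference, here is a minimal, self-contained Java sketch of what I understand a majority-class learner plus this kind of evaluation to be doing: count the class labels seen in training, predict the most frequent one for every test instance, and count how many predictions match.  This is not MOA's code and does not use its API; the class name MajorityClassSketch, its helper methods, and the label arrays are all made up for illustration.<br>

```java
public class MajorityClassSketch {

    // Count how often each class index occurs in the training labels.
    static int[] countClasses(int[] labels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int label : labels) {
            counts[label]++;
        }
        return counts;
    }

    // The majority class is the index with the highest count
    // (ties go to the lowest index).
    static int majorityClass(int[] counts) {
        int best = 0;
        for (int c = 1; c < counts.length; c++) {
            if (counts[c] > counts[best]) {
                best = c;
            }
        }
        return best;
    }

    // A majority-class model predicts the same class for every instance,
    // so evaluation is just counting the true labels equal to that class.
    static int countCorrect(int prediction, int[] trueLabels) {
        int correct = 0;
        for (int label : trueLabels) {
            if (label == prediction) {
                correct++;
            }
        }
        return correct;
    }

    public static void main(String[] args) {
        // Hypothetical labels, standing in for the class column of
        // atrain.arff and atest.arff.
        int[] trainLabels = {1, 0, 1, 1, 0};
        int[] testLabels = {1, 1, 0, 1};

        int predicted = majorityClass(countClasses(trainLabels, 2));

        // Per-instance output: the predicted class next to the actual class.
        for (int i = 0; i < testLabels.length; i++) {
            System.out.println("instance " + i
                    + " predicted=" + predicted
                    + " actual=" + testLabels[i]);
        }
        System.out.println("correct: " + countCorrect(predicted, testLabels)
                + " of " + testLabels.length);
    }
}
```

<br>Running main prints one predicted/actual pair per test instance followed by the number correct.  That per-instance listing is the output I have not yet found a MOA task for.<br>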

<br>Cheers,<br>Thomas<br><br>P.S. If you'd like to get the SQL loaded yourself, you can download 

joined_tables.sql.gz (which was created using get_output.sh).  I then 

used run_moa.sh to create the .arff files and try to run MOA.<br><br><div class="gmail_quote">On Sat, Jun 5, 2010 at 2:05 PM, Andreas von Hessling <span dir="ltr"><<a href="mailto:vonhessling@gmail.com">vonhessling@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Mike,<br>

We're working on getting the test dataset orthogonalized.  Stay tuned.<br>

Andy<br>

<div><div></div><div class="h5"><br>

<br>

On Sat, Jun 5, 2010 at 1:55 PM, Mike Schachter <<a href="mailto:mike@mindmech.com">mike@mindmech.com</a>> wrote:<br>

> Hey Andy, the input to the classifier I'm trying to produce is<br>

> the orthogonalized dataset - i.e. the list of 1000+ columns where<br>

> each column has the value of the opportunity for that skill. The<br>

> dataset was produced by Erin and is broken into several parts,<br>

> for the algebra dataset this looks like:<br>

><br>

> algebra-output_partaa<br>

> algebra-output_partab<br>

> ..<br>

> algebra-output_partah<br>

><br>

><br>

> You're going to have to orthogonalize the test datasets, which<br>

> I don't have a copy of. Erin - are you around? Maybe she can help<br>

> you convert the test datasets?<br>

><br>

>   mike<br>

><br>

><br>

><br>

><br>

> On Sat, Jun 5, 2010 at 11:40 AM, Andreas von Hessling<br>

> <<a href="mailto:vonhessling@gmail.com">vonhessling@gmail.com</a>> wrote:<br>

>><br>

>> Sweet, Mike.  Please note that we need the row -> clusterid mapping<br>

>> for both training AND testing sets.  Otherwise it will not help the ML<br>

>> algorithms.<br>

>> If I understand correctly, your input is the orthogonalized skills.<br>

>> So far, the girls only provided these orthogonalizations for the<br>

>> training files.  I'm computing them for the test sets so you can use<br>

>> them.  If I don't understand this assumption correctly, please let me<br>

>> know so I can use my CPU's cycles for other tasks.<br>

>><br>

>> Ideally you can provide these cluster mappings by about Sunday, which<br>

>> is when I want to start running classifiers.  I will need some time to<br>

>> actually run the ML algorithms.<br>

>><br>

>> I now have IQ and IQ strength feature values for all datasets and am<br>

>> hoping time permits to compute chance and chance strength values for<br>

>> rows.<br>

>> Computing # of skills required should not be difficult and I will add<br>

>> this feature as well.  I plan on sharing my datasets as new versions<br>

>> become available.<br>

>><br>

>> Andy<br>

>><br>

>><br>

>><br>

>><br>

>> On Fri, Jun 4, 2010 at 1:42 PM, Mike Schachter <<a href="mailto:mike@mindmech.com">mike@mindmech.com</a>> wrote:<br>

>> > So it's taking about 9 hours to create a graph from a 4.4GB file, I'm<br>

>> > going to work on improving the code to make it a bit faster, and also<br>

>> > am investigating a MapReduce solution.<br>

>> ><br>

>> > Basically the clustering process can be broken down into two stages:<br>

>> ><br>

>> > 1) Construct the graph, apply the clustering algorithm to break graph<br>

>> > into<br>

>> > clusters<br>

>> > 2) Apply the clustered graph to the data again to classify each skill<br>

>> > set<br>

>> ><br>

>> > I'll keep working on it and let everyone know how things are going with<br>

>> > it,<br>

>> > as I mentioned in another email, the source code is in our new<br>

>> > sourceforge<br>

>> > project's git repository.<br>

>> ><br>

>> >  mike<br>

>> ><br>

>> ><br>

>> ><br>

>> ><br>

>> > On Thu, Jun 3, 2010 at 7:48 PM, Mike Schachter <<a href="mailto:mike@mindmech.com">mike@mindmech.com</a>><br>

>> > wrote:<br>

>> >><br>

>> >> Sounds like you're making great progress! I'll be working on the<br>

>> >> graph clustering algorithm for the skill set tonight and will keep<br>

>> >> you posted on how things are going.<br>

>> >><br>

>> >>   mike<br>

>> >><br>

>> >><br>

>> >><br>

>> >><br>

>> >> On Thu, Jun 3, 2010 at 6:17 PM, Andreas von Hessling<br>

>> >> <<a href="mailto:vonhessling@gmail.com">vonhessling@gmail.com</a>> wrote:<br>

>> >>><br>

>> >>> Doing a few basic tricks, I catapulted the submission into the 50th<br>

>> >>> percentile.  That is not even running any ML algorithm.<br>

>> >>><br>

>> >>> I'm planning on running the NaiveBayesUpdateable classifier<br>

>> >>> (<a href="http://weka.wikispaces.com/Classifying+large+datasets" target="_blank">http://weka.wikispaces.com/Classifying+large+datasets</a>) over<br>

>> >>> discretized IQ/IQ strength/Chance/Chance strength from the command<br>

>> >>> line to evaluate performance.  Another attempt would be to load all<br>

>> >>> data into memory (<3GB, even for full Bridge Train) and run SVMlib<br>

>> >>> over it.<br>

>> >>><br>

>> >>> If someone wants to try MOA<br>

>> >>> (<a href="http://www.cs.waikato.ac.nz/%7Eabifet/MOA/index.html" target="_blank">http://www.cs.waikato.ac.nz/~abifet/MOA/index.html</a>), this would be<br>

>> >>> helpful also in the long run (at least a tutorial how to set it up and<br>

>> >>> run).<br>

>> >>><br>

>> >>> The reduced datasets plus the IQ values are linked on the wiki:<br>

>> >>> Features<br>

>> >>> are:<br>

>> >>>   ...> row INT,<br>

>> >>>   ...> studentid VARCHAR(30),<br>

>> >>>   ...> problemhierarchy TEXT,<br>

>> >>>   ...> problemname TEXT,<br>

>> >>>   ...> problemview INT,<br>

>> >>>   ...> problemstepname TEXT,<br>

>> >>>   ...> cfa INT,<br>

>> >>>   ...> iq REAL<br>

>> >>><br>

>> >>> IQ strength (number of attempts per student) should be available soon.<br>

>> >>>  (perhaps add'l features will become available as well)<br>

>> >>><br>

>> >>> I'm still hoping somebody could cluster Erin's normalized skills data<br>

>> >>> and provide a row -> cluster id mapping for algebra and bridge train<br>

>> >>> and test sets (I don't have the data any more).<br>

>> >>><br>

>> >>> Andy<br>

>> >>> _______________________________________________<br>

>> >>> ml mailing list<br>

>> >>> <a href="mailto:ml@lists.noisebridge.net">ml@lists.noisebridge.net</a><br>

>> >>> <a href="https://www.noisebridge.net/mailman/listinfo/ml" target="_blank">https://www.noisebridge.net/mailman/listinfo/ml</a><br>

>> >><br>

>> ><br>

>> ><br>


>> ><br>

>> ><br>

><br>

><br>


</div></div></blockquote></div><br>