Erin, Theo, any way I could get ahold of a subset of the orthogonalized<br>dataset before tomorrow's meeting?<br><br>  mike<br><br><br><div class="gmail_quote">On Tue, May 25, 2010 at 8:51 PM, Andreas von Hessling <span dir="ltr">&lt;<a href="mailto:vonhessling@gmail.com">vonhessling@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Mike,<br>

<br>

it would be great if you could apply the clustering not to the raw<br>

datasets (which contain a lot of meaningless information), but to the<br>

orthogonalized dataset that Erin & Theo provided (where the<br>

skill/opportunity columns are split up into many features).  Erin/Theo<br>

should have the latest version of these datasets.  If these challenge<br>

datasets are too big for Weka, I suggest sampling some records -- I<br>

believe Thomas has some code for this.<br>

<br>

We *will* need to cluster the skills at some point to make use of the<br>

orthogonalized datasets.<br>

<br>

Looking forward to your results.<br>

<br>

Andy<br>

<div><div></div><div class="h5"><br>

On Tue, May 25, 2010 at 8:24 PM, Mike Schachter &lt;<a href="mailto:mike@mindmech.com">mike@mindmech.com</a>&gt; wrote:<br>

> Hey everyone,<br>

><br>

> Been super busy since last week's meeting, but started<br>

> reading up on k-means clustering and expectation-maximization,<br>

> in the hopes that I can use one of these techniques to start<br>

> clustering the KDD data.<br>

><br>

> Tonight I'm finally getting around to using Weka's built-in<br>

> clustering to see if it works with the KDD data:<br>

><br>

> <a href="http://weka.wikispaces.com/Using+cluster+algorithms" target="_blank">http://weka.wikispaces.com/Using+cluster+algorithms</a><br>

><br>

> Can't promise anything in terms of results, but tomorrow I'd<br>

> be happy to give a (very) brief overview of k-means clustering<br>

> and expectation maximization, and hopefully some preliminary<br>

> results with a subset of the KDD data.<br>

><br>

> Perhaps some of us could work together to implement a clustering algorithm<br>

> in map-reduce form to run on an Elastic MapReduce cluster! Looking<br>

> forward to seeing everyone tomorrow,<br>

><br>

>   mike<br>

><br>

><br>

</div></div>> _______________________________________________<br>

> ml mailing list<br>

> <a href="mailto:ml@lists.noisebridge.net">ml@lists.noisebridge.net</a><br>

> <a href="https://www.noisebridge.net/mailman/listinfo/ml" target="_blank">https://www.noisebridge.net/mailman/listinfo/ml</a><br>

><br>

><br>

</blockquote></div><br>