[ml] clustering, weka
Andreas von Hessling
vonhessling at gmail.com
Wed May 26 03:51:37 UTC 2010
Mike,
it would be great if you could apply the clustering not to the raw
datasets (which contain a lot of meaningless information), but to the
orthogonalized dataset that Erin & Theo provided (where the
skill/opportunity columns are split up into many features). Erin/Theo
should have the latest version of these datasets. If these challenge
datasets are too big for Weka, I suggest sampling some records -- I
believe Thomas has some code for this.
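
In case it helps, here is a minimal sketch of one way to do the sampling
(this is not Thomas's code, which I have not seen). It uses reservoir
sampling, so the whole file never has to fit in memory. The function name
`sample_records`, the `seed` parameter, and the file name in the usage
note are made up for illustration.

```python
import random

def sample_records(records, k, seed=0):
    """Return k records chosen uniformly at random from an iterable.

    Reservoir sampling (Algorithm R): one pass, memory proportional to k.
    If the input has fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir
```

Usage would be something like `sample_records(open("train.txt"), 100000)`,
with the header line read off first so it is not sampled away. The file
name and sample size here are placeholders, not the real ones.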
We *will* need to cluster the skills at some point to make use of the
orthogonalized datasets.
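
For reference, a bare-bones k-means sketch in plain Python. This is only
an illustration of the algorithm (Lloyd's iteration with random initial
centroids), not Weka's implementation, and it assumes each record has
already been turned into a numeric feature vector. The function name
`kmeans` and its parameters are my own.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of points[i].
    """
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for idx, p in enumerate(points):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

The result depends on the initial centroids, so on real data it is worth
running with several seeds and comparing.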
Looking forward to your results.
Andy
On Tue, May 25, 2010 at 8:24 PM, Mike Schachter <mike at mindmech.com> wrote:
> Hey everyone,
>
> Been super busy since last week's meeting, but started
> reading up on k-means clustering and expectation-maximization,
> in the hopes that I can use one of these techniques to start
> clustering the KDD data.
>
> Tonight I'm finally getting around to using Weka's built-in
> clustering to see if it works with the KDD data:
>
> http://weka.wikispaces.com/Using+cluster+algorithms
>
> Can't promise anything in terms of results, but tomorrow I'd
> be happy to give a (very) brief overview of k-means clustering
> and expectation maximization, and hopefully some preliminary
> results with a subset of the KDD data.
>
> Perhaps some of us could work together to implement a clustering algorithm
> in map-reduce form to work on an elastic map reduce cluster! Looking
> forward to seeing everyone tomorrow,
>
> mike