[ml] This week: KDD, next week: Hadoop!

Wed May 19 23:25:45 UTC 2010

Hi all,

For the discussion tonight it will be helpful if everybody could read
through the KDD data format.  It's fairly technical and not trivial,
so instead of spending time rehashing it during the meeting, it would
be great if we could all be on the same page beforehand.

https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp

Deadline for the challenge is June 8th, so we need to move fast if we
are to submit an entry.

Looking forward to tonight.

On Tue, May 18, 2010 at 8:52 AM, Andreas von Hessling
<vonhessling at gmail.com> wrote:
> Mike,
> we haven't actually gotten far in running algorithms yet.  At this
> point you're the only one working on dimensionality reduction.  I say
> go for it; knock yourself out.  It will be good just to get a sense
> of where we should focus our energy.
>
> BTW I'll put up a description of how to set up Weka with this dataset
> soon.  There are some NN algorithms right in there...
>
> Andy
>
> On Mon, May 17, 2010 at 9:31 PM, Mike Schachter <mike at mindmech.com> wrote:
>> Hey everyone!
>>
>> Just got back the other day and looking forward to meeting up Wednesday
>> and hearing about Hadoop. I just read a bit through the KDD challenge, and
>> was wondering if I could help out by doing something involving neural nets?
>>
>> Neural nets can be made good at generalization and prediction, and also at
>> reducing problem dimensionality by clustering. For example, we could
>> cluster the input records into groups, and pass that group data into an SVM
>> or something. Or we could use some sort of dimensionality-reducing network
>> and pass the dimensionally reduced dataset to a Bayesian learner (which
>> wouldn't work well if the data were high-dimensional).
>>
>> If someone was already thinking of doing this I'd be happy to help out;
>> I can't glean much of what happened from the meeting notes.
>>
>> See you Wednesday!
>>
>>   mike
>>
>>
>>
>> On Wed, May 12, 2010 at 10:05 PM, Thomas Lotze <thomas.lotze at gmail.com>
>> wrote:
>>>
>>> Hello, all!  There was a good meeting today where we talked about the KDD
>>> dataset and plans for the next steps.  I think it'll be a really good
>>> opportunity for learning new tools and methods in machine learning, trading
>>> knowledge and upping our collective ability!  We've got plans to look at R,
>>> libsvm, Weka, and Hadoop to tackle the problem.  I'm excited about working
>>> on it, and anyone else who wants to get involved should email me, download
>>> the data, and take a look at the wiki page where I've put our initial plans:
>>>
>>> https://www.noisebridge.net/wiki/KDD_Competition_2010
>>>
>>>
>>> Next week, Vikarem will be presenting Hadoop, with some scripts and tools
>>> to actually use it -- I think we're all aware of how important Hadoop
>>> already is and will continue to be in the future for analyzing large data
>>> sets, so I'm really glad that we've now got someone who knows about it and
>>> is willing to tell us more!  I think this is a really great opportunity, and
>>> many thanks to Vikarem for presenting!
>>>
>>>
>>> Best wishes,
>>> Thomas
>>>
>>> _______________________________________________
>>> ml mailing list
>>> ml at lists.noisebridge.net
>>> https://www.noisebridge.net/mailman/listinfo/ml
>>>
>>
>