[ml] Hadoop going forward

Vikram Oberoi voberoi at gmail.com
Thu May 20 06:38:51 UTC 2010


Hey folks,

For those of you who came out tonight, I hope the code I walked through and
my initial (albeit rough) overview of MapReduce helped. If you have any
questions or requests, the best way to ask is to:

a) email me via the list at ml at lists.noisebridge.net, or...
b) open an issue on the GitHub project:
http://github.com/voberoi/hadoop-mrutils

Either way, someone else might be able to answer first, and everyone will
benefit from the answer, since there's a good chance others will have the
same questions.

For next week, I'm going to write a script that transforms the KDD dataset
in... some useful way. Your input on exactly what I should do here is
most welcome. The transformation should be involved enough that the code can
serve as an example for scripts you all might implement later.

I'll also be taking a look at Apache Mahout (a library containing Hadoop
MapReduce implementations of numerous machine learning algorithms) and
writing up an example of how to use it. If you have a particular algorithm
that you want to apply to the dataset, check whether it's in the Mahout library
and let me know.

Finally, is any brainstorming/discussion about what we're doing happening
anywhere other than the meetups? I'd be happy to meet again some time before
next Wednesday to hash out some ideas and run with them, as in-person
conversation bandwidth is *so* much higher. Alternatively, we could throw out
ideas on the list and brainstorm over email threads. It doesn't seem like
there's a whole lot of action on the wiki other than links to resources and
TODOs. Or is there?

Vikram