[ml] Hadoop going forward

Thu May 20 18:56:35 UTC 2010

Vikram,

From my perspective, you could contribute the most by setting up a
Hadoop + Mahout infrastructure and documenting the setup process,
the hello-world MapReduce program, etc.  While we went through this
yesterday (thanks), I expect people will only get to actually DO the
things they learned later on, so a written reference (a new wiki page)
would be great, because these questions will be asked over and over.
Even better, and this is just an idea:  can we set up a shared AWS
account so each of us doesn't have to install everything on their own?  I
know there's the question of who pays for it, but that aside, are
there technical reasons why we could not share an account?  One
approach would be for each of us to throw in $10, or perhaps there's a way to
split the bills between us according to usage, or, even better, we
could push Noisebridge Inc to give us some allowance.  Getting a
turnkey cloud Mahout infrastructure for Noisebridge would be H-U-G-E,
even if it would not be ready in time for KDD submission.  Feel free
to take the lead on that initiative.  You would go down in the history
books of NB as a hero :-)

Erin and Mike are already working on transforming the data, so I think
we already have plenty of manpower on that end.

Let's tentatively plan to get together again this Sunday night.  Erin
also mentioned she'd like to meet again before next Wednesday.  I
can give an impromptu talk about classifiers and machine learning problem
setups.
Will confirm.

Andy

On Wed, May 19, 2010 at 11:38 PM, Vikram Oberoi <voberoi at gmail.com> wrote:
> Hey folks,
> For those of you that came out tonight, I hope the code I walked through and
> initial (albeit rough) overview of MapReduce helped. If you guys have any
> questions or requests, the best way to ask would be to:
> a) direct an email to me over ml at lists.noisebridge.net or...
> b) open an issue at the GitHub
> project: http://github.com/voberoi/hadoop-mrutils
> Either way, someone else might be able to answer first and everyone
> will benefit from the answer, as there's a high probability that others
> will have the same questions.
> For next week, I'm going to write a script that transforms the KDD dataset
> in... some useful way. Your input on what exactly I should do here is
> most welcome. The transformation should be involved enough that the code can
> serve as an example for scripts you all might implement later.
> I'll also be taking a look at Apache Mahout (a library containing Hadoop
> MapReduce implementations of numerous machine learning algorithms) and
> writing up an example of how to use it. If you have a particular algorithm
> that you want to apply to the dataset, check if it's in the Mahout library
> and let me know.
> Finally, is any brainstorming/discussion about what we're doing happening
> anywhere other than the meetups? I'd be happy to meet again some time before
> next Wednesday to hash out some ideas and run with them, as in-person
> conversation bandwidth is *so* much higher. Alternatively, we could throw out
> ideas on the list and brainstorm over email threads. It doesn't seem like
> there's a whole lot of action on the wiki other than links to resources and
> TODOs. Or is there?
> Vikram