[Noisebridge-discuss] yummy data ideas for machine learning night?
Geoff Schmidt
geoff at geoffschmidt.com
Mon Apr 13 01:27:43 UTC 2009
(Disclaimer: not sure I'll be at the class, but I'll chime in anyway..)
* The Netflix dataset meets all of your criteria, but you probably
thought of that. (netflixprize.com)
* Music recommendation always gets people excited. But IME attribute-
based models don't really work all that well, and attribute data is
not readily available (acoustic features suck for recommendation;
that's why Pandora uses expert listeners.. though if you can get your
hands on the commercial allmusicguide dump, it has some human-scored
attributes). And non-attribute models (where you use only people's
preference information/listening history/whatever) probably aren't
going to be a good fit for what I imagine your class is, because
they'll either look like collaborative filtering, or require a lot of
gymnastics to infer attributes onto the music. Still, music is fun,
and it's easy to get results that seem good... because with music,
it's easy to see the good in almost any recommendation.
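To make the "non-attribute" idea concrete, here's a minimal item-item collaborative filtering sketch on hypothetical toy data (all names and the ratings matrix are made up for illustration): tracks are compared purely by who listened to them, no acoustic or expert-scored attributes involved.

```python
# Item-item collaborative filtering on a toy user x track matrix.
# 1 = listened/liked, 0 = no signal. No attribute data anywhere.
import math

ratings = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def item_column(j):
    return [row[j] for row in ratings]

def similar_items(j):
    """Rank the other tracks by cosine similarity to track j."""
    sims = [(k, cosine(item_column(j), item_column(k)))
            for k in range(len(ratings[0])) if k != j]
    return sorted(sims, key=lambda pair: -pair[1])

# Track 0 and track 1 share exactly the same listeners,
# so track 1 comes out as track 0's nearest neighbor.
```

Recommending "tracks similar to what you already play" is then just a lookup in `similar_items` -- which is why, as above, this ends up looking like collaborative filtering rather than a classifier over attributes.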
Also: it's probably too big a topic for the class, but I've wanted to
play with reinforcement learning for a long time. The most famous
success of RL is probably Tesauro's TD-Gammon backgammon player:
http://www.cs.ualberta.ca/~sutton/book/ebook/node108.html
I thought that Tetris might be a fun thing to try with RL. It turns
out several people have looked into this; there are some citations here:
http://www.colinfahey.com/tetris/ApplyingReinforcementLearningToTetris_DonaldCarr_RU_AC_ZA.pdf
There's a Tetris harness in which you can implement your favorite
strategy:
http://www.ccs.neu.edu/home/punkball/tetris/modtetris.html
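For a feel of what the RL loop looks like before tackling Tetris, here's a tabular Q-learning sketch on a toy chain world. (This is deliberately not Tetris -- Tetris's state space is far too large for a table, which is why the papers above use function approximation over hand-built board features, TD-Gammon style. All the constants here are arbitrary choices for the toy.)

```python
# Tabular Q-learning on a 5-state chain: start at state 0, reward 1.0
# for reaching state 4. Epsilon-greedy exploration, standard Q update.
import random

random.seed(0)
N = 5                 # states 0..4; state 4 is terminal
ACTIONS = (1, -1)     # step right / step left (clamped at the edges)
alpha, gamma, eps = 0.5, 0.9, 0.1
q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    reward = 1.0 if s2 == N - 1 else 0.0
    return s2, reward, s2 == N - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        if random.random() < eps:
            a = random.choice(ACTIONS)          # explore
        else:
            a = max(ACTIONS, key=lambda b: q[(s, b)])  # exploit
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2

# After training, the greedy policy in every non-terminal state is
# "go right", and the learned values decay geometrically (by gamma)
# with distance from the reward.
```

Swapping the chain for Tetris means replacing the `q` table with a parameterized value function over board features (column heights, holes, etc.) -- which is exactly the move the Carr paper above surveys.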
Also, Erik Demaine et al. showed that (offline) Tetris is NP-complete.
http://erikdemaine.org/papers/Tetris_TR2002/
Anyway, I'm hosed this month, but time permitting I'd be up for a RL
study group later this year.
* geoff-schmidt
((email . geoff at geoffschmidt.com)
(twitter . immir))