[Noisebridge-discuss] yummy data ideas for machine learning night?
Geoff Schmidt
geoff at geoffschmidt.com
Mon Apr 13 01:27:43 UTC 2009
(Disclaimer: not sure I'll be at the class, but I'll chime in anyway..)
* The Netflix dataset meets all of your criteria, but you probably
thought of that. (netflixprize.com)
* Music recommendation always gets people excited. But IME attribute-
based models don't really work all that well, and attribute data is
not readily available (acoustic features suck for recommendation;
that's why Pandora uses expert listeners.. though if you can get your
hands on the commercial allmusicguide dump, it has some human-scored
attributes). And non-attribute models (where you use only people's
preference information/listening history/whatever) probably aren't
going to be a good fit for what I imagine your class is, because
they'll either look like collaborative filtering, or require a lot of
gymnastics to infer attributes onto the music. Still, music is fun,
and it's easy to get results that seem good... because with music,
it's easy to see the good in almost any recommendation.
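To make the "non-attribute" idea concrete, here's a minimal item-item collaborative filtering sketch on hypothetical toy data (all names and the ratings matrix are made up for illustration): tracks are compared purely by who listened to them, no acoustic or expert-scored attributes involved.

```python
# Item-item collaborative filtering on a toy user x track matrix.
# 1 = listened/liked, 0 = no signal. No attribute data anywhere.
import math

ratings = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def item_column(j):
    return [row[j] for row in ratings]

def similar_items(j):
    """Rank the other tracks by cosine similarity to track j."""
    sims = [(k, cosine(item_column(j), item_column(k)))
            for k in range(len(ratings[0])) if k != j]
    return sorted(sims, key=lambda pair: -pair[1])

# Track 0 and track 1 share exactly the same listeners,
# so track 1 comes out as track 0's nearest neighbor.
```

Recommending "tracks similar to what you already play" is then just a lookup in `similar_items` -- which is why, as above, this ends up looking like collaborative filtering rather than a classifier over attributes.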
Also: it's probably too big a topic for the class, but I've wanted to
play with reinforcement learning for a long time. The most famous
success of RL is probably Tesauro's TD-Gammon backgammon player:
http://www.cs.ualberta.ca/~sutton/book/ebook/node108.html
I thought that Tetris might be a fun thing to try with RL. It turns
out several people have looked into this; there are some citations here:
http://www.colinfahey.com/tetris/ApplyingReinforcementLearningToTetris_DonaldCarr_RU_AC_ZA.pdf
There's a Tetris harness in which you can implement your favorite
strategy:
http://www.ccs.neu.edu/home/punkball/tetris/modtetris.html
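For a feel of what the RL loop looks like before tackling Tetris, here's a tabular Q-learning sketch on a toy chain world. (This is deliberately not Tetris -- Tetris's state space is far too large for a table, which is why the papers above use function approximation over hand-built board features, TD-Gammon style. All the constants here are arbitrary choices for the toy.)

```python
# Tabular Q-learning on a 5-state chain: start at state 0, reward 1.0
# for reaching state 4. Epsilon-greedy exploration, standard Q update.
import random

random.seed(0)
N = 5                 # states 0..4; state 4 is terminal
ACTIONS = (1, -1)     # step right / step left (clamped at the edges)
alpha, gamma, eps = 0.5, 0.9, 0.1
q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    reward = 1.0 if s2 == N - 1 else 0.0
    return s2, reward, s2 == N - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        if random.random() < eps:
            a = random.choice(ACTIONS)          # explore
        else:
            a = max(ACTIONS, key=lambda b: q[(s, b)])  # exploit
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2

# After training, the greedy policy in every non-terminal state is
# "go right", and the learned values decay geometrically (by gamma)
# with distance from the reward.
```

Swapping the chain for Tetris means replacing the `q` table with a parameterized value function over board features (column heights, holes, etc.) -- which is exactly the move the Carr paper above surveys.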
Also, Erik Demaine et al. showed that (offline) Tetris is NP-complete.
http://erikdemaine.org/papers/Tetris_TR2002/
Anyway, I'm hosed this month, but time permitting I'd be up for a RL
study group later this year.
* geoff-schmidt
((email . geoff at geoffschmidt.com)
(twitter . immir))