[ml] drama prediction - training set
Full Name
imsoexcitd at excite.com
Fri Jun 1 01:45:02 UTC 2012
Hey,
I am planning on coming to the space tonight. Is anyone else planning on coming in? I'd like to talk about creating a training set from the mbox file so we can build a drama prediction model. We can consider all sorts of interesting features, but at the bare minimum we should create a large sparse matrix of word counts for all (or a subset) of the words contained in the message body, the subject line, or both. Secondly, we need to develop a protocol for labeling each message as drama or not-drama. I don't know how diligently the [DRAMA] tag was applied to drama messages, but we can start there, and possibly also mark any message that contains the word "drama" as drama.
Anyone want to work on creating the training set?
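To make the idea concrete, here is a rough sketch in Python using only the standard library. All of the names in it (read_mbox, build_training_set, the tokenizer regex, the dict-per-row sparse format) are placeholders for illustration, not an agreed design, and the labeling rule is only the starting heuristic described above: a message counts as drama if the standalone word "drama" (which includes the [DRAMA] tag) appears in the subject or body.

```python
import mailbox
import re
from collections import Counter

# Assumption: a crude tokenizer (lowercase runs of letters/apostrophes).
TOKEN_RE = re.compile(r"[a-z']+")
# Matches "drama" as a whole word, so "[DRAMA]" hits but "dramatic" does not.
DRAMA_RE = re.compile(r"\bdrama\b", re.IGNORECASE)


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def body_text(msg):
    """Concatenate the text/plain parts of an email message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def label(subject, body):
    """Starting heuristic only: 1 = drama, 0 = not-drama."""
    return 1 if DRAMA_RE.search(subject + " " + body) else 0


def build_training_set(messages):
    """messages: iterable of (subject, body) string pairs.

    Returns (vocab, rows, labels) where vocab maps word -> column index
    and each row is a sparse {column index: count} dict.
    """
    vocab, rows, labels = {}, [], []
    for subject, body in messages:
        counts = Counter(tokenize(subject + " " + body))
        row = {}
        for word, n in counts.items():
            col = vocab.setdefault(word, len(vocab))
            row[col] = n
        rows.append(row)
        labels.append(label(subject, body))
    return vocab, rows, labels


def read_mbox(path):
    """Yield (subject, body) pairs from an mbox file at a hypothetical path."""
    for msg in mailbox.mbox(path):
        yield str(msg["subject"] or ""), body_text(msg)
```

Usage would be something like build_training_set(read_mbox("list.mbox")), where the file name is made up. One caveat: if the label comes from the word "drama", we should probably drop that token from the features, otherwise the model just learns the labeling rule back.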
-Erin