[ml] [Continued] Collaborative workshop on Bayesian Spam Filters
Sam Tepper
sam.tepper at gmail.com
Thu Mar 6 23:29:28 UTC 2014
Hello MLers,
We spent some time last week putting together the data, splitting up the
tasks, and theorizing about Bayesian design. Tonight we'll continue,
and hopefully finish, the NB Bayesian spam filter. Come by if you're
interested in helping, or just curious and want to check it out.
Best,
Sam
-------- Original Message --------
Subject: Re: [ml] Tonight: Collaborative workshop on Bayesian Spam Filters
Date: Thu, 27 Feb 2014 23:05:08 -0800
From: Alexander Ko <aok1425 at gmail.com>
To: Sam Tepper <sam.tepper at gmail.com>
https://github.com/toshiakit/NaiveBayes
http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=06.1-NaiveBayes-GenerativeLearningAlgorithms&speed=100
On Thu, Feb 27, 2014 at 10:10 AM, Sam Tepper <sam.tepper at gmail.com> wrote:
Hi everyone,
Tonight I'm going to get a bunch of people together and try to put
together some working, personalized Bayesian spam filters we can use on
lists (e.g., NB Discuss). This will be in part a continuation of my last
workshop, but I hope to structure the workshop as little as possible.
Instead, we'll have a sort of guided self-learning, where I'll be able
to provide the core of the Bayesian algorithm and ML implementation of
the filter, also answering any questions that come up, while you
(hopefully) put the pieces together, and discover what it all means in
practice and how to make it better.
Even if you weren't able to make it for the last class, you should be
able to follow along with the basic algorithms I give you. We'll work
together on all of this, so hopefully even if you don't finish your own
implementation, you'll at least have a good idea how to make a better
one from other people's ideas and feedback.
Beer, food, good cheer! It'll be as fun as an ML workshop can be!
Best,
Sam
_______________________________________________
ml mailing list
ml at lists.noisebridge.net
https://www.noisebridge.net/mailman/listinfo/ml
-------------- next part --------------
# import statements
# load emails
# load email_dict or create email_dict (maps word -> (freq_spam, freq_nospam))
# parse email into list of words (email_word_list)
prior = .5
prob_spam = prior
for word in email_word_list:
    if word in email_dict:
        freq_spam, freq_nospam = email_dict[word]
        # P(W|S): word frequency over total words seen in spam
        word_given_spam = freq_spam / (total_spam_emails * avg_words_per_spam_email)
        # P(W|not S): same, for non-spam
        word_given_nospam = freq_nospam / (total_nospam_emails * avg_words_per_nospam_email)
        # P(S|W) by Bayes' rule
        spam_given_word = word_given_spam * prob_spam / (
            word_given_spam * prob_spam + word_given_nospam * (1 - prob_spam))
    else:
        # naive: P(W)
        pass
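
The outline above leaves a few things open: how the counts are built, what to do with a word that is not in the dictionary, and how the per-word results combine. Below is a minimal runnable sketch of one way to fill those in, not the workshop's actual code. The assumptions are mine: the function names (tokenize, train, spam_probability), the toy emails, add-one (Laplace) smoothing so that no word gets probability zero, skipping unseen words, and feeding each word's posterior back in as the prior for the next word. The total word count per class stands in for total_spam_emails * avg_words_per_spam_email.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase an email body and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(spam_emails, nospam_emails):
    """Count how often each word occurs in spam and in non-spam."""
    spam_counts = Counter(w for e in spam_emails for w in tokenize(e))
    nospam_counts = Counter(w for e in nospam_emails for w in tokenize(e))
    return spam_counts, nospam_counts


def spam_probability(text, spam_counts, nospam_counts, prior=0.5):
    """Return P(spam | words of text) under the naive independence assumption."""
    # Total words per class, i.e. emails * average words per email.
    total_spam_words = sum(spam_counts.values())
    total_nospam_words = sum(nospam_counts.values())
    vocab_size = len(set(spam_counts) | set(nospam_counts))

    prob_spam = prior
    for word in tokenize(text):
        freq_spam = spam_counts[word]
        freq_nospam = nospam_counts[word]
        if freq_spam == 0 and freq_nospam == 0:
            # Assumption: a word never seen in training carries no evidence.
            continue
        # P(W|S) and P(W|not S) with add-one smoothing (an assumption).
        word_given_spam = (freq_spam + 1) / (total_spam_words + vocab_size)
        word_given_nospam = (freq_nospam + 1) / (total_nospam_words + vocab_size)
        # Bayes' rule; the posterior becomes the prior for the next word.
        prob_spam = word_given_spam * prob_spam / (
            word_given_spam * prob_spam + word_given_nospam * (1 - prob_spam))
    return prob_spam


# Toy training data, made up purely for illustration.
spam_counts, nospam_counts = train(
    ["win money now", "free money offer"],
    ["meeting notes attached", "lunch at noon"],
)

print(spam_probability("free money", spam_counts, nospam_counts))
print(spam_probability("meeting at noon", spam_counts, nospam_counts))
```

Updating prob_spam word by word gives the same result as multiplying all the per-word likelihoods together and normalizing once at the end. For long emails, summing log-probabilities instead would avoid floating-point underflow.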