[ml] [Continued] Collaborative workshop on Bayesian Spam Filters

Thu Mar 6 23:29:28 UTC 2014

Hello MLers,

We spent some time last week putting together the data, splitting up the
tasks, and theorizing about Bayesian design.  Tonight we'll continue,
and hopefully finish, the NB Bayesian spam filter.  Come by if you're
interested in helping, or just curious and want to check it out.

Best,
Sam

-------- Original Message --------
Subject: 	Re: [ml] Tonight: Collaborative workshop on Bayesian Spam Filters
Date: 	Thu, 27 Feb 2014 23:05:08 -0800
From: 	Alexander Ko <aok1425 at gmail.com>
To: 	Sam Tepper <sam.tepper at gmail.com>

https://github.com/toshiakit/NaiveBayes
http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=06.1-NaiveBayes-GenerativeLearningAlgorithms&speed=100

On Thu, Feb 27, 2014 at 10:10 AM, Sam Tepper <sam.tepper at gmail.com> wrote:

    Hi everyone,

    Tonight I'm going to get a bunch of people together and try to build
    some working, personalized Bayesian spam filters we can use on
    lists (e.g., NB Discuss).  This will be in part a continuation of my last
    workshop, but I hope to structure the workshop as little as possible.

    Instead, we'll have a sort of guided self-learning, where I'll be able
    to provide the core of the Bayesian algorithm and ML implementation of
    the filter, also answering any questions that come up, while you
    (hopefully) put the pieces together, and discover what it all means in
    practice and how to make it better.

    Even if you weren't able to make it for the last class, you should be
    able to follow along with the basic algorithms I give you.  We'll work
    together on all of this, so hopefully even if you don't finish your own
    implementation, you'll at least have a good idea how to make a better
    one from other people's ideas and feedback.

    Beer, food, good cheer!  It'll be as fun as an ML workshop can be!

    Best,
    Sam
    _______________________________________________
    ml mailing list
    ml at lists.noisebridge.net
    https://www.noisebridge.net/mailman/listinfo/ml

-------------- next part --------------
import statements

load emails

load email_dict or create email_dict

#parse email into list of words

prior=.5
prob_spam=prior

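for word in email_word_list:
	if word in email_dict:
		# look up how often this word appeared in spam and non-spam training emails
		freq_spam,freq_nospam=email_dict[word]

		# denominators: emails per class times average words per email;
		# drop the avg_words factor to normalize per email instead of per word
		spam_total=float(total_spam_emails*avg_words_per_spam_email)
		nospam_total=float(total_nospam_emails*avg_words_per_nospam_email)

		#P(W|S) and P(W|not S)
		word_given_spam=freq_spam/spam_total
		word_given_nospam=freq_nospam/nospam_total

		#Bayes' rule: P(S|W)=P(W|S)P(S)/(P(W|S)P(S)+P(W|not S)P(not S))
		spam_given_word=word_given_spam*prob_spam/(word_given_spam*prob_spam+word_given_nospam*(1-prob_spam))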

	else:
		#naive: P(W)
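
The per-word update in the pseudocode above can be sketched as runnable Python. This is only an illustration under stated assumptions: the training counts are made up, the function name `spam_probability` is hypothetical, each word's posterior is fed back in as the prior for the next word, and add-one (Laplace) smoothing is swapped in to handle words missing from `email_dict`. None of those choices come from the workshop notes.

```python
from __future__ import division  # true division on Python 2 as well


def spam_probability(words, email_dict, spam_total, nospam_total, prior=0.5):
    """Apply Bayes' rule one word at a time and return P(spam | words).

    email_dict maps word -> (freq_spam, freq_nospam); spam_total and
    nospam_total are total word counts per class. All names are hypothetical.
    """
    vocab = len(email_dict)
    prob_spam = prior
    for word in words:
        freq_spam, freq_nospam = email_dict.get(word, (0, 0))
        # Add-one (Laplace) smoothing: an assumption, so that unseen or
        # one-sided words never produce a probability of exactly zero.
        word_given_spam = (freq_spam + 1) / (spam_total + vocab)
        word_given_nospam = (freq_nospam + 1) / (nospam_total + vocab)
        # Bayes' rule, using the running posterior as the prior.
        numerator = word_given_spam * prob_spam
        prob_spam = numerator / (numerator + word_given_nospam * (1 - prob_spam))
    return prob_spam


# Made-up training counts, for illustration only.
email_dict = {"viagra": (30, 0), "meeting": (1, 25), "free": (20, 5)}
spam_total = 51    # total words seen in spam training emails
nospam_total = 30  # total words seen in non-spam training emails

print(spam_probability(["free", "viagra"], email_dict, spam_total, nospam_total))
print(spam_probability(["meeting"], email_dict, spam_total, nospam_total))
```

With these counts the first email scores above 0.5 and the second below it. For long emails, summing log-probabilities instead of multiplying would likely be more robust against floating-point underflow.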