[Noisebridge-discuss] I'm working on a series of articles about algorithms - data mining experts sought asap

Sat Jan 19 23:50:08 UTC 2013

Hi Noisebridge.

I'm an SF-based tech geek and occasional writer who is working on a series of articles for Fast Company (http://fastcompany.com/) about the algorithms selected in the 2007 paper "Top 10 algorithms in data mining" (which is locked behind a Springer-Verlag paywall, apropos of other more important matters going on right now). Citation, abstract, and a list of the algorithms are pasted at the end of this note; contact me if you need a copy of the paper, of course.

Our primary goal is to give Fast Company's readers a quick primer on each of the algorithms in a way that is accessible to a business/non-expert audience.

The articles will be brief, a few hundred words each. I'm looking for data mining / domain experts who are particularly adept at explaining complicated math and science in layman's terms. Interviews can take place in person in SF, via e-mail, IM, Skype, or phone, whatever is most convenient.

To elaborate, here is what I'm in search of for each algorithm:
	- "What it does" in plain language, maybe with a simple example
	- How the algorithm changed "everyday" practice when it emerged, or what it enabled that wasn't possible before
	- Pointers to companies, services, or even /types/ of services where the algorithm is likely in operation today

We can work by e-mail, phone, or Skype - whatever is most convenient.

Please feel free to forward this note to folks who you believe may be a good fit for this project.

An ideal person would have expert knowledge of one or more of these algorithms, a talent for explaining really technical stuff to regular people, 20 minutes to spare for an interview, and an interest in possibly being quoted in a Fast Company article. ;) I know I've seen people like that around Noisebridge - lots of them - I'm just not sure who's available in the next week or two.

Thanks!
- jim
jim at agentzero.com

----

Knowl Inf Syst (2008) 14:1–37 DOI 10.1007/s10115-007-0114-2
SURVEY PAPER
Top 10 algorithms in data mining
Xindong Wu · Vipin Kumar · J. Ross Quinlan · Joydeep Ghosh · Qiang Yang · Hiroshi Motoda · Geoffrey J. McLachlan · Angus Ng · Bing Liu · Philip S. Yu · Zhi-Hua Zhou · Michael Steinbach · David J. Hand · Dan Steinberg
Received: 9 July 2007 / Revised: 28 September 2007 / Accepted: 8 October 2007 / Published online: 4 December 2007
© Springer-Verlag London Limited 2007
Abstract: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

ALGORITHMS
1 C4.5 and beyond
2 The k-means algorithm
3 Support vector machines
4 The Apriori algorithm
5 The EM algorithm
6 PageRank
7 AdaBoost
8 kNN: k-nearest neighbor classification
9 Naive Bayes
10 CART