[ml] ml Digest, Vol 14, Issue 2

Fri Dec 3 00:40:09 UTC 2010

Hey folks,

I was hoping to catch up with everybody about the Kaggle competition
at last night's meeting, but I had to leave before the presentations
were over.

I've been working with a subset of the 7M+ edges in the full dataset
for speed's sake. Has anybody else tried this? It should work in
principle, since the Kaggle dataset is itself only a subgraph of
Flickr's social graph. Furthermore, we can verify edge predictions
made from the subset by checking whether they appear in the full
Kaggle dataset.
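
To make the verification idea concrete, here is a minimal sketch. It
assumes the edges come as text lines of the form "src,dst" with
integer node ids, which is my assumption about the file layout, and
the function names are made up for illustration.

```python
def load_edges(lines):
    """Parse 'src,dst' lines into a set of directed (src, dst) tuples.

    The comma-separated integer format is an assumption, not something
    confirmed from the competition files.
    """
    edges = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src, dst = line.split(",")
        edges.add((int(src), int(dst)))
    return edges


def score_predictions(predicted, known_edges):
    """Return the fraction of predicted edges found in known_edges."""
    if not predicted:
        return 0.0
    hits = sum(1 for edge in predicted if edge in known_edges)
    return hits / len(predicted)


if __name__ == "__main__":
    known = load_edges(["1,2", "2,3", "3,1"])
    # (1, 2) is a known edge, (2, 1) is not, so half the guesses hit.
    print(score_predictions([(1, 2), (2, 1)], known))
```

Keeping the full edge list in a set makes each lookup constant time,
so checking predictions against all 7M+ edges stays cheap even when
the predictions themselves come from a small sample.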

I wrote a script to randomly grab 10k edges of the graph, which I used
to search for cliques and a non-zero correlation coefficient. This
yielded only trivial results, so I'm going to retry with 100k and 1M
edges. Randomly selecting nodes (instead of edges) could work better.
Eventually, I'll try applying other methods (PageRank, etc.) in the
hope that they yield results even on a subset of the total data.
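
For comparison, here is a rough sketch of the two sampling approaches
mentioned above. This is not the script I actually ran; the function
names and the in-memory list of (src, dst) tuples are assumptions made
for illustration.

```python
import random


def sample_edges(edges, k, seed=0):
    """Pick k edges uniformly at random from a list of (src, dst) tuples."""
    rng = random.Random(seed)
    return rng.sample(edges, min(k, len(edges)))


def sample_nodes_induced(edges, k, seed=0):
    """Pick k nodes at random and keep every edge between two of them.

    The result is the subgraph induced by the chosen nodes.
    """
    rng = random.Random(seed)
    # Sort so the sample is reproducible for a given seed.
    nodes = sorted({node for edge in edges for node in edge})
    kept = set(rng.sample(nodes, min(k, len(nodes))))
    return [(u, v) for (u, v) in edges if u in kept and v in kept]


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
    print(sample_edges(edges, 2))
    print(sample_nodes_induced(edges, 3))
```

One thing to keep in mind with edge sampling: a triangle only shows up
in the sample if all three of its edges are picked, so with 10k out of
7M+ edges very few cliques can survive. The node version keeps every
edge among the chosen nodes, though a uniform node sample of a sparse
graph may still come out sparse.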

Adam