[Noisebridge-discuss] Share your Twitter Firehose?

Sun Nov 28 11:18:08 UTC 2010

On Sat, Nov 27, 2010 at 2:55 PM, John Adams <jna at retina.net> wrote:
> http://dev.twitter.com/pages/streaming_api

I know. Hence "I could process it myself if needed". ;-)

But I know that the kind of very simple analyzed data I want is
already being done, so I'd prefer to use someone else's if possible
rather than re-doing the same thing over again.

> The social graph isn't directly available. You'll have to query each
> user via the REST API for that, and it changes constantly.

Can I query, say, a thousand users' friends at once?

> For the tweets themselves, you'll be interested in the Site streams
> feed, which we offer in a low-bandwidth, free mode called the
> "spritzer." There's very little chance you could consume the full
> firehose. We don't offer it to anyone except paid partners, and even
> then its bandwidth is in the 5-8 megabits/second range.

Right - hence hoping someone who already had a processed feed would be
willing to share. ;-)

> Seeing that I also work in the security group here, I'm also
> interested in what you think you may be able to do with the feed.

It's an elaboration of my CSS Fingerprint site, which can run the CSS
history hack on the order of a million URLs per minute in good
browsers.

For Twitter users running vulnerable browsers (i.e. basically
everything except the Firefox 4 beta or browsers with special
plugins), I would do roughly the following:

1. Test whether they've visited the top million Twittered links
(unshortened but non-normalized, "top" as in most tweeted in the last
[browser link expiry period])
2. Test the top links posted by the people who posted those hits, and
by their friends (other than links already tested)
3. Continue crawling the 'social links graph' until no more data is gathered
4. Analyze hits (and misses) to figure out who the user is
5. Display probable user ID, demographic profile, known hits, etc
6. Display educational info re online privacy, non-vulnerable browsers, EFF, etc
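
A rough sketch of the crawl in steps 1 to 3, in Python, under stated
assumptions: `tweets` is an in-memory list of (user, url) pairs,
`friends` maps a user to the accounts they follow, and `was_visited`
is a stand-in callable for the browser-side history test. All of
these names are hypothetical and only illustrate the loop; none of
them is a real Twitter or browser API.

```python
from collections import Counter


def top_links(tweets, n):
    """Return the n most-tweeted URLs from (user, url) pairs."""
    counts = Counter(url for _user, url in tweets)
    return [url for url, _count in counts.most_common(n)]


def crawl_hits(tweets, friends, was_visited, n=1000):
    """Test top links, then expand through posters and their friends.

    was_visited is a hypothetical oracle (url -> bool) standing in for
    the history test. Returns (hits, tested) as sets of URLs.
    """
    posters = {}        # url -> set of users who posted it
    links_by_user = {}  # user -> Counter of urls they posted
    for user, url in tweets:
        posters.setdefault(url, set()).add(user)
        links_by_user.setdefault(user, Counter())[url] += 1

    tested, hits = set(), set()
    frontier = top_links(tweets, n)  # step 1: globally top links

    while frontier:  # step 3: stop when a round yields nothing new
        new_hits = []
        for url in frontier:
            if url in tested:
                continue
            tested.add(url)
            if was_visited(url):
                hits.add(url)
                new_hits.append(url)

        # Step 2: people who posted the new hits, plus their friends.
        people = set()
        for url in new_hits:
            for user in posters.get(url, ()):
                people.add(user)
                people.update(friends.get(user, ()))

        # Next round: those people's top links not yet tested.
        frontier = [
            url
            for user in people
            for url, _count in links_by_user.get(user, Counter()).most_common(n)
            if url not in tested
        ]

    return hits, tested
```

The loop only ever tests a URL once, so it terminates as soon as a
round produces no new hits or no untested links.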

It's nothing all that new, really; just an extension of my more
efficient history hack and iSecLab's work with social network group
deanonymization, but crafted to make a clearer story (i.e. one where
the implications are self-evident rather than implicit if you grok the
vulnerabilities).

Why Twitter? Simple: a) lots of links posted = lots more usable data
for me; b) viral opportunity to make a media hit.

- Sai