[Noisebridge-discuss] Share your Twitter Firehose?

Andrew Cantino cantino at gmail.com
Sat Dec 11 06:07:05 UTC 2010


I wrote a simple proof-of-concept along the same lines a little while
ago: a timing attack on loading Twitter profile images to find out
whether they were already in a user's cache.  The next step would be
to do exactly what you're proposing on the social graph, in order to
figure out who the user is from the signature of who they seem to be
following.  How's your project coming?
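
The basic idea, in sketch form (not the actual code; the avatar URL
handling and the 50 ms cutoff are placeholders you'd have to
calibrate):

    // Rough sketch of the cache-timing probe (browser TypeScript).
    // Load a candidate user's avatar and time it; a very fast load
    // suggests the image was already in the browser cache, i.e. the
    // visitor has probably seen that user's profile or tweets before.
    function timeImageLoad(url: string): Promise<number> {
      return new Promise((resolve) => {
        const img = new Image();
        const start = performance.now();
        img.onload = () => resolve(performance.now() - start);
        img.onerror = () => resolve(performance.now() - start);
        // must be the exact URL Twitter serves, or the cache won't hit
        img.src = url;
      });
    }

    async function probablyCached(url: string, cutoffMs = 50) {
      return (await timeImageLoad(url)) < cutoffMs;
    }

The cutoff really wants to be calibrated per connection (say, against
an image you know isn't cached yet), since a fast network can look a
lot like a cache hit.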

P.S., hey John!  How's Twitter?  Pivotal is fun, but I miss the scene
sometimes. :)

On Sun, Nov 28, 2010 at 3:18 AM, Sai <sai at saizai.com> wrote:
> On Sat, Nov 27, 2010 at 2:55 PM, John Adams <jna at retina.net> wrote:
>> http://dev.twitter.com/pages/streaming_api
>
> I know. Hence "I could process it myself if needed". ;-)
>
> But I know that the kind of very simple analysis I want is already
> being done by others, so I'd prefer to use their output if possible
> rather than redo the same work myself.
>
>> The social graph isn't directly available. You'll have to query each
>> user via the REST API for that, and it changes constantly.
>
> Can I query, say, a thousand users' friends at once?
>
>> For the tweets themselves, you'll be interested in the Site streams
>> feed, which we offer in a low-bandwidth, free mode called the
>> "spritzer." There's very little chance you could consume the full
>> firehose. We don't offer it to anyone except paid partners, and even
>> then its bandwidth is in the 5-8 megabits/second range.
>
> Right - hence hoping someone who already had a processed feed would be
> willing to share. ;-)
>
>> Seeing that I also work in the security group here, I'm also
>> interested in what you think you may be able to do with the feed.
>
> It's an elaboration of my CSS Fingerprint site, which can run the CSS
> history hack on the order of a million URLs per minute in good
> browsers.
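>
> (If you haven't seen the underlying trick: style :visited links
> differently and read the computed style back.  A minimal, unoptimized
> sketch - not the CSS Fingerprint code itself:)
>
>     // Minimal sketch of :visited detection (browser TypeScript). Only
>     // works in browsers that still expose :visited styling through
>     // getComputedStyle(), i.e. the vulnerable browsers mentioned below.
>     const probeStyle = document.createElement('style');
>     probeStyle.textContent =
>       'a.probe { color: rgb(0, 0, 255); } ' +
>       'a.probe:visited { color: rgb(255, 0, 0); }';
>     document.head.appendChild(probeStyle);
>
>     function wasVisited(url: string): boolean {
>       const a = document.createElement('a');
>       a.className = 'probe';
>       a.href = url;
>       document.body.appendChild(a);
>       const visited = getComputedStyle(a).color === 'rgb(255, 0, 0)';
>       document.body.removeChild(a);
>       return visited;
>     }
>
>     // e.g. candidateUrls.filter(wasVisited) gives the subset of links
>     // this browser has in its history.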
>
> For Twitter users running vulnerable browsers (i.e. basically
> everything except the Firefox 4 beta, or browsers with special
> plugins installed), I would do roughly the following:
>
> 1. Test whether they've visited the top million Twittered links
> (unshortened but non-normalized, "top" as in most tweeted in the last
> [browser link expiry period])
> 2. Test the top links posted by the people who posted those hits, and
> by their friends (other than links already tested)
> 3. Continue crawling the 'social links graph' until no more data is gathered
> 4. Analyze hits (and misses) to figure out who the user is (scoring
> sketched below)
> 5. Display probable user ID, demographic profile, known hits, etc
> 6. Display educational info re online privacy, non-vulnerable browsers, EFF, etc
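>
> (Step 4 is conceptually just set overlap: score each candidate account
> by how well the detected hits line up with the links in that account's
> corner of the graph.  A rough sketch with made-up data structures -
> the real analysis would also want to weight by link popularity, use
> the misses, etc.:)
>
>     // Hypothetical shapes: the set of URLs the history probe flagged
>     // as visited, plus a map from candidate account to the links seen
>     // in that account's neighborhood of the social graph.
>     function rankCandidates(
>       hits: Set<string>,
>       candidateLinks: Map<string, Set<string>>
>     ): Array<[string, number]> {
>       const scores: Array<[string, number]> = [];
>       for (const [user, links] of candidateLinks) {
>         let overlap = 0;
>         for (const url of links) {
>           if (hits.has(url)) overlap++;
>         }
>         // Jaccard-style score so prolific posters don't win on volume alone.
>         const unionSize = hits.size + links.size - overlap;
>         scores.push([user, unionSize > 0 ? overlap / unionSize : 0]);
>       }
>       return scores.sort((a, b) => b[1] - a[1]);  // best match first
>     }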
>
> It's nothing all that new, really; just an extension of my more
> efficient history hack and iSecLab's work on social network group
> deanonymization, but crafted to make a clearer story (i.e. one where
> the implications are self-evident, rather than implicit and visible
> only if you grok the vulnerabilities).
>
> Why Twitter? Simple: a) lots of links posted = lots more usable data
> for me; b) viral opportunity to make a media hit.
>
> - Sai
> _______________________________________________
> Noisebridge-discuss mailing list
> Noisebridge-discuss at lists.noisebridge.net
> https://www.noisebridge.net/mailman/listinfo/noisebridge-discuss
>


