[Noisebridge-discuss] city of oakland internal emails dump on DocumentCloud

Nicholas Granado ngranado at gmail.com
Mon Mar 5 22:35:18 UTC 2012


you can run the following code to download them all:

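#!/usr/bin/python
# Download every page image of DocumentCloud document 320449 into ./data/
# (Python 2: uses urllib2 and the print statement)
import os
import urllib2

def download_image_url(url):
    # Fetch the page image, following any HTTP redirects
    request = urllib2.Request(url)
    opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(),
                                  urllib2.HTTPHandler(debuglevel=0))
    handle = opener.open(request)
    payload = handle.read()
    # The last path component is the file name, e.g. ...-p54-normal.gif
    filename = url.split('/')[6]
    image_filename = "./data/%s" % (filename)
    # GIFs are binary data, so open the file in binary mode
    fh = open(image_filename, 'wb')
    fh.write(payload)
    fh.close()
    print filename

def main():
    # Create the output directory if it does not exist yet
    if not os.path.isdir("./data"):
        os.makedirs("./data")
    # Pages are numbered 1 through 2183
    for i in range(1, 2184):
        url = ("http://s3.documentcloud.org/documents/320449/pages/"
               "oakland-city-official-emails-10-11-2011-to-11-13-p%d-normal.gif" % (i))
        download_image_url(url)

if __name__ == "__main__":
    main()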
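The script above is Python 2 (urllib2, the print statement). On Python 3 the same loop could be sketched with only the standard library, as below. This is a port for illustration, not the original code: it assumes the URL pattern and the 2183-page count from this thread still apply, and the skip-if-already-downloaded check is an addition so an interrupted run can be resumed.

```python
"""Python 3 sketch: download every page image of DocumentCloud document 320449."""
import os
import urllib.parse
import urllib.request

# URL pattern from this thread; %d is the 1-based page number
BASE = ("http://s3.documentcloud.org/documents/320449/pages/"
        "oakland-city-official-emails-10-11-2011-to-11-13-p%d-normal.gif")
FIRST_PAGE = 1
LAST_PAGE = 2183
OUT_DIR = "./data"


def filename_from_url(url):
    # Use the last path component, e.g. "...-p54-normal.gif"
    return os.path.basename(urllib.parse.urlparse(url).path)


def download_image_url(url, out_dir=OUT_DIR):
    path = os.path.join(out_dir, filename_from_url(url))
    if os.path.exists(path):
        # Already fetched on a previous run, so skip it
        return path
    # urlopen follows HTTP redirects by default
    with urllib.request.urlopen(url) as resp:
        payload = resp.read()
    # GIF data is binary, so write in binary mode
    with open(path, "wb") as fh:
        fh.write(payload)
    print(os.path.basename(path))
    return path


def main():
    # Create the output directory if needed, then fetch pages in order
    os.makedirs(OUT_DIR, exist_ok=True)
    for page in range(FIRST_PAGE, LAST_PAGE + 1):
        download_image_url(BASE % page)
```

Calling `main()` starts the download. Because the default opener behind `urllib.request.urlopen` already handles redirects, the explicit `HTTPRedirectHandler` from the Python 2 version is not needed here.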

nick



On Mon, Mar 5, 2012 at 2:21 PM, Nicholas Granado <ngranado at gmail.com> wrote:

> they are gif files. the url format is as follows, where # is the page number:
>
>
> http://s3.documentcloud.org/documents/320449/pages/oakland-city-official-emails-10-11-2011-to-11-13-p#-normal.gif
>
> so for example, if i wanted page 54:
>
>
> http://s3.documentcloud.org/documents/320449/pages/oakland-city-official-emails-10-11-2011-to-11-13-p54-normal.gif
>
> cheers,
> nick
>
>
>
>
> On Mon, Mar 5, 2012 at 2:18 PM, Jake <jake at spaz.org> wrote:
>
>> does anyone know how to download the entire 2183 pages?
>> I couldn't find a download button :)
>>
>> http://www.mercurynews.com/documents/ci_20040081
>> _______________________________________________
>> Noisebridge-discuss mailing list
>> Noisebridge-discuss at lists.noisebridge.net
>> https://www.noisebridge.net/mailman/listinfo/noisebridge-discuss
>>
>
>
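Because only the page number changes from one URL to the next, two small helpers can build a page URL and recover the page number from a URL or saved file name. This is a Python 3 sketch; the names `page_url` and `page_number` are hypothetical, and it assumes the URL pattern and the 2183-page count given in this thread.

```python
import re

# URL pattern from this thread; "%d" stands in for the "#" page number
PAGE_URL = ("http://s3.documentcloud.org/documents/320449/pages/"
            "oakland-city-official-emails-10-11-2011-to-11-13-p%d-normal.gif")
PAGE_COUNT = 2183  # total pages, per the original question

# Matches the "-p<number>-normal.gif" suffix at the end of a URL or file name
_PAGE_RE = re.compile(r"-p(\d+)-normal\.gif$")


def page_url(page):
    """Return the GIF URL for a 1-based page number (hypothetical helper)."""
    if not 1 <= page <= PAGE_COUNT:
        raise ValueError("page must be between 1 and %d" % PAGE_COUNT)
    return PAGE_URL % page


def page_number(name):
    """Recover the page number from a page URL or file name (hypothetical helper)."""
    m = _PAGE_RE.search(name)
    if m is None:
        raise ValueError("not a page image name: %r" % name)
    return int(m.group(1))
```

For example, `sorted(os.listdir('./data'), key=page_number)` orders a directory of saved pages numerically (p2 before p10), which a plain string sort would not.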

