【Python】Download Flickr images within 50-mile radius of a given latlon

1 mile = 1.609344 kilometres

The Flickr API will disable your key if you query too rapidly, so it makes sense to do large queries which return hundreds of results.
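
For reference, a geo-bounded query of the kind get_imgs_geo_gps_search.py issues might look roughly like the sketch below. It assumes the standard flickr.photos.search REST method with lat/lon/radius parameters and a placeholder API key; the real script also handles paging, retries, and writing the result text files. Note that the API has historically capped the geo-search radius at about 20 miles / 32 km, so covering a 50-mile circle usually means tiling several smaller queries.

# Rough sketch of a geo-bounded Flickr query (assumes the flickr.photos.search
# REST method and a valid API key; the real get_imgs_geo_gps_search.py also
# handles paging, rate limiting and writing the result text files).
import json
import urllib, urllib2

MILES_TO_KM = 1.609344

def geo_search(api_key, lat, lon, radius_miles, page=1):
    params = {
        'method': 'flickr.photos.search',
        'api_key': api_key,
        'lat': lat,
        'lon': lon,
        # Flickr takes the radius in km (or set radius_units='mi'); the docs
        # cap it well below 50 miles, so tile larger regions with several calls.
        'radius': radius_miles * MILES_TO_KM,
        'radius_units': 'km',
        'extras': 'geo,tags,date_taken,owner_name',
        'per_page': 250,    # big pages = fewer API calls, less chance of throttling
        'page': page,
        'format': 'json',
        'nojsoncallback': 1,
    }
    url = 'https://api.flickr.com/services/rest/?' + urllib.urlencode(params)
    return json.loads(urllib2.urlopen(url).read())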

Downloading FAQ
The image download code is written in Matlab. It accesses images on Flickr's http server instead of going through the API, and thus doesn't require an API key. It reads the text files produced by get_imgs_geo_gps_search.py, downloads each photo, and saves all of the image attributes (tags, interestingness, long/lat, etc.) as a Matlab cell string array in the comment field of each jpg. Use imfinfo() to read them later.
a) What size images will this get?
Currently the code will try to find the Flickr "Large" size photo, which has a max width or height of 1024. Failing that, it will try to get the "Original" size photo. If the "Original" is larger than 1024 in height or width, it will be downsampled to 1024. If it is smaller than 500 in height or width, it will be thrown away. Otherwise the image is kept.
A significant fraction of images are too small by this criterion and are thus thrown away. An alternative strategy would be to download only the default size images, which will always be available although somewhat small.
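
The rule above boils down to something like the following (a Python sketch of what the Matlab downloader is described as doing; only the dimension checks matter here):

MAX_DIM = 1024   # the "Large" size cap
MIN_DIM = 500    # images smaller than this on either side are discarded

def keep_or_resize(width, height):
    """Sketch of the size rule described above: discard anything under 500
    on either side, downsample anything over 1024, keep the rest as-is."""
    if width < MIN_DIM or height < MIN_DIM:
        return None                              # too small: thrown away
    scale = float(MAX_DIM) / max(width, height)
    if scale < 1.0:                              # "Original" larger than 1024
        return int(round(width * scale)), int(round(height * scale))
    return width, height                         # within limits: kept

# keep_or_resize(2048, 1536) -> (1024, 768); keep_or_resize(640, 480) -> None
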
b) How are the images written to disk?
Since most file systems have trouble with thousands of files in a directory, the images are put into a hierarchy of directories that contain no more than 1000 images each. The hierarchy is

base_db_path / keyword / numbered subdir / img_name

for example

Flickr_gps/Argentina/00015/315157387_c36ba74681_100_23812473@N00.jpg

The image filenames contain the photo id, secret, server id, and owner which can be used to trace the .jpg back to its source on Flickr. See the source code for examples of how the URLs are constructed.
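
A minimal sketch of how such a path could be assembled is below. The {id}_{secret}_{server}_{owner}.jpg naming is inferred from the example filename above, and the static-image URL pattern is the classic Flickr one, so double-check both against the current API documentation.

import os

IMAGES_PER_DIR = 1000

def image_path(base_db_path, keyword, index, photo_id, secret, server, owner):
    """Nested path as described above: at most 1000 images per numbered
    subdirectory, with the filename encoding id/secret/server/owner."""
    subdir = '%05d' % (index // IMAGES_PER_DIR)   # e.g. image 15xxx -> '00015'
    fname = '%s_%s_%s_%s.jpg' % (photo_id, secret, server, owner)
    return os.path.join(base_db_path, keyword, subdir, fname)

def source_url(farm, server, photo_id, secret):
    # Classic Flickr static-image URL pattern; the hostnames have changed
    # over the years, so treat this as illustrative only.
    return 'https://farm%s.staticflickr.com/%s/%s_%s.jpg' % (
        farm, server, photo_id, secret)

# image_path('Flickr_gps', 'Argentina', 15001, '315157387', 'c36ba74681',
#            '100', '23812473@N00')
# -> 'Flickr_gps/Argentina/00015/315157387_c36ba74681_100_23812473@N00.jpg'
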
c) Can I run the download script in parallel?
Yes, I’ve run 15 copies in parallel in the past. I wouldn’t recommend doing any more than this because Flickr could get mad at us. They’re aware that researchers are using Flickr as a data source but their main concern is that we don’t impact the quality of service for the millions of people who use Flickr.
To run multiple scripts in parallel you’ll need to split up the text files from the query process manually, then change the path in downloadphotos_int.m for each call.
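
If it helps, here is one way to do that split (a hypothetical helper; the directory naming is up to you, and each downloadphotos_int.m call is then pointed at its own directory):

import glob, os, shutil

def split_for_workers(txt_dir, n_workers):
    """Distribute the query-result .txt files round-robin into
    txt_dir_0 ... txt_dir_<n-1>, one directory per parallel downloader."""
    files = sorted(glob.glob(os.path.join(txt_dir, '*.txt')))
    for i, path in enumerate(files):
        dest = '%s_%d' % (txt_dir.rstrip('/'), i % n_workers)
        if not os.path.isdir(dest):
            os.makedirs(dest)
        shutil.move(path, os.path.join(dest, os.path.basename(path)))

# split_for_workers('query_txt', 15) then edit the path in each
# downloadphotos_int.m call to point at its own query_txt_<k> directory.
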
d) What about copyrights?
It is worth noting that Flickr allows photographers to specify Creative Commons licenses for their images instead of the default "all rights reserved". This script saves the license info with each .jpg file, so you can pick out Creative Commons images after the fact (in my experience it's less than 10% of images). It is also possible to restrict the search to images with certain licenses at query time; see the Flickr API for details.
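
As a rough example of the query-time route: flickr.photos.search accepts a comma-separated `license` parameter. The numeric ids below are placeholders; fetch the authoritative id-to-license mapping with flickr.photos.licenses.getInfo before relying on them.

import json
import urllib, urllib2

API_KEY = 'your_api_key_here'     # same key you put in flickr.py
CC_LICENSES = '1,2,3,4,5,6'       # assumption: Creative Commons license ids;
                                  # verify with flickr.photos.licenses.getInfo

url = 'https://api.flickr.com/services/rest/?' + urllib.urlencode({
    'method': 'flickr.photos.search',
    'api_key': API_KEY,
    'tags': 'goldengatebridge',
    'license': CC_LICENSES,       # only return photos with these licenses
    'extras': 'license',          # include each photo's license id in the reply
    'format': 'json',
    'nojsoncallback': 1,
})
result = json.loads(urllib2.urlopen(url).read())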

From this post: let's look at using a freely available library called flickrpy. Download the file flickr.py. You will need an API key from Flickr to get this to work. Keys are free for non-commercial use; just click the link "Apply for a new API Key" on the Flickr API page and follow the instructions.

Once you have an API key, open flickr.py and replace the empty string on the line
API_KEY = ''
with your key. It should look something like this:
API_KEY = '123fbbb81441231123cgg5b123d92123'

Let’s create a simple command line tool that downloads images tagged with a particular tag. Add the following code to a new file called tagdownload.py.

import flickr
import urllib, urlparse
import os
import sys

if len(sys.argv) > 1:
    tag = sys.argv[1]
else:
    print 'no tag specified'
    sys.exit(1)  # without a tag there is nothing to search for

# search Flickr for photos with the given tag (returns photo objects, not files)
f = flickr.photos_search(tags=tag)
urllist = [] #store a list of what was downloaded

# downloading images
for k in f:
    url = k.getURL(size='Medium', urlType='source')
    urllist.append(url)
    image = urllib.URLopener()
    image.retrieve(url, os.path.basename(urlparse.urlparse(url).path))
    print 'downloading:', url

If you also want to write the list of urls to a text file, add the following lines at the end.

# write the list of urls to file
fl = open('urllist.txt', 'w')
for url in urllist:
    fl.write(url+'\n')
fl.close()

From the command line, just type

$ python tagdownload.py goldengatebridge

and you will get the 100 latest images tagged with "goldengatebridge". As you can see, we chose the "Medium" size. If you want thumbnails, full-size originals, or something else, there are many sizes available; check the documentation on the Flickr website.
