Finding geographical points of interest using Python

This blog will take a look at scraping the TomTom and Google Places APIs to get all the points of interest in an area. A recursive grid search algorithm is discussed that efficiently identifies all of the POIs in a large area where there is a limit on the number of results the API returns.

TomTom vs Google

First, let’s compare each API:

TomTom Google Places
Max free daily requests 2500 2500
Max results returned 100 20
Point search Yes Yes
Max point search radius none 50km
Rectangle search Yes No
Up-to-date No Yes

In each case, you need to register for your own API key which you include as a parameter in the search. Both provide the same number of free daily requests, but TomTom returns more results and allows you to search a broader area. Google’s results, however, are generally more up-to-date and complete.

The rectangular search function of the TomTom API allows us to recursively search smaller and smaller areas untill all points of interest are found so we use that instead of the Google API. Let’s look at the code.

Imports

We’ll use the requests library to make the API calls and pandas to view the results.

import os
import requests
import json
import pandas

This function makes each API call and processes the results by only selecting the fields we need. It’s a good idea to save the full response to a text file as well.

def tomtom_category_search_request(api_key, category, top_left=None, btm_right=None):
    # create the directory if it doesn't exist:
    directory = 'output/' + category
    if not os.path.exists(directory):
        os.makedirs(directory)
    url = """
            https://api.tomtom.com/search/2/categorySearch/{category}.json?key={api_key}&topLeft={top_left}&btmRight={btm_right}&countrySet=ZA&limit=100
          """.format(category=category, api_key=api_key, top_left=top_left, btm_right=btm_right)
    filename = 'output/' + category + '/out_' + str(top_left).replace('.','') + '_' + str(btm_right).replace('.','') + '.txt'
    response = requests.get(url)
    data = response.json()
    if 'httpStatusCode' not in data:
        with open(filename, 'w') as outfile:
            json.dump(data, outfile)

        number_pois = len(data['results'])
        poi_data = {}
        # extract the fields we need
        for result in data['results']:
            name = result["poi"]["name"] if "name" in result["poi"] else None
            id = result["id"] if "id" in result else None
            lat = result["position"]["lat"] if "lat" in result["position"] else None
            lon = result["position"]["lon"] if "lon" in result["position"] else None
            categories = [j[0]["name"] for j in [i["names"] for i in result["poi"]["classifications"]]][0]    
            countrySubdivision = result["address"]["countrySubdivision"] if "countrySubdivision" in result["address"] else None
            freeformAddress = result["address"]["freeformAddress"] if "freeformAddress" in result["address"] else None
            poi_data[id] = {
                'name': name,
                'lat': lat,
                'lon': lon,
                'categories': categories,
                'countrySubdivision': countrySubdivision,
                'freeformAddress': freeformAddress
            }

        return {"data": poi_data, "filename": filename, "number_pois": number_pois}
    else:
        return {"error": "error"}

Testing this on the area around the Kalahari Mall in Upington:

api_key = '<My API KEY>' # get your API key here: https://developer.tomtom.com/tomtom-maps-apis-developers
result = tomtom_category_search_request(api_key, 'cash dispenser', '-28.416455,21.213253', '-28.457805,21.309467')
print("Number of POIs found: " + str(result['number_pois']))
print("Output file: " + result['filename'])
# display the first few results
display(pd.DataFrame.from_dict(result['data'], orient='index').head())
Number of POIs found: 18
Output file: output/cash dispenser/out_-28416455,21213253_-28457805,21309467.txt
lon freeformAddress name lat categories countrySubdivision
ZA/POI/p0/11783 21.25640 Oosterville, Khara Hais, Northern Cape, 8801 FNB Kalahari Mall -28.44360 cash dispenser Northern Cape
ZA/POI/p0/14482 21.24600 Upington, Khara Hais, Northern Cape, 8801 FNB Kalahari Pick And Pay Centre -28.45380 cash dispenser Northern Cape
ZA/POI/p0/156887 21.24940 Scott Street, Upington, Khara Hais, Northern C... Standard Bank Elron Motors -28.45490 cash dispenser Northern Cape
ZA/POI/p0/169466 21.24681 Scott Street, Upington, Khara Hais, Northern C... Capitec Bank Upington Scott Street -28.45729 cash dispenser Northern Cape
ZA/POI/p0/186303 21.26872 Hummel Street, Oosterville, Khara Hais, Northe... ABSA Village Plaza Upington, Oosterville -28.43968 cash dispenser Northern Cape

Since the API doesn’t return all the results, but rather only a maximum of 100, we need a way to find the rest of the POIs.

We start with the smallest rectangle that contains the whole search area. If the maximum number of results is returned for this area, there could be more lurking. So we subdivide the rectangle into 4 equally sized rectangles and repeat the search on each of them. In this way, we perform a narrower search on areas with a high concentration of POIs, but don’t waste API calls on sparsely populated areas.

In the fgure below, the search is performed first on rectangle A. Since the result limit was reached, the search is repeated on rectangle B. Again, the limit is reached, so the search is repeated on rectangles C, D, E, F. Then we go back to rectangles G, H and I - all of which returned a small number of results and so we are done.

def grid_search(top_left, btm_right, api_key, key_word, pois={}):
    print('\nTop-left: ' + top_left)
    print('Bottom-right: ' + btm_right)
    result = tomtom_category_search_request(api_key, key_word, top_left=top_left, btm_right=btm_right)
    num_discovered = 0
    if 'error' not in result:
        ids = set(result['data'].keys())
        new_ids = ids.difference(set(pois.keys()))
        num_discovered = len(new_ids)
        for id in new_ids:
            pois[id] = result['data'][id]
        print("Total POIs found: " + str(len(ids)))
        print("New POIs discovered: " + str(num_discovered))        
        # if the search returned the max number of results, subdivide
        if result["number_pois"] > 98:
            print('Subdividing...')
            # subdivide
            top_leftLat, top_leftLon = top_left.split(',')
            btm_rightLat, btm_rightLon = btm_right.split(',')
            midLat = str((float(top_leftLat) + float(btm_rightLat))/2)
            midLon = str((float(top_leftLon) + float(btm_rightLon))/2)
            new_grid = [
                [[top_leftLat, top_leftLon], [midLat, midLon]], # top left block
                [[top_leftLat, midLon], [midLat, btm_rightLon]], # top right block
                [[midLat, top_leftLon], [btm_rightLat, midLon]], # bottom left block
                [[midLat, midLon], [btm_rightLat, btm_rightLon]] # bottom right block
            ]
            for grid in new_grid:
                top_left = grid[0][0] + ','+ grid[0][1]
                btm_right = grid[1][0] + ',' + grid[1][1]

                grid_search(top_left, btm_right, api_key, key_word, pois)
    return pois

Finding the shopping centres in a region of Cape Town:

ct_shopping_centers = grid_search('-33.823464,18.340948', '-34.005645,18.706288', api_key, 'shopping center')
Top-left: -33.823464,18.340948
Bottom-right: -34.005645,18.706288
Total POIs found: 100
New POIs discovered: 100
Subdividing...

Top-left: -33.823464,18.340948
Bottom-right: -33.9145545,18.523618
Total POIs found: 42
New POIs discovered: 13

Top-left: -33.823464,18.523618
Bottom-right: -33.9145545,18.706288
Total POIs found: 100
New POIs discovered: 88
Subdividing...

Top-left: -33.823464,18.523618
Bottom-right: -33.869009250000005,18.614953
Total POIs found: 8
New POIs discovered: 0
df = pd.DataFrame.from_dict(ct_shopping_centers, orient='index')
df.head()
lon freeformAddress name lat categories countrySubdivision
ZA/POI/p0/11037 18.63308 Carl Cronje Drive, Tygervalley Waterfront, Cit... Builders Express - Willowbridge -33.86798 shop Western Cape
ZA/POI/p0/11840 18.51077 Orion Road, Nerissa Estate, City of Cape Town,... Berco Indoor House -33.99021 house garden: garden centers services Western Cape
ZA/POI/p0/120847 18.68818 Langeberg Smallholdings, City of Cape Town, We... Bonsai Centre Olive Grove -33.84047 house garden: garden centers services Western Cape
ZA/POI/p0/127716 18.43191 Zonnebloem, City of Cape Town, Western Cape, 8001 Oriental Plaza -33.92910 shopping center Western Cape
ZA/POI/p0/127884 18.69754 Kiewiet Close, Okavango Park, City of Cape Tow... Timbercity - Okavango Park -33.85953 shop Western Cape

TomTom used the ‘shopping center’ category loosely in the search.

df.categories.unique()
array(['shop', 'house garden: garden centers services', 'shopping center'],
      dtype=object)
df = df.query('categories=="shopping center"')[['name', 'categories', 'lat', 'lon', 'freeformAddress', 'countrySubdivision']].rename(columns={'freeformAddress': 'Address', 'countrySubdivision': 'Province'})

And finally, write the dataframe to a .csv

df.to_csv('cape town shopping centers.csv', index=False)