Turning Addresses into Coordinates

Using the Google Maps Geocoding API with Python.

Daniel Martinez Bielostotzky, Jan 27

Photo by Robert Penaloza on Unsplash

For my last year at university, I am working on a project as a data scientist (my first time working as an official data scientist :D).

During the data transformation phase, a critical step was to convert the addresses into coordinates since the primary goal of the project is to create a time series model that can predict the number of monthly road traffic accidents (in different areas of the city).

Looking for a solution to this problem, I found Google's Geocoding API.

The intention of this article is to explain how to use this powerful tool using Python.

Let’s get started!

Installing the Google Maps library

The first step is to install the library. For Python, the process is the same as for any other popular module: you can install the Google Maps library via pip or conda.

Using pip:

    pip install googlemaps

Using Anaconda:

    conda install -c conda-forge googlemaps

Once the installation is done, you will need an API key.

For this you need a Google Cloud account; Google describes the steps very clearly in its documentation.

Using the API

The cost of use is $0.005 USD per request, so 1,000 requests will cost you $5 USD, but the first 40,000 requests each month are free.
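As a rough illustration of the arithmetic above, the sketch below estimates the monthly bill for a given number of requests. The function name `estimate_monthly_cost` is hypothetical, and the price and free quota are simply the figures quoted in this article, which may not match Google's current pricing.

```python
def estimate_monthly_cost(requests, price_per_request=0.005, free_requests=40_000):
    """Estimate the monthly cost in USD using the figures quoted in the article."""
    # Only requests beyond the free monthly quota are billed.
    billable = max(0, requests - free_requests)
    return billable * price_per_request


# A data set the size of the one in this project stays within the free quota.
print(estimate_monthly_cost(25_000))
# A larger, hypothetical workload is billed only for the excess.
print(estimate_monthly_cost(100_000))
```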

In Python, you only need to import the googlemaps module and create a client using the API key.

    import googlemaps

    KEY = 'INSERT_MAPS_API_KEY_HERE'
    gmaps = googlemaps.Client(key=KEY)

Now, calling the geocode function of the Google Maps client with the address as a string queries the API, whose JSON response has the following structure:

    {
      "results" : [
        {
          "address_components" : [
            { "long_name" : "277", "short_name" : "277", "types" : [ "street_number" ] },
            { "long_name" : "Bedford Avenue", "short_name" : "Bedford Ave", "types" : [ "route" ] },
            { "long_name" : "Williamsburg", "short_name" : "Williamsburg", "types" : [ "neighborhood", "political" ] },
            { "long_name" : "Brooklyn", "short_name" : "Brooklyn", "types" : [ "sublocality", "political" ] },
            { "long_name" : "Kings", "short_name" : "Kings", "types" : [ "administrative_area_level_2", "political" ] },
            { "long_name" : "New York", "short_name" : "NY", "types" : [ "administrative_area_level_1", "political" ] },
            { "long_name" : "United States", "short_name" : "US", "types" : [ "country", "political" ] },
            { "long_name" : "11211", "short_name" : "11211", "types" : [ "postal_code" ] }
          ],
          "formatted_address" : "277 Bedford Avenue, Brooklyn, NY 11211, USA",
          "geometry" : {
            "location" : { "lat" : 40.714232, "lng" : -73.9612889 },
            "location_type" : "ROOFTOP",
            "viewport" : {
              "northeast" : { "lat" : 40.7155809802915, "lng" : -73.9599399197085 },
              "southwest" : { "lat" : 40.7128830197085, "lng" : -73.96263788029151 }
            }
          },
          "place_id" : "ChIJd8BlQ2BZwokRAFUEcm_qrcA",
          "types" : [ "street_address" ]
        },
        ... Additional results truncated in this example[] ...
      ],
      "status" : "OK"
    }

The coordinates, as latitude and longitude, are in the geometry section. The Python client returns the contents of "results" as a list, so the first match is the element at index 0.
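To make the structure concrete, here is a minimal sketch that pulls the coordinates out of a response shaped like the one above. The `response` dictionary is a hand-trimmed copy of that example, and `extract_lat_lng` is a hypothetical helper, not part of the googlemaps library.

```python
# Hand-trimmed copy of the example response (most fields omitted).
response = {
    "results": [
        {
            "formatted_address": "277 Bedford Avenue, Brooklyn, NY 11211, USA",
            "geometry": {
                "location": {"lat": 40.714232, "lng": -73.9612889},
                "location_type": "ROOFTOP",
            },
        }
    ],
    "status": "OK",
}


def extract_lat_lng(results):
    """Return (lat, lng) of the first result, or None if the list is empty."""
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


print(extract_lat_lng(response["results"]))
```

Reading "lat" and "lng" by key, rather than relying on the order of the dictionary values, keeps the two numbers from being swapped by accident.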

Example

To illustrate how I use the API in my final project, I will use 10 addresses from the data I’m working with.

The data set comes from a city where streets can be called by name or number.

As you can see, there are very different formats.

The address in position 4 can be written as ‘Murillo con 19’ or ‘CALLE 45 CARRERA 19’ or ‘CLLE 45 CRA 19’, etc.

And some addresses are actually written using the ‘CRA’ first.

I wrote a function that returns a list with the latitude and longitude of a given address if the API returns a result, and a pair of missing values (NaN) if it does not.

Then, by calling apply, the list is added to the Pandas data frame as two separate columns.

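    import numpy as np
    import pandas as pd

    def get_coordinates(address):
        city = '<City Name>, <Country>'
        # Append the city and country to make the query less ambiguous.
        geocode_result = gmaps.geocode(str(address) + ' ' + city)
        if len(geocode_result) > 0:
            # Use the first match and read lat and lng by key.
            location = geocode_result[0]['geometry']['location']
            return [location['lat'], location['lng']]
        else:
            return [np.nan, np.nan]

Including the name of the city and country in the query is the best way to improve the accuracy of the results.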
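The function can be tried without an API key by swapping in a stand-in client. The sketch below is only an illustration under stated assumptions: `FakeClient` and `get_coordinates_with` are hypothetical names, the canned reply merely mimics the shape of a geocoding result, the coordinates are made up, and `math.nan` stands in for the NaN value.

```python
import math


class FakeClient:
    """Stand-in for the Google Maps client; it returns canned results only."""

    def __init__(self, known):
        self.known = known  # maps a full query string to a (lat, lng) pair

    def geocode(self, query):
        if query not in self.known:
            return []  # mimic an unsuccessful lookup: empty result list
        lat, lng = self.known[query]
        return [{"geometry": {"location": {"lat": lat, "lng": lng}}}]


def get_coordinates_with(client, address, city="<City Name>, <Country>"):
    """Mirror of the function above, with the client passed in explicitly."""
    results = client.geocode(str(address) + " " + city)
    if results:
        location = results[0]["geometry"]["location"]
        return [location["lat"], location["lng"]]
    return [math.nan, math.nan]


# Made-up coordinates for one known query; everything else is "not found".
fake = FakeClient({"CALLE 45 CARRERA 19 <City Name>, <Country>": (10.0, -74.0)})
found = get_coordinates_with(fake, "CALLE 45 CARRERA 19")
missing = get_coordinates_with(fake, "no such address")
print(found, missing)
```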

    coordinates = df2['Address'].apply(
        lambda x: pd.Series(get_coordinates(x), index=['LATITUDE', 'LONGITUDE']))
    df2 = pd.concat([df2, coordinates], axis="columns")

So we’re done! All the addresses have been transformed into coordinates. The rest of the job is to check how many NaN values you have after this process (in my case, almost 160 out of a total of 25,000 addresses). I also suggest plotting the points to check that all of them are in the expected city.

Basemap is an easy way to do this.
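Before plotting, a numeric check can already reveal obvious problems. The sketch below counts the rows that came back as NaN and flags points that fall outside a rectangular bounding box. The sample frame, the box limits, and the helper name `outside_box` are all made up for illustration; in practice the limits would come from the city being studied.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for df2 after geocoding.
df2 = pd.DataFrame({
    "Address": ["a", "b", "c", "d"],
    "LATITUDE": [10.98, np.nan, 11.01, 40.71],
    "LONGITUDE": [-74.80, np.nan, -74.79, -73.96],
})

# Number of rows for which no coordinates were found.
n_missing = int(df2["LATITUDE"].isna().sum())


def outside_box(df, lat_min, lat_max, lng_min, lng_max):
    """Return the geocoded rows whose coordinates fall outside the box."""
    located = df.dropna(subset=["LATITUDE", "LONGITUDE"])
    inside = (
        located["LATITUDE"].between(lat_min, lat_max)
        & located["LONGITUDE"].between(lng_min, lng_max)
    )
    return located[~inside]


# Hypothetical limits for the expected city.
suspicious = outside_box(df2, 10.8, 11.2, -75.0, -74.6)
print(n_missing)
print(suspicious)
```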

As a final suggestion, with a large number of addresses, the process may take several hours to complete.

I executed the code using a kernel in Kaggle that generates df2 as a CSV file. The process runs on Kaggle servers, so you can continue working on other parts of the project.
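When the run is this long, one option is to geocode each distinct address only once and reuse the result for repeats. This is a sketch of that idea rather than what was done in the project: `geocode_unique` is a hypothetical helper, and `lookup` is any function that maps an address to a coordinate pair, for example the get_coordinates function defined earlier.

```python
def geocode_unique(addresses, lookup):
    """Call lookup once per distinct address and reuse the result for repeats."""
    cache = {}
    coordinates = []
    for address in addresses:
        # Treat variants that differ only in case or outer whitespace as equal.
        key = str(address).strip().upper()
        if key not in cache:
            cache[key] = lookup(address)
        coordinates.append(cache[key])
    return coordinates


# Demonstration with a counting stand-in instead of a real API call.
calls = []


def fake_lookup(address):
    calls.append(address)
    return [0.0, 0.0]  # placeholder coordinates


result = geocode_unique(["CRA 19", "cra 19 ", "CALLE 45"], fake_lookup)
print(len(result), len(calls))
```

Fewer requests also means fewer billable calls, which matters once a data set grows past the free monthly quota.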
