The only caveat is that detailed locations like address names are not always available.
Below is a sample of how to utilize the revgeo package:

```r
library(revgeo)
revgeo(longitude = -77.0229529, latitude = 38.89283435,
       provider = 'photon', output = 'frame')
```

So where is the problem? Well, as stated on Photon's webpage:

> You can use the API for your project, but please be fair — extensive usage will be throttled. We do not guarantee for the availability and usage might be subject of change in the future.
I am not certain how many queries it takes before the Photon API slows down, but it is important to be mindful of how many requests we send to their server.
I decided to start with 500,000 coordinates to reverse geocode, but this didn't work well. I ran the code and walked away for a while; when I came back, the throttling had begun, so I needed to tweak the code.
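One way to tweak the code for throttling (a sketch of the general idea, not necessarily what the final script does) is to wrap each batch of requests in a retry helper that waits longer after every failed attempt. The `with_backoff` function below is hypothetical; in the real script, `fun` would be the batched `revgeo()` call:

```r
# Retry a function call with an increasing sleep time between attempts.
# 'fun' would be the revgeo() call in the real script; any call that can
# error out (e.g., when the server throttles) works the same way.
with_backoff <- function(fun, max_tries = 3, base_wait = 10) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(base_wait * i)  # wait longer after each failed attempt
  }
  stop("still failing after ", max_tries, " attempts")
}

# Hypothetical usage inside the loop:
# cities <- with_backoff(function()
#   revgeo(latlong$longitude, latlong$latitude,
#          provider = 'photon', output = 'frame'))
```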
In addition, R was throwing the error cannot allocate vector of size x.x Gb, which means that my available RAM had been exhausted.
At this point I had two issues: 1) Throttling and 2) Memory Allocation.
For issue 1, I needed to incorporate sleep times in the code and work with smaller subsets of my already subsetted dataframe.
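The "smaller subsets" idea boils down to splitting the row indices into fixed-size chunks and pausing between them. A minimal sketch (the 200-row chunk size matches what I use later; `make_chunks` is just an illustrative helper):

```r
chunk_size <- 200

# Split the row indices of a dataframe into chunks of at most chunk_size rows.
make_chunks <- function(n_rows, chunk_size) {
  split(seq_len(n_rows), ceiling(seq_len(n_rows) / chunk_size))
}

chunks <- make_chunks(1000, chunk_size)
length(chunks)       # 5 chunks
lengths(chunks)[1]   # 200 rows in the first chunk
```

Each chunk then becomes one batch of requests, followed by a `Sys.sleep()` before the next batch.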
For issue 2, I found a thread on stackoverflow that had useful advice: Memory Allocation "Error: cannot allocate vector of size 75.1 Mb". A solution that helped me was running memory.limit(size = _ _ _ _ _ _).
In addition, I used the rm() command to remove any dataframes I no longer needed within my code, and the gc() command for garbage collection.
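As a small illustration of those memory commands (the dataframe here is just a stand-in; note that memory.limit() only applies on Windows):

```r
# A throwaway ~15 Mb dataframe standing in for a large intermediate result.
big_df <- data.frame(x = runif(1e6), y = runif(1e6))
print(object.size(big_df), units = "Mb")

rm(big_df)  # drop the object from the global environment
gc()        # run garbage collection so R can release the freed memory
```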
Shown below, I loaded in the dataframe with ~1 million coordinates called main.
I subsetted the data to only 100,000 rows.
As you will see later, I subset the data even further in a while loop to avoid memory allocation issues.
```r
library(revgeo)

# The dataframe called 'main' is where the 1 million coordinate points reside.
main <- readRDS("main.rds")

main_sub <- main[1:100000, ]  # Working with a smaller initial subset
rm(main)
gc()
```

Below is the full code.
The script incorporates other actions not related to this post's subject matter, but I wanted to publish it here so that you can see the whole picture and hopefully take away some helpful tips on reverse geocoding.
```r
library(revgeo)
library(dplyr)

# Step 1: Create a blank dataframe to store results.
data_all <- data.frame()

start <- Sys.time()

# Step 2: Create a while loop to keep the function running until the
# dataframe with 100,000 rows is empty.
while (nrow(main_sub) > 0) {

  # Step 3: Subset the data even further so that you are sending only
  # a small portion of requests to the Photon server.
  # (head() avoids NA-padded rows when fewer than 200 rows remain.)
  main_sub_t <- head(main_sub, 200)

  # Step 4: Extract the lat/longs from the subsetted data from
  # the previous step (Step 3).
  latlong <- main_sub_t %>%
    select(latitude, longitude) %>%
    unique() %>%
    mutate(index = row_number())

  # Step 5: Incorporate the revgeo package here. I left_joined the
  # output with the latlong dataframe from the previous step to add
  # the latitude/longitude information to the reverse geocoded data.
  cities <- revgeo(latlong$longitude, latlong$latitude,
                   provider = 'photon', output = 'frame') %>%
    mutate(index = row_number(),
           country = as.character(country)) %>%
    filter(country == 'United States of America') %>%
    mutate(location = paste(city, state, sep = ", ")) %>%
    select(index, location) %>%
    left_join(latlong, by = "index") %>%
    select(-index)

  # Removing the latlong dataframe because I no longer need it. This
  # helps with reducing memory in my global environment.
  rm(latlong)

  # Step 6: Add the information from the cities dataframe to the
  # main_sub_t dataframe (from Step 3).
  data_new <- main_sub_t %>%
    left_join(cities, by = c("latitude", "longitude")) %>%
    select(X, text, location, latitude, longitude)

  # Step 7: Add data_new into the empty data_all dataframe where
  # all subsetted reverse geocoded data will be combined.
  data_all <- rbind(data_all, data_new) %>% na.omit()

  # Step 8: Remove the rows that were used in this pass from the
  # main_sub frame so the next 200 rows can be read into the while
  # loop.
  main_sub <- anti_join(main_sub, main_sub_t, by = c("X"))
  print(nrow(main_sub))

  # Remove dataframes that are not needed before the while loop closes
  # to free up space.
  rm(main_sub_t)
  rm(data_new)
  rm(cities)

  print('Sleeping for 10 seconds')
  Sys.sleep(10)
}

end <- Sys.time()
```

After implementing this code, it took about 4 hours to reverse geocode 100,000 coordinates.
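The script records `start` and `end` but never reports the difference; `difftime()` closes that loop. A minimal, self-contained sketch (the `Sys.sleep()` stands in for the reverse-geocoding loop):

```r
start <- Sys.time()
Sys.sleep(2)  # stand-in for the reverse-geocoding while loop
end <- Sys.time()

# Report elapsed time in a chosen unit; use units = "hours" for the real run.
elapsed <- difftime(end, start, units = "secs")
print(elapsed)
```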
In my opinion, that's not a viable option if I have 1 million coordinates to convert.
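The arithmetic behind that judgment is a straight extrapolation from the observed rate:

```r
hours_per_100k <- 4      # observed: ~4 hours per 100,000 coordinates
total_coords   <- 1e6    # size of the full dataset

hours_per_100k * total_coords / 1e5  # 40 hours, before any extra throttling
```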
I may have to find another method to achieve my goal, but I figured this would be helpful to some of you who have smaller datasets.
Thanks for reading and happy coding!