Citizen Data Science: Analyze Nature Without Programming

Citizen Data Science:Analyze Nature Without ProgrammingUsing Excel with iNaturalist and eBirdJohn HurleyBlockedUnblockFollowFollowingMay 15Steven Wright @stevenwright via UnsplashI recently gave an informal talk to a class of botany students at Gavilan College.

The original topic was nature photography, but I also talked about the data science techniques that I used to create my recently completed photo book, Portraits of Birds: Shoreline Park.

The concept for the book was to try to personally take photos of all of the bird species in a particular area, in this case Shoreline at Mountain View Park in Mountain View, California, which I later expanded to include the Palo Alto Baylands.

To enumerate the species that have been seen in this area, I turned to two citizen science sites, iNaturalist and eBird, both of which have application programmatic interfaces (APIs).

Note that while eBird is specific to birds, iNaturalist contains data on plants and other animals as well.

A future set of articles will cover the different techniques and libraries that I used to research and create the book, including Folium for map making.

These will also reference the GitHub repository containing the Python code that I used.

Sans ProgrammingAll of this is well and good for someone who does data science for a living and prefers Pandas dataframes to Excel tables.

However, even if you are not a programmer, you may be comfortable with Microsoft Excel, which can be used effectively to do some investigations using iNaturalist and eBird data.

A little bit of creative cutting and pasting of URLs can go a long way towards using this valuable data without having to learn Python or R.

After all, this is what I did to figure out how some of the APIs worked, or to replicate them when they didn’t work as advertised.

MethodOne of the students in the botany class presented a report on Muhlenbergia rigens, commonly known as deergrass.

Although it is a fascinating plant, I will leave the analysis of Muhlenbergia rigens as an exercise, and will instead consider one of my favorite birds, the American Kestrel (Falco sparverius).

American Kestrel (Falco sparverius), photo by authorTo illustrate the technique, we will search for American Kestrel observations on iNaturalist.

I will walk through the steps in detail to make it clear how to generalize to other applications.

Start by going to the iNaturalist home page here, and click on the magnifying glass in the upper left hand corner.

Enter ‘American Kestrel’ in the search field to get to this page:https://www.

inaturalist.

org/observations?place_id=any&taxon_id=4665We can see that there are 13269 observations (to date).

You can also see from this that the taxon id that they use for Falco sparverius is 4665.

Since I am interested in local sightings, I also filled in the location field with “Mountain View, CA, USA”, limiting our results to 238 observations.

iNaturalist helpfully shows possible choicesNext, click on Filters, then click on Download.

Lots of filtering options hereThis leads you to a new page with a plethora of options to select what you would like in the final exported CSV (comma-separated values) file.

The key part of this page for later hacking is this:By appending the query fragment in the box to this:https://www.

inaturalist.

org/observations?you will get a full URL for the download.

Changing options in the Create Query pane will change the URL fragment.

I clicked on “Places of Interest” and restricted the area to Mountain View; we only get 238 observations.

From looking at the query, we see that 76255 is the place_id for Mountain View.

After you are done filtering, scroll down to Step 4 and click “Create Export”.

If the number of observations is small, it will complete quickly.

Scroll back to the top and above “Export complete” click the “Download” button.

On a Mac, this will go into your Downloads folder with a name like “observations-53038.

csv”, which you can now open with Excel.

When you have a search query that you like, save the full URL in your notes for quick and reproducible access to the data, always an important part of data science.

On to analysis!There are 36 columns in the default download, and looking through them immediately gives us some ideas for data analysis.

For example, we could use the latitude and longitude columns to plot the sighting on a map, or examine the Description column with natural language processing — hold on a second, this is Excel, not a Jupyter Python notebook, so those columns are not easily usable.

If we are planning to find an American Kestrel in the wild, it would be very useful to know if it is seasonal or year-round.

A column that helps is “observed_on”, which appears to be in MM/DD/YY format.

Using Excel’s “Pivot Table” feature, we can summarize observation counts by month to find the best time of year to look for it.

This isn’t a tutorial on pivot tables, so we will follow a StackOverflow post [1].

Add three columns after the “observed_on” column: obs_month, obs_year and obs_countUse the formula =TEXT(C2, “MMM”) for obs_month and =TEXT(C2, “YYYY”) for obs_year, the number 1 for obs_count and fill down for all observations (beware smart quotes if copying and pasting!)Select the three new columns and create a pivot table (on the Mac version of Excel, click on the Data tab to find it).

Drag obs_year under Column Labels, obs_month under Row Labels and the obs_count field under Values as shown below.

This should create a pivot table that looks something like this:To answer the question of when to see an American Kestrel, the column of interest is the rightmost one; we see that the winter months are when is it most frequently reported (although not necessarily when it is most abundant).

The bottom row is also interesting and likely is due to the increasing popularity of the iNaturalist mobile app.

eBirdAlthough this article uses iNaturalist examples, much of the data from eBird is accessible by poking around the API reference.

For example, this URL:https://ebird.

org/ws2.

0/ref/hotspot/geo?lat=37.

4407&lng=-122.

0936&fmt=csv&dist=6will download CSV data for the birding hotspots within 6km of Shoreline Park in Mountain View.

By reading through the API documentation here, you can construct other useful query URLs.

ConclusionsAlthough this article just scratches the surface on what is possible with this treasure trove of citizen science data, it shows that it is possible to gain useful insights even without programming.

Of course, I am also hopeful that it will inspire some of you to go beyond what is possible with Excel and learn a programming language such as Python or R to help automate your analysis tasks.

I have included some article links below to help jumpstart your efforts.

ReferencesExcel: Count number of occurences by month, Matthew Lock and Stepan1010, StackOverflow.

iNaturalist API Reference, California Academy of Sciences and National Geographic SocietyeBird API 2.

0 Reference, Cornell Lab of OrnithologyWhen Excel isn’t enough, Andres VourakisPython Excel Tutorial: The Definitive Guide, Karlijn WillemsSeven Clean Steps To Reshape Your Data With Pandas Or How I Use Python Where Excel Fails, Tich MangonoAmerican Kestrel observation, John Hurley.. More details

Leave a Reply