Nature’s Patterns are Changing: Explore 32 Years of Community Science Data

Over a 32 year span, Latimer manually collected 10,484 phenology observations.

Along with the date recorded, Latimer and colleague Nanninga added 9 other variables (e.


, genus, species, common name, lifeform, etc.

) for a total of 104,840 total observations.

This is just the start, our dataset stops at 2016 while Latimer continues to record phenology observations to this day!The data is in typical ‘date’ format.

However, scientists typically use Julian day format to provide a continuous count of the days starting at January 1st each year.

Each year will contain 365 days or 366 days if a leap year.

datphen$date<- as.

Date(datphen$date, "%m/%d/%y")A quick look at the data shows that Latimer did not record observations daily.

This makes sense as plants are dormant in the winter and birds migrate seasonally.

The data include observations on plants, birds and other animals.

To get a feel for the plant species observed use the following:datphen2 %>% select(common_name, lifeform, event) %>% filter(lifeform == "PLANTS" & event =="FLOWERING") %>% group_by(common_name) %>% tally() %>% arrange(-n)This produces a table with plant common name and the number of observations Latimer made for that species.

Below is the table for the first 10 entries.

We can see from the above table that some species have numerous observation while others less so.

Below we use the following code to visualize the number of observations Latimer made per species throughout the 32 years of data collections.

a<-datphen2 %>% select(common_name, lifeform, event) %>% filter(lifeform == "PLANTS" & event =="FLOWERING") %>% group_by(common_name) %>% tally() %>% arrange(-n)ggplot(a, mapping = aes(x = reorder(common_name, -n), y = n)) + geom_bar(stat = "identity") + geom_hline(yintercept = 15, color="lightblue") + theme(axis.


x=element_blank(), axis.


y=element_text(size=16), axis.

title=element_text(size=16,face="bold"), plot.

title = element_text(size=22), axis.


x=element_blank()) + labs(title = "Observations per plant species over 32 years", x = "Plant Species", y = "Number of observations per species")Just for plants alone, Latimer recorded an impressive 392 species — seen as the number of bars along the horizontal x-axis above.

However, the number of observations per species varies considerably.

To ensure that we have enough observations per species to detect phenology changes, we are restricting the analysis to those species with 15 or more observations (Species with observations above the blue line in above graph), which results in 76 species.

From the selected 76 species, we will fit a linear regression model to each species to determine if a change in flowering time phenology coincides with a change in year.

Preparing data for linear regression loop and graphs:#subset species that have 15 or more years of flowering observations # Prep or tidy the data for lm model fittingb<- a %>% filter(n >= 15)c<-b %>% select(common_name)lm.

data<-inner_join(c, datphen2, by = "common_name") %>% filter(event =="FLOWERING") #%>% group_by(common_name) %>% distinct(year, .

keep_all = TRUE)lm.



data$year)The below code shows how to run a linear model consecutively (i.


, loop) through all 76 species, and neatly extract the model slope and p— value to identify candidate species for further investigation.

library(broom)#run an lm model across all rows, genotypeslm.

results<- lm.

data %>% group_by(common_name) %>% do(fitdata = lm(julian ~ year, data = .

))#getting tidy data output from model runlmSlopePvalue <- tidy(lm.

results, fitdata) %>% select(common_name, term, estimate, p.

value) %>% filter(term =="year")lmRsquare <- glance(lm.

results, fitdata) %>% select(common_name, r.

squared)lmtidyoutput<-left_join(lmSlopePvalue, lmRsquare, by = c("common_name" = "common_name"))The regression analysis shows that 10 of the 76 species have an p — value < 0.


Meaning that there is a significant relationship between when Latimer noticed first flowering (Julian day) for a given species, and the year the observations were recorded (year).

Interpreting significance and ensuring that model assumptions are met should be further investigated and beyond the scope of this post.

To plot all the significant species, prep the data as shown in R notebook and run the following function:#Looping graph function for phenology relationshipplant.

species <- function(df, na.

rm = TRUE, .

) # create list to loop over plant_list <- unique(df$common_name) # create for loop to produce ggplot2 graphs for (i in seq_along(plant_list)) { # create plot for each specis in df plot <- ggplot(subset(df, df$common_name==plant_list[i]), aes(y= julian, x= year, group = common_name)) + geom_point(size=4, colour = "dark blue")+ geom_smooth(method = lm) + theme(axis.

text=element_text(size=12), axis.

title=element_text(size=16,face="bold"), axis.


x = element_text(angle = 90, vjust = 0.

5), legend.

text=element_text(size=16)) + ggtitle(paste(plant_list[i])) print(plot) }}Results:The code examples above and as detailed in the R notebook can be used to investigate other plant and animal species within Latimer’s dataset.

The three graphs below are chosen to show various trends in phenology.

The Dark Eyed Junco, for example, seems to appear at the same time each year despite prevailing weather conditions.

Keep in mind that weather changes appear locally over short timeframes, from minutes to hours to days; while climate is longe-term regional and global averages.

So if northern MN is experiencing climate change, the early phase of Dark Eyed Junco migration does not seem to be influenced.

In contrast to the Junco, the plants shown below display trends in flowering time.

With False Solomon’s Seal showing a delayed flowering day response at this site for the years observed, while aspen displays an earlier flowering day phenology.

Notice that the left side of the aspen graph has less observations than the right.

What happens when you exclude the earlier years?.Do you get the same trend — hint, see R notebook.

Final Thoughts:Forget the political undertones that somehow seem to accompany any post that mentions the word ‘climate.

’ Instead, this is meant to introduce Latimer’s life long work to the data science community.

I encourage you to form your own opinions on these data.

Are the data complete enough to make robust models?.What models should be used and what additional data would help — e.


, weather related variables?.Or, should we leave phenology to professional scientists?.Are you ready to contribute to the cause and start recording your own phenology data?And for the community scientists out there, what can be learned by engaging the global data science community?.Are you ready to look at your data with the same enthusiasm that you have for recording phenology?Towards the end of our interview, I had to ask Latimer what drives him.

Why has he recorded data for over three decades regardless of the harsh sub-boreal Minnesota conditions.

His response was quite simple:The joy of discovery, and the anticipation of noticing something new every day.

Perhaps community based data analytics is the next major transition in science — only time will tell, but for Latimer it doesn’t really matter.

He is going to get out of bed, walk the woods and continue to record nature’s secrets.

About the Author:Weston’s earliest memories are littered with images from nature.

Professionally he is fascinated by host-microbiome interactions, passionate about energy solutions and the environment, and all things related to open analytics.

Personally he is a wannabe farmer raising three awesome children with an amazing spouse.

Acknowledgements:This story is made possible by the hard work of John Latimer, who eagerly recorded nature’s secrets throughout the last three and half decades.

The painstaking task of translating and annotating Latimer’s data into a spreadsheet was accomplished by Claudia Nanninga — the real phenology expert.

Many thanks to those who made the poplar, False Solomon Seal and Junco photographs available under the creative commons license.

Thanks to Kristine Cabugao, for the fine edits and kind words.


. More details

Leave a Reply