Animating regression models in R using broom and ggplot2

As with the animated Scottish rugby champions graph, this example doesn’t really benefit from adding the animation as another dimension to the plot.The graph is simply to show the trends for some metrics to do with UK university fundraising over time..I only really need x and y to represent the value and the year, but where’s the fun in that?.That’s the sort of thing we can plot ridiculously easily using ggplot2:ggplot(fund_tidy, aes(x = year, y = value, colour = kpi)) + geom_line()Why not use this as a bit more of a learning exercise?.Absolutely, but the data has to come first.A CASE of trendsI had been doing some university fundraising work looking at historic Ross-CASE reports, and thought it would be interesting to look at how some of the key performance indicators had changed over time..I then gathered the data into a tidy, ‘long’, format:# create contactable alumni x100 variable to place values on equivalent scalefund_df <- fund_df %>% mutate(contact_alum_x100 = contactable_alumni * 100)# create tidy dataframelibrary(tidyr)fund_tidy <- fund_df %>% gather(kpi, value, – year) %>% mutate(kpi = as.factor(kpi))glimpse(fund_tidy)Observations: 60Variables: 3$ year <int> 2005, 2006, 2007, 2008, 2009, 2010, 2011, 20…$ kpi <fct> new_funds_raised, new_funds_raised, new_fund…$ value <dbl> 452, 548, 682, 532, 600, 693, 774, 681, 807,…With the data transformed, we were ready to create our first animated plot, remembering to start by filtering out out original contactable_alumni variable:# create animated plotlibrary(gganimate)library(transformr)first_animate <- fund_tidy %>% filter(kpi != "contactable_alumni") %>% ggplot(aes(x = year, y = value, colour = kpi)) + geom_line() + # this next line is where the magic happens: transition_reveal(kpi, year) + labs(title = "Trends in University Fundraising KPIs Over Time", subtitle = "Data from Ross-CASE reports", x = "Year", y = 'Value', caption = "y axis labelling omitted due to differences in scale between KPIs", colour = "KPI") + scale_colour_discrete(labels = c("Cash received", "Contactable alumni", "Fundraising staff", "New funds raised")) + scale_y_discrete(labels = NULL) + theme_chris()And we’re off..How to go about that…?To do that in a non-animated way, we’d simply add a geom_smooth() to our plotting code:# create non-animated plot with trendlinesfund_tidy %>% filter(kpi != "contactable_alumni") %>% ggplot(aes(x = year, y = value, colour = kpi)) + geom_line() + geom_smooth(method = "lm", linetype = "dashed", se = FALSE) + labs(title = "Trends in University Fundraising KPIs Over Time", subtitle = "Data from Ross-CASE reports", x = "Year", y = 'Value', caption = "y axis labelling omitted due to differences in scale between KPIs", colour = "KPI") + scale_colour_discrete(labels = c("Cash received", "Contactable alumni", "Fundraising staff", "New funds raised")) + scale_y_discrete(labels = NULL) + theme_chris()But can we simply do that and add the transition_reveal() line to animate that in the same way?.My next thought was to create the trendlines as a separate stage in the process, building another dataframe from which to build my animated plot:#—- create linear model and augmented dataframe —-# build pre-filtered dataframefund_tidy2 <- fund_tidy %>% filter(kpi != "contactable_alumni")# build linear modellin_mod <- lm(value ~ year + kpi, data = fund_tidy2)# augment linear model to produce tidy dataframe with fitted valueslibrary(broom)aug_mod <- augment(lin_mod)# create animated graphaug_animate <- aug_mod %>% ggplot(aes(x = year, y = value, colour = kpi)) + geom_line(aes(group = kpi, y = .fitted), size = 0.5, linetype = "dashed") + geom_point(size = 2) + geom_line(aes(group = kpi)) + transition_reveal(kpi, year) + labs(title = "Trends in University Fundraising KPIs Over Time", subtitle = "Data from Ross-CASE reports", x = "Year", y = 'Value', caption = "y axis labelling omitted due to differences in scale between KPIs", colour = "KPI") + scale_colour_discrete(labels = c("Cash received", "Contactable alumni", "Fundraising staff", "New funds raised")) + theme_chris()# animate and saveaug_animated <- animate(aug_animate, height = 500, width = 800)anim_save("aug_animated.gif", animation = aug_animated)Oh dear, of course, that hasn’t worked..That said, it’s what I did first before I decided to bite the bullet and try and do it the ‘proper’ way:#—- build multiple models for animated plot with trendlines —-# build nested tibblefund_nested <- fund_tidy2 %>% group_by(kpi) %>% nest()# build separate regression modelsfund_models <- fund_nested %>% mutate(lm_mod = map(data, ~lm(formula = value ~ year, data = .x)))# augment models and unnest tibblefund_models_aug <- fund_models %>% mutate(aug = map(lm_mod, ~augment(.x))) %>% unnest(aug)case_animate <-fund_models_aug %>% ggplot(aes(x = year, y = value, colour = kpi)) + geom_line(aes(group = kpi, y = .fitted), size = 0.5, linetype = "dashed") + geom_point(size = 2) + geom_line(aes(group = kpi)) + transition_reveal(kpi, year) + labs(title = "Trends in University Fundraising KPIs Over Time", subtitle = "Data from Ross-CASE reports", x = "Year", y = 'Value', caption = "y axis labelling omitted due to differences in scale between KPIs", colour = "KPI") + scale_colour_discrete(labels = c("Cash received", "Contactable alumni", "Fundraising staff", "New funds raised")) + scale_fill_discrete() + theme_chris()And that’s about what we want.. More details

Leave a Reply