The force is strong in you if you abstract your R code

By writing a function to analyze Star Wars characters, learn the powerful abstraction capabilities of R

Keith McNulty, Feb 20

Any decent mathematician or computer programmer will tell you that if a task is being repeated again and again, it should be made into a function.

This has always been true, and if you are still coding repeated tasks again and again while just changing a variable or two — if you are just copy/pasting code for example — then you need to stop right now and learn how to write functions.

But recent developments mean that there are more and more incentives to always consider what parts of your code can be abstracted.

Developments in R packages that get around non-standard evaluation challenges and enhance abstraction power through quosures and related expressions mean that amazing powers are within our reach.

Functions in R

Let’s start simply.

A function is useful to create if you are doing identical analysis but just changing variable values.

Let’s work with the starwars dataset in dplyr.

If we wanted a list of all human characters we might use this:

    library(dplyr)  # provides the starwars dataset and the %>% pipe

    starwars_humans <- starwars %>%
      dplyr::filter(species == "Human") %>%
      dplyr::select(name)

This will return the names of 35 characters.

Now if we want the same list but for several other species, we could just copy and paste and change the value for species.

Or we could write this function for future use:

    species_search <- function(x) {
      starwars %>%
        dplyr::filter(species == x) %>%
        dplyr::select(name)
    }

Now if we run species_search("Droid") we get a list of four characters and are reassured to see our buddy R2-D2 in there.

We can of course extend this to make it a function with more than one variable to help us search based on various conditions.
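As a minimal sketch of such an extension, the function below filters on both species and homeworld. The function name species_homeworld_search and the argument names sp and hw are made up for this illustration and do not come from any package.

```r
library(dplyr)

# Hypothetical two-argument extension of species_search():
# sp is matched against the species column, hw against homeworld.
species_homeworld_search <- function(sp, hw) {
  starwars %>%
    dplyr::filter(species == sp, homeworld == hw) %>%
    dplyr::select(name)
}

# Humans from Tatooine
tatooine_humans <- species_homeworld_search("Human", "Tatooine")
tatooine_humans
```

Note that the arguments are deliberately not called species and homeworld. Inside dplyr::filter() column names take priority over function arguments, so species == species would compare the column with itself rather than with the value passed in.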

Abstracting the search further using features of rlang

The problem above is that this function has limited flexibility.

It is defined in a way that you have no control over which variable you want to filter on.

What if we wanted to redefine this function so that it returns a list based on any arbitrary condition that we set?

Here we can now set two arguments to the function, one to represent the column on which to filter, and another the value to filter against.

We can use the enquo function in rlang to capture the column name for use in dplyr::filter().

Like this:

    starwars_search <- function(filter, value) {
      filter_val <- rlang::enquo(filter)
      starwars %>%
        dplyr::filter_at(vars(!!filter_val), all_vars(. == value)) %>%
        dplyr::select(name)
    }

Now if we evaluate starwars_search(skin_color, "gold") we are reassured to see our anxious but loveable friend C-3PO returned.
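For comparison, here is a sketch of the same search written with the embrace operator {{ }}, which is available in rlang 0.4.0 and later and combines the enquo() and !! steps into one. This is an alternative to the approach in the article, not the author's code, and it assumes a dplyr version recent enough to support {{ }}. The names starwars_search_embrace and filter_col are made up for the illustration.

```r
library(dplyr)

# Hypothetical variant of starwars_search() using {{ }} instead of
# enquo() plus !!. filter_col is an unquoted column name.
starwars_search_embrace <- function(filter_col, value) {
  starwars %>%
    dplyr::filter({{ filter_col }} == value) %>%
    dplyr::select(name)
}

gold_skinned <- starwars_search_embrace(skin_color, "gold")
gold_skinned
```

The call looks the same from the outside: the column is still passed unquoted, and the value is passed as an ordinary string.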

Going even further to allow arbitrary filter conditions using purrr

With the step above we have made our search functionality more abstract and powerful, but it’s still somewhat limited.

For example, it only deals with one filter and will only find characters that match that single value.

Let’s imagine that we have a set of filters in the form of a list.

We can use the map2 function in purrr to take that list and break it into a series of quosure expressions that can be passed as individual statements into dplyr::filter, using a new function that acts on a dataframe:

    my_filter <- function(df, filt_list) {
      # column names to filter on, and the allowed values for each
      cols <- as.list(names(filt_list))
      conds <- filt_list
      # build one quosure of the form `column %in% values` per list element
      fp <- purrr::map2(cols, conds, function(x, y) rlang::quo((!!(as.name(x))) %in% !!y))
      # splice all conditions into a single filter call
      dplyr::filter(df, !!!fp)
    }

Now this allows us to further abstract our starwars_search function to receive an arbitrary set of filter conditions in a list, and those conditions can be set to match either a single value or a set of values expressed in a vector:

    starwars_search <- function(filter_list) {
      starwars %>%
        my_filter(filter_list) %>%
        dplyr::select(name)
    }

Now we can, for example, look for all characters who have blue or brown eyes, are human and hail from Tatooine or Alderaan, using

    starwars_search(list(eye_color = c("blue", "brown"), species = "Human", homeworld = c("Tatooine", "Alderaan")))

which will return the following:

    # A tibble: 10 x 1
       name
       <chr>
     1 Luke Skywalker
     2 Leia Organa
     3 Owen Lars
     4 Beru Whitesun lars
     5 Biggs Darklighter
     6 Anakin Skywalker
     7 Shmi Skywalker
     8 Cliegg Lars
     9 Bail Prestor Organa
    10 Raymus Antilles

Now you are ready to unleash the full power of the force, by developing functions that abstract multiple elements of your dplyr code.
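To see what the list-to-filter step is doing, it can help to run it on a tiny data frame. The sketch below is only illustrative: the toy data frame and the names col, vals and conditions are made up here, and it uses rlang::sym() where the function above uses as.name().

```r
library(dplyr)

# Made-up data, small enough to check by eye
toy <- data.frame(
  name      = c("A", "B", "C", "D"),
  eye_color = c("blue", "brown", "red", "blue"),
  species   = c("Human", "Human", "Human", "Droid"),
  stringsAsFactors = FALSE
)

filt_list <- list(eye_color = c("blue", "brown"), species = "Human")

# One quosure of the form `column %in% values` per element of the list
conditions <- purrr::map2(
  names(filt_list), filt_list,
  function(col, vals) rlang::quo((!!rlang::sym(col)) %in% !!vals)
)

# Splicing the quosures with !!! ...
spliced <- dplyr::filter(toy, !!!conditions)

# ... gives the same rows as writing the conditions out by hand
by_hand <- dplyr::filter(
  toy,
  eye_color %in% c("blue", "brown"),
  species %in% "Human"
)

spliced
```

Conditions passed to dplyr::filter() as separate arguments are combined with AND, which is why only rows matching every element of the list survive.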

Here is a function that lets you find any grouped average you wish for certain Star Wars characters:

    starwars_average <- function(mean_col, grp, filter_list) {
      calc_var <- rlang::enquo(mean_col)
      grp_var <- rlang::enquo(grp)
      starwars %>%
        my_filter(filter_list) %>%
        dplyr::group_by(!!grp_var) %>%
        dplyr::summarise(mean = mean(!!calc_var, na.rm = TRUE))
    }

So if you wanted to find the average height of all humans according to their home worlds, this can be accomplished using starwars_average(height, homeworld, list(species = "Human")) which will return this table:

    # A tibble: 16 x 2
       homeworld     mean
       <chr>        <dbl>
     1 Alderaan      176.
     2 Bespin        175
     3 Bestine IV    180
     4 Chandrila     150
     5 Concord Dawn  183
     6 Corellia      175
     7 Coruscant     168.
     8 Eriadu        180
     9 Haruun Kal    188
    10 Kamino        183
    11 Naboo         168.
    12 Serenno       193
    13 Socorro       177
    14 Stewjon       182
    15 Tatooine      179.
    16 <NA>          193

Although this has been a somewhat trivial example, I hope this helps you better grasp the potential that is available in R functions nowadays.
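As one further sketch of that potential, the same pattern can be stretched to group by any number of columns by capturing them through ... with rlang::enquos() and splicing them into group_by() with !!!. The function name starwars_average_by is made up for this illustration, and the filtering step is left out to keep the example self-contained.

```r
library(dplyr)

# Hypothetical variant: average one column, grouped by every
# column passed (unquoted) through ...
starwars_average_by <- function(mean_col, ...) {
  calc_var <- rlang::enquo(mean_col)
  grp_vars <- rlang::enquos(...)
  starwars %>%
    dplyr::group_by(!!!grp_vars) %>%
    dplyr::summarise(mean = mean(!!calc_var, na.rm = TRUE))
}

# Average height for every combination of species and homeworld
avg_height <- starwars_average_by(height, species, homeworld)
avg_height
```

Recent versions of dplyr may print a message about how the result is grouped when summarising over more than one grouping column; it does not affect the values returned.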

As you look at your day-to-day work, you may find that there are opportunities to abstract out some of your most common manipulations into functions, which could save you a lot of time and effort.

Really, what I have demonstrated here is only the tip of the iceberg in terms of what is possible.

Originally I was a Pure Mathematician, then I became a Psychometrician and a Data Scientist.

I am passionate about applying the rigor of all those disciplines to complex people questions.

I’m also a coding geek and a massive fan of Japanese RPGs.

Find me on LinkedIn or on Twitter.

Many thanks to Sai Im on my team for inspiring some of the ideas here with his functional programming wizardry.

