Unit Testing in R

“My code is working!”

Unit testing helps to create robust code.

After a short motivation of what robust code is, I give a survey of the basic ideas behind unit testing.

Finally, I show how to use unit tests quickly in R, even for simple scripts (without the burden of creating an R package).

Entangled software breaks upon tiny changes, like these wirings (Photo by Pexel on Pixabay).

Robust code

- will not break easily upon changes (e.g. new R version, package updates, bug fixes, new features, etc.)
- can be refactored simply
- can be extended without breaking the rest
- can be tested

Unit tests are important for writing robust code: they make you more confident that changes will not break the code, and if something does break, it can be fixed much more quickly than in a tightly coupled codebase.

There is a lot of wonderful literature about unit testing¹.

In this article, I want to share the basics behind unit testing and apply them to the R scripting language, recommending packages and concepts for your daily R programming.

Be sure: in a growing codebase, you will need unit tests.

Even small scripts can sometimes cause a lot of distress.

Without tests, I promise you will spend a lot of time in the debugger.

In the long term, unit tests save a lot of your precious time, even if writing them feels like extra work.

I do not favor test-driven development (TDD): in my opinion, tests should just be a tool that supports your programming.

They shouldn’t rule you!

Motivation

Unit tests are particularly useful in R and Python (and other dynamically typed scripting languages) because there is no compiler pointing out places where functions could be called with invalid arguments.

Linters such as lintr for R or pylint for Python try to ease that a bit.

However, in the past I often ran into problems: code broke, e.g. because some intermediate list suddenly became empty or contained only a single item where a list of multiple items was expected.

At these places, a test will help to prevent this issue in the future.
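A minimal base R illustration of the empty-list problem (the function names are hypothetical, chosen only for this sketch): sapply() returns a list instead of a vector when its input is empty, while vapply() with a declared result type keeps the type stable. A test with an empty input catches such a difference before it surfaces deep inside a computation.

```r
# Hypothetical helpers: count the elements in each group of a list.
group_sizes_fragile <- function(groups) sapply(groups, length)
group_sizes <- function(groups) vapply(groups, length, integer(1))

groups <- list(a = 1:3, b = 4:5)
group_sizes_fragile(groups)   # named integer vector: a = 3, b = 2
group_sizes_fragile(list())   # an empty list(), so e.g. sum() on it fails
group_sizes(list())           # integer(0), the same type as in the non-empty case

# A minimal check that would have caught the fragile version.
stopifnot(identical(group_sizes(list()), integer(0)))
```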

In data science, some computations last a long time.

It is bad if such a computation aborts with an error after a couple of hours.

Unit Testing Basics

Testing Simple Functions

A simple function f takes an input x and generates an output y = f(x). In unit testing, we want to verify whether the output y has the expected value for a specific input x when calling the function f.

Usually, different (x,y) pairs are tested.

An example could be a function that sorts a vector of values in ascending order.

Boundary tests are of major importance.

In this example, we could test e.g.

- an empty vector as input
- a vector with a single value
- an already sorted vector
- an unsorted vector
- whether the function throws an error if an invalid argument is supplied

Does the function handle all those cases correctly? When creating a complex application, it is a good habit to have unit tests for functions that could fail, or where a bug occurred in the past.

Write a test for the bug, fix the bug, and see if the unit test succeeds.
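The boundary cases for a sorting function could be written as testthat tests roughly like this. sort_ascending() is a hypothetical wrapper around base R's sort(); the explicit is.numeric() check is an assumption added because sort() would otherwise happily sort a character vector. The last test is an example of a regression test named after a (hypothetical) past bug.

```r
library(testthat)

# Hypothetical function under test: sorts a numeric vector in ascending order.
sort_ascending <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sort(x)
}

test_that("sort_ascending handles boundary cases", {
  expect_equal(sort_ascending(numeric(0)), numeric(0))  # empty input
  expect_equal(sort_ascending(5), 5)                     # single value
  expect_equal(sort_ascending(c(1, 2, 3)), c(1, 2, 3))   # already sorted
  expect_equal(sort_ascending(c(3, 1, 2)), c(1, 2, 3))   # unsorted
  expect_error(sort_ascending("a"))                      # invalid argument
})

# Regression test: written for a bug first, kept afterwards.
test_that("regression: NA values are dropped", {
  expect_equal(sort_ascending(c(2, NA, 1)), c(1, 2))
})
```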

Testing Functions with Side Effects

It is not always as simple as in the last section.

Sometimes a function has side effects, such as reading or writing files, accessing databases, and much more.

In that case, the preparation of the test is more involved.

It may involve mock objects or mock functions that simulate access to a database.

This influences the programming style: abstraction layers might become necessary (see e.g. the R Database Interface, DBI).
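One simple way to get such an abstraction layer, sketched here under the assumption that you control the function signature: pass the data access in as a function argument. In production it might wrap a DBI query on a real connection; in the test it is replaced by a fake that returns fixed data. total_revenue() and fetch_orders are hypothetical names.

```r
library(testthat)

# Hypothetical function under test: it does not talk to the database itself,
# it receives a function `fetch_orders` that returns a data frame of orders.
total_revenue <- function(customer_id, fetch_orders) {
  orders <- fetch_orders(customer_id)
  sum(orders$amount)
}

test_that("total_revenue sums the order amounts", {
  # The fake replaces the database access: no connection is needed.
  fake_fetch <- function(customer_id) {
    data.frame(customer = customer_id, amount = c(10, 20.5))
  }
  expect_equal(total_revenue(42, fake_fetch), 30.5)
})

test_that("total_revenue returns 0 when there are no orders", {
  empty_fetch <- function(customer_id) {
    data.frame(customer = integer(0), amount = numeric(0))
  }
  expect_equal(total_revenue(42, empty_fetch), 0)
})
```

The design choice is plain dependency injection; the same idea works with a connection object argument if the database layer exposes a small, replaceable interface.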

In some cases, input files need to be generated before executing the test, and output files have to be checked after the test.
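A minimal sketch of testing a file-writing function: the test prepares a fresh temporary path with base R's tempfile(), calls the function, checks the output file, and removes it again. write_summary() and its CSV layout are hypothetical.

```r
library(testthat)

# Hypothetical function with a side effect: writes a summary CSV file.
write_summary <- function(values, path) {
  summary_df <- data.frame(n = length(values), mean = mean(values))
  write.csv(summary_df, path, row.names = FALSE)
  invisible(path)
}

test_that("write_summary writes the expected file", {
  path <- tempfile(fileext = ".csv")   # prepare: a fresh, unused file name

  write_summary(c(1, 2, 3), path)

  expect_true(file.exists(path))       # check the output file afterwards
  result <- read.csv(path)
  expect_equal(result$n, 3)
  expect_equal(result$mean, 2)

  unlink(path)                         # clean up
})
```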

Testing Classes

Object-oriented programming (OOP) is a bit strange in R and feels awkward.

If you get the chance, I strongly recommend writing software in other languages to get a better feeling for what object orientation is. Give e.g. Kotlin a try; it is fun.

The basic idea behind object orientation is that you bundle data (member variables) with the code working on it (called methods). Both are declared within a class definition.

Another key idea is that you can derive a class from another class by inheritance, thereby extending its data and functionality.

A simple example: a graphical shape could have an (x, y) offset from the origin. From it, we derive a rectangle, which additionally has a width and a height (w, h).

A class definition is only the blueprint for concrete object instances, which are created by a constructor.

The state (member variables) of an instance is then modified step by step by calling class methods.

In R there are different OOP systems, like S3 and S4.

Most of R's older functionality is written in S3. Functions like summary() or print() are examples of that. These “functions” are in fact generic functions that dispatch to the method matching the class of the object passed as argument.

See e.g. the R Tutorial Sec. 16 by Kelly Black if you would like to know more about this.
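A minimal S3 sketch of that dispatch mechanism: an object is just a list with a class attribute, and defining summary.measurement() and print.measurement() makes the generics summary() and print() call them. The class name "measurement" is hypothetical.

```r
# Constructor for a hypothetical S3 class "measurement".
measurement <- function(values, unit) {
  structure(list(values = values, unit = unit), class = "measurement")
}

# Methods are ordinary functions named generic.class.
summary.measurement <- function(object, ...) {
  c(n = length(object$values), mean = mean(object$values))
}

print.measurement <- function(x, ...) {
  cat(sprintf("%d values in %s\n", length(x$values), x$unit))
  invisible(x)
}

m <- measurement(c(1.5, 2.5, 3.5), unit = "cm")
print(m)     # dispatches to print.measurement(): 3 values in cm
summary(m)   # dispatches to summary.measurement(): n = 3, mean = 2.5
```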

So a test will usually consist of a series of operations on an object instance, verifying after some steps whether the result is as expected.
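The shape example can be sketched with R's Reference Classes (setRefClass() from the methods package). This OOP system is an assumption chosen for the sketch because its objects are mutable, so the state really changes step by step. Shape, Rectangle, move() and area() are hypothetical names; the test runs a series of operations on one instance and then checks its state.

```r
library(testthat)

# Base class: a shape with an (x, y) offset from the origin.
Shape <- setRefClass("Shape",
  fields = list(x = "numeric", y = "numeric"),
  methods = list(
    move = function(dx, dy) {
      x <<- x + dx   # fields are modified in place with <<-
      y <<- y + dy
      invisible(.self)
    }
  )
)

# Derived class: inherits x, y and move(), adds a width and a height.
Rectangle <- setRefClass("Rectangle",
  contains = "Shape",
  fields = list(w = "numeric", h = "numeric"),
  methods = list(
    area = function() w * h
  )
)

test_that("a rectangle keeps its size while being moved", {
  r <- Rectangle$new(x = 0, y = 0, w = 2, h = 3)
  r$move(1, 2)
  r$move(-3, 1)
  expect_equal(c(r$x, r$y), c(-2, 3))
  expect_equal(r$area(), 6)
})
```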

Unit Tests in R

Here we use the testthat package⁴, which follows the xUnit concept known from other languages (Java, C#, Python).

This example shows the first basic function test.

As I often find myself writing scripts rather than R packages (which is a lot more tedious), this example shows how to use testthat without necessarily creating a package.

Example: A simple function

Assume we write a function in the file my_code.R which just increments its argument (which could be a vector of numbers) by one.

That seems trivial. But you will see below that even this function may fail.
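To illustrate how such an innocent function can go wrong, here is a hypothetical loop-based version (not necessarily the code behind the test output described in this article). For an empty vector, 1:length(x) evaluates to c(1, 0), so the loop runs twice and produces an NA; seq_along(x) avoids this.

```r
# Hypothetical increment function written with an index loop.
increment_loop <- function(x) {
  for (i in 1:length(x)) {   # for an empty x this is 1:0, i.e. c(1, 0)
    x[i] <- x[i] + 1
  }
  x
}

# Fixed variant: seq_along(x) is empty for an empty x.
increment_fixed <- function(x) {
  for (i in seq_along(x)) {
    x[i] <- x[i] + 1
  }
  x
}

increment_loop(c(1, 2, 3))    # 2 3 4, as expected
increment_loop(numeric(0))    # NA instead of an empty vector
increment_fixed(numeric(0))   # numeric(0)
```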

Create a directory called tests and put one or more R scripts into it, each with a file name starting with test_.

After this, you can run the tests by calling testthat::test_dir("tests") within R, which reports the passed and failed tests. In my case, the output revealed that the function did not work with an empty input c().

Since the code is not a package, the test file must contain a source() call that loads your script.

Tests are declared using the testthat::test_that(name, expression) function.

The first argument assigns the test a name for identifying it.

The second argument is an R expression which should use the expect_* assertions.

Whenever an assertion does not hold, the test is marked as failed.
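A minimal sketch of such a test file, assuming it is saved as tests/test_my_code.R and that my_code.R sits next to the tests directory. Current testthat versions run the test files with the tests directory as working directory, which is why a relative source() path works; treat that as an assumption to verify for your version. To keep the snippet runnable on its own, the hypothetical increment() is defined inline instead of being sourced.

```r
# Hypothetical test file: tests/test_my_code.R
library(testthat)

# In the real file this line would be: source("../my_code.R")
# Defined inline here so that the snippet runs on its own.
increment <- function(x) x + 1

test_that("increment adds one to each element", {
  expect_equal(increment(1), 2)
  expect_equal(increment(c(1, 2, 3)), c(2, 3, 4))
})

test_that("increment handles empty input", {
  expect_equal(increment(c()), numeric(0))   # NULL + 1 gives numeric(0)
})
```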

The good thing is: whenever you decide to move on and create an R package, you can simply leave your tests in place.

It is then only necessary to remove the source() calls from the test_xxx.R files.

See my GitHub repository for this basic snippet.

Conclusion

Unit tests take away your fear of altering the source code.

Your programming style will change when you have testability on your mind while writing code.

If functions cannot be tested simply, it is a code smell (see Martin Fowler²) indicating that your code is entangled! In that case, preparing the environment for a test (variables, files, database connections, etc.) may take excessive effort.

Unit testing should be easy for functions with a well-defined purpose!

I recommend getting inspired by the great books of Fowler² and Martin³.

Detangle your code, write unit tests, and have fun! Spend more time on interesting features and less on debugging.

References

[1]: G. J. Myers, T. Badgett, T. M. Thomas, and C. Sandler, The art of software testing, vol. 2 (2004), Wiley Online Library

[2]: M. Fowler, Refactoring: improving the design of existing code (2000), Addison-Wesley Professional

[3]: R. C. Martin, Clean code, A handbook of agile software craftsmanship (2009), Prentice Hall

[4]: H. Wickham, testthat: Unit Testing for R
