Dealing with Apply functions in R

By vikashraj luhaniwal, Mar 27

Iterative control structures (loops such as for, while, and repeat) allow a set of instructions to be repeated a number of times.

However, when processing data at a large scale, these loops can consume considerable time and memory.

R offers a more concise and often faster way to iterate: the apply family of functions.

In this post, I first compare the efficiency of apply functions and loops visually, and then walk through the members of the apply family.

Before going further with apply functions, let us first see how an iteration written with an apply function takes less time than the same iteration written as a basic loop.

Consider the FARS (Fatality Analysis Reporting System) dataset available in the gamclass package of R.

It contains 151158 observations of 17 different features.

The dataset includes every accident in which there was at least one fatality, and the data is limited to vehicles where the front passenger seat was occupied.

Now let us assume we want to calculate the mean of the age column.

This can be done using a traditional loop and also using apply functions.

Method 1: Using a for loop

    library("gamclass")
    data(FARS)
    mean_age <- NULL
    total <- NULL
    for (i in 1:length(FARS$age)) {
      total <- sum(total, FARS$age[i])
    }
    mean_age <- total / length(FARS$age)
    mean_age

Method 2: Using the apply() function

    apply(FARS[3], 2, mean)

Now let us compare both approaches visually with the help of the profvis package.

Profvis is a code-profiling tool, which provides an interactive graphical interface for visualizing the memory and time consumption of instructions throughout the execution.

To use profvis, wrap the instructions in profvis(); this opens an interactive profile visualizer in a new tab inside RStudio.

    # for method 1
    library(profvis)
    profvis({
      mean_age <- NULL
      total <- NULL
      for (i in 1:length(FARS$age)) {
        total <- sum(total, FARS$age[i])
      }
      mean_age <- total / length(FARS$age)
      mean_age
    })

[Figure: profvis output using method 1]

Under the Flame Graph tab we can inspect the time taken (in ms) by the instructions.

    # for method 2
    profvis({
      apply(FARS[3], 2, mean)
    })

[Figure: profvis output using method 2]

Here, one can easily notice that the time taken using method 1 is almost 1990 ms (1960 + 30), whereas for method 2 it is only 20 ms.

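So this is the actual power of apply() functions in terms of time consumption in this example.

Note that the loop above also pays for a call to sum() and an element lookup in FARS$age on every iteration, so the size of the gap depends on how the loop is written.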
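For readers without the gamclass package, the same comparison can be sketched with base R's system.time() on synthetic data. The data frame `ages` and the result names below are hypothetical stand-ins, not part of the original post, and no timings are quoted because they vary with the machine and the R version.

```r
# Synthetic stand-in for a numeric column such as FARS$age (hypothetical data)
set.seed(42)
ages <- data.frame(age = sample(16:90, 1e5, replace = TRUE))

# Method 1: element-by-element loop, written the same way as in the post
t_loop <- system.time({
  total <- NULL
  for (i in seq_along(ages$age)) {
    total <- sum(total, ages$age[i])
  }
  res_loop <- total / length(ages$age)
})

# Method 2: apply() over the single column
t_apply <- system.time({
  res_apply <- apply(ages, 2, mean)
})

# Both methods should agree on the value
all.equal(res_loop, unname(res_apply))

# Elapsed seconds for each method (machine dependent)
t_loop["elapsed"]
t_apply["elapsed"]
```

Running both blocks a few times gives a rough feel for the difference; profvis remains the better tool when a line-by-line breakdown is needed.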

Benefits of apply functions over traditional loops

Often more efficient and faster in execution, as the profiling above shows.

Easy-to-follow syntax (one line of code with an apply function rather than a block of instructions).

Apply family in R

The apply family contains several functions that work on different data structures such as lists, matrices, arrays, and data frames.

The members of the apply family are apply(), lapply(), sapply(), tapply(), mapply() etc.

These functions are substitutes/alternatives to loops.

Each of the apply functions requires a minimum of two arguments: an object and another function.

The function can be any built-in function (such as mean, sum, or max) or a user-defined function.
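As a small illustration of the two required arguments, the sketch below passes a built-in function, a user-defined function, and an inline anonymous function to sapply(). The list `scores` and the helper `spread` are made up for this example.

```r
# Hypothetical input object: a named list of numeric vectors
scores <- list(a = c(2, 4, 6), b = c(1, 3, 5, 7))

# Built-in function
largest <- sapply(scores, max)

# User-defined function (hypothetical helper): width of the value range
spread <- function(x) max(x) - min(x)
widths <- sapply(scores, spread)

# Anonymous function defined inline: count of values above 3
above_three <- sapply(scores, function(x) sum(x > 3))

largest
widths
above_three
```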

Explore the members

1. apply() function

The syntax of apply() is as follows:

    apply(X, MARGIN, FUN, ...)

where X is an input data object, MARGIN indicates whether the function is applied row-wise or column-wise (MARGIN = 1 means row-wise and MARGIN = 2 means column-wise), and FUN is a built-in or user-defined function.
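apply() also forwards any further arguments to FUN. The sketch below uses a made-up matrix `m` containing one missing value to show MARGIN = 1 and MARGIN = 2 together with an extra argument (na.rm) passed through to mean().

```r
# Hypothetical 2 x 3 matrix with one missing value (filled column by column)
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)

# MARGIN = 2: column-wise; the first column contains NA, so its mean is NA
col_means_raw <- apply(m, 2, mean)

# Extra arguments after FUN are passed on to FUN
col_means <- apply(m, 2, mean, na.rm = TRUE)

# MARGIN = 1: row-wise
row_means <- apply(m, 1, mean, na.rm = TRUE)

col_means_raw
col_means
row_means
```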

The output object type depends on the input object and the function specified.

apply() can return a vector, list, matrix or array for different input objects as mentioned in the below table.

    #---------- apply() function ----------

    #case 1. matrix as an input argument
    m1 <- matrix(1:9, nrow = 3)
    m1
    result <- apply(m1, 1, mean)      #mean of elements for each row
    result
    class(result)                     #class is a vector
    result <- apply(m1, 2, sum)       #sum of elements for each column
    result
    class(result)                     #class is a vector
    result <- apply(m1, 1, cumsum)    #cumulative sum of elements for each row
    result                            #by default column-wise order
    class(result)                     #class is a matrix
    matrix(apply(m1, 1, cumsum), nrow = 3, byrow = TRUE)   #for row-wise order

    #user defined function
    check <- function(x) {
      return(x[x > 5])
    }
    result <- apply(m1, 1, check)     #user defined function as an argument
    result
    class(result)                     #class is a list

    #case 2. data frame as an input
    ratings <- c(4.2, 4.4, 3.4, 3.9, 5, 4.1, 3.2, 3.9, 4.6, 4.8, 5, 4, 4.5, 3.9, 4.7, 3.6)
    employee.mat <- matrix(ratings, byrow = TRUE, nrow = 4,
                           dimnames = list(c("Quarter1", "Quarter2", "Quarter3", "Quarter4"),
                                           c("Hari", "Shri", "John", "Albert")))
    employee <- as.data.frame(employee.mat)
    employee
    result <- apply(employee, 2, sum)      #sum of elements for each column
    result
    class(result)                          #class is a vector
    result <- apply(employee, 1, cumsum)   #cumulative sum of elements for each row
    result                                 #by default column-wise order
    class(result)                          #class is a matrix

    #user defined function
    check <- function(x) {
      return(x[x > 4.2])
    }
    result <- apply(employee, 2, check)    #user defined function as an argument
    result
    class(result)                          #class is a list

2. lapply() function

lapply() always returns a list; the ‘l’ in lapply() refers to ‘list’.

lapply() accepts vectors, lists, and data frames as input.

The MARGIN argument is not required here: the function is applied to each element of the input, which for a data frame means each column.
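The following sketch shows what "each element" means for two kinds of input. The data frame `body` and the vector `nums` are hypothetical examples: lapply() visits a data frame column by column and a vector value by value, and returns a list in both cases.

```r
# Hypothetical data frame with two numeric columns
body <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# One list element per column
col_means <- lapply(body, mean)

# Hypothetical vector: one list element per value
nums <- 1:5
squares <- lapply(nums, function(x) x^2)

length(col_means)   # same as ncol(body)
length(squares)     # same as length(nums)
class(col_means)    # a list
```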

Refer to the below table for input objects and the corresponding output objects.

    #---------- lapply() function ----------

    #case 1. vector as an input argument
    result <- lapply(ratings, mean)
    result
    class(result)                     #class is a list

    #case 2. list as an input argument
    list1 <- list(maths = c(64, 45, 89, 67),
                  english = c(79, 84, 62, 80),
                  physics = c(68, 72, 69, 80),
                  chemistry = c(99, 91, 84, 89))
    list1
    result <- lapply(list1, mean)
    result
    class(result)                     #class is a list

    #user defined function
    check <- function(x) {
      return(x[x > 75])
    }
    result <- lapply(list1, check)    #user defined function as an argument
    result
    class(result)                     #class is a list

    #case 3. dataframe as an input argument
    result <- lapply(employee, sum)      #sum of elements for each column
    result
    class(result)                        #class is a list
    result <- lapply(employee, cumsum)   #cumulative sum of elements for each column
    result
    class(result)                        #class is a list

    #user defined function
    check <- function(x) {
      return(x[x > 4.2])
    }
    result <- lapply(employee, check)    #user defined function as an argument
    result
    class(result)                        #class is a list

apply() vs. lapply()

lapply() always returns a list, whereas apply() can return a vector, list, matrix or array.

lapply() has no MARGIN argument.

3. sapply() function

sapply() is a simplifying version of lapply().

It has one additional argument, simplify, whose default value is TRUE. If simplify = FALSE, sapply() returns a list just like lapply(); otherwise, it returns the simplest output form possible.
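A compact sketch of how the simplify argument changes the result is given below. The list `marks` is a made-up example; the comments describe the shape of each result rather than printed output.

```r
# Hypothetical named list of marks
marks <- list(maths = c(64, 45, 89), english = c(79, 84, 62))

# simplify = FALSE: a list, the same as lapply() would return
as_list <- sapply(marks, mean, simplify = FALSE)
identical(as_list, lapply(marks, mean))

# Default (simplify = TRUE): one value per element gives a named vector
as_vec <- sapply(marks, mean)

# Two values per element (min and max) give a matrix
as_mat <- sapply(marks, range)

# Results of different lengths cannot be simplified, so a list is returned
ragged <- sapply(marks, function(x) x[x > 70])

class(as_vec)
dim(as_mat)
class(ragged)
```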

Refer to the below table for input objects and the corresponding output objects.

    #---------- sapply() function ----------

    #case 1. vector as an input argument
    result <- sapply(ratings, mean)
    result
    class(result)                     #class is a vector
    result <- sapply(ratings, mean, simplify = FALSE)
    result
    class(result)                     #class is a list
    result <- sapply(ratings, range)
    result
    class(result)                     #class is a matrix

    #case 2. list as an input argument
    result <- sapply(list1, mean)
    result
    class(result)                     #class is a vector
    result <- sapply(list1, range)
    result
    class(result)                     #class is a matrix

    #user defined function
    check <- function(x) {
      return(x[x > 75])
    }
    result <- sapply(list1, check)    #user defined function as an argument
    result
    class(result)                     #class is a list

    #case 3. dataframe as an input argument
    result <- sapply(employee, mean)
    result
    class(result)                     #class is a vector
    result <- sapply(employee, range)
    result
    class(result)                     #class is a matrix

    #user defined function
    check <- function(x) {
      return(x[x > 4])
    }
    result <- sapply(employee, check) #user defined function as an argument
    result
    class(result)                     #class is a list

4. tapply() function

tapply() is helpful when dealing with categorical variables: it applies a function to numeric data distributed across various categories.

The simplest form of tapply() can be understood as

    tapply(column 1, column 2, FUN)

where column 1 is the numeric column on which the function is applied, column 2 is a factor object, and FUN is the function to be performed.
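One way to picture what tapply() does is to split the numeric values by category and then apply the function to each group. The sketch below compares the two routes on made-up vectors `pay` and `dept`; it illustrates the idea and is not a description of how tapply() is implemented internally.

```r
# Hypothetical numeric values and their categories
pay  <- c(10, 20, 30, 40)
dept <- c("A", "B", "A", "B")

# tapply(): mean of pay within each dept
by_group <- tapply(pay, dept, mean)

# Manual equivalent: split into groups, then apply the function to each
manual <- sapply(split(pay, dept), mean)

by_group
manual
```

Both results hold one mean per category, in the same category order.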

    #---------- tapply() function ----------

    salary <- c(21000, 29000, 32000, 34000, 45000)
    designation <- c("Programmer", "Senior Programmer", "Senior Programmer", "Senior Programmer", "Manager")
    gender <- c("M", "F", "F", "M", "M")
    result <- tapply(salary, designation, mean)
    result
    class(result)                     #class is an array
    result <- tapply(salary, list(designation, gender), mean)
    result
    class(result)                     #class is a matrix

5. by() function

by() does a similar job to tapply(), i.e. it applies an operation to numeric vector values distributed across various categories.

by() is an object-oriented wrapper for tapply() that is applied to data frames.
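Because by() works on data frames, the function it calls receives a whole subset of rows for each category rather than a single vector. A minimal sketch with a made-up data frame `staff`:

```r
# Hypothetical data frame: two numeric columns and one grouping column
staff <- data.frame(pay   = c(10, 20, 30, 40),
                    years = c(1, 2, 3, 4),
                    dept  = c("A", "B", "A", "B"))

# colMeans() is called once per dept on the matching rows of pay and years
group_means <- by(staff[c("pay", "years")], staff$dept, colMeans)

class(group_means)   # an object of class "by"
group_means[[1]]     # column means for the first category, dept "A"
group_means[[2]]     # column means for the second category, dept "B"
```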

    #---------- by() function ----------

    result <- by(salary, designation, mean)
    result
    class(result)                     #class is of "by" type
    result[2]                         #accessing as a vector element
    as.list(result)                   #converting into a list
    result <- by(salary, list(designation, gender), mean)
    result
    class(result)                     #class is of "by" type

    library("gamclass")
    data("FARS")
    by(FARS[2:4], FARS$airbagAvail, colMeans)

6. mapply() function

The ‘m’ in mapply() refers to ‘multivariate’.

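It applies the specified function to the first elements of each argument, then to the second elements, and so on.

Note that here the function is specified as the first argument, whereas the other apply functions take it after the data (second argument in lapply() and sapply(), third in apply() and tapply()).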
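The element-by-element pairing can be sketched as follows. The anonymous functions and the result names are made up for illustration; MoreArgs is mapply()'s own argument for values that stay the same on every call.

```r
# Pairs up 1 with 4, 2 with 5, 3 with 6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# Arguments can be matched by name: rep(x = 3, times = 1), rep(x = 2, times = 2), ...
reps <- mapply(rep, times = 1:3, x = 3:1)

# MoreArgs holds arguments that are not iterated over
squares <- mapply(function(base, power) base^power, 2:4,
                  MoreArgs = list(power = 2))

sums      # one value per pair, simplified to a vector
reps      # results have different lengths, so a list is returned
squares
```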

    #---------- mapply() function ----------

    result <- mapply(rep, 1:4, 4:1)
    result
    class(result)                     #class is a list
    result <- mapply(rep, 1:4, 4:4)
    class(result)                     #class is a matrix

Conclusion

I believe I have covered all the most useful and popular apply functions with all possible combinations of input objects.

If you think something is missing or more inputs are required, let me know in the comments and I’ll add it in!
