8 Useful R Packages for Data Science You Aren’t Using (But Should!)

Go ahead and play around with different types of plots – it’s an eye-opening experience.

Machine Learning

Ah, building machine learning models in R.

The holy grail we data scientists strive for when we take up new machine learning projects.

You might have used the ‘caret’ package for building models before.

Now, let me introduce you to a few under-the-radar R packages that might change the way you approach the model building process.

MLR – Machine Learning in R

One of the biggest reasons Python surged ahead of R was its machine learning-focused libraries (like scikit-learn).

For a long time, R lacked this ability.

Sure, you could use different packages for different ML tasks, but there was no single package that could do it all.

We had to call three different libraries for building three different models.

Not ideal.

And then the MLR package came along.

It is an incredible package which allows us to perform all sorts of machine learning tasks.

MLR includes all the popular machine learning algorithms we use in our projects.
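As a rough illustration of that breadth, the sketch below runs two different algorithms through exactly the same mlr calls, changing only the learner ID. It assumes mlr is already installed, along with rpart and MASS (which back the classif.rpart and classif.lda learners). The choice of learners, the 5-fold cross-validation and the seed are illustrative assumptions, not recommendations.

```r
library(mlr)

data(iris)
set.seed(1)  # arbitrary seed, only so the folds are reproducible

# One task and one resampling plan, shared by every learner
task  <- makeClassifTask(id = "iris", data = iris, target = "Species")
rdesc <- makeResampleDesc("CV", iters = 5)

# Learner IDs picked for illustration; any installed classif.* learner works
learner_ids <- c("classif.rpart", "classif.lda")

# Same calls for each algorithm: only the ID passed to makeLearner() changes
results <- sapply(learner_ids, function(id) {
  res <- resample(makeLearner(id), task, rdesc,
                  measures = acc, show.info = FALSE)
  res$aggr[["acc.test.mean"]]  # mean accuracy across the test folds
})

print(results)
```

The point is less the numbers than the shape of the code: swapping algorithms means editing one string.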

I strongly recommend going through the article below for a deep dive into MLR: Practicing Machine Learning Techniques in R with MLR Package

Let’s see how to install MLR and build a random forest model on the iris dataset:

    install.packages("mlr")
    library(mlr)

    # Load the dataset
    data(iris)

    # Create the task
    task = makeClassifTask(id = "iris", data = iris, target = "Species")

    # Create the learner (this learner needs the randomForest package installed)
    learner = makeLearner("classif.randomForest")

    # Build the model and evaluate it on a holdout split
    holdout(learner, task)

    # Measure accuracy
    holdout(learner, task, measures = acc)

Output:

    Resample Result
    Task: iris
    Learner: classif.randomForest
    Aggr perf: acc.test.mean=0.9200000   # 92% accuracy – not bad!
    Runtime: 0.0239332

parsnip

A common issue with different functions available in R (that do the same thing) is that they can have different interfaces and arguments.

Take the random forest algorithm for example.

The code you would use in the randomForest package and in the caret package is different, right? Like MLR, parsnip removes the problem of juggling multiple packages for a given machine learning algorithm.

It brings a unified modeling interface to R, much like scikit-learn does for Python.
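To make the interface point concrete, here is a minimal sketch that fits one parsnip model specification with two different random forest engines. It assumes the parsnip, ranger and randomForest packages are all installed. The tree count, the seed and the variable names (rf_spec, fit_ranger, fit_rf) are illustrative choices, not part of the original example.

```r
library(parsnip)

data(iris)
set.seed(123)  # arbitrary seed for reproducibility

# One specification: a classification forest with 50 trees
rf_spec <- rand_forest(mode = "classification", trees = 50)

# Fit the same specification with two different engines
fit_ranger <- fit(set_engine(rf_spec, "ranger"),
                  Species ~ ., data = iris)
fit_rf     <- fit(set_engine(rf_spec, "randomForest"),
                  Species ~ ., data = iris)

# The underlying model objects come from different packages ...
class(fit_ranger$fit)
class(fit_rf$fit)

# ... and each engine stores the tree count under its own name
fit_ranger$fit$num.trees
fit_rf$fit$ntree
```

Called directly, ranger takes num.trees and randomForest takes ntree. With parsnip you write trees once and the package translates it for whichever engine you pick.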

Let’s look at a simple example to see how parsnip works for a linear regression problem:

    install.packages("parsnip")
    library(parsnip)

    # Load the dataset
    data(mtcars)

    # Build a linear regression model
    lm_fit <- linear_reg(mode = "regression") %>%
      set_engine("lm") %>%
      fit(mpg ~ ., data = mtcars)

    # Print the fitted model, including the coefficient values
    lm_fit

Output:

    parsnip model object

    Call:
    stats::lm(formula = formula, data = data)

    Coefficients:
    (Intercept)          cyl         disp           hp         drat           wt         qsec
       12.30337     -0.11144      0.01334     -0.02148      0.78711     -3.71530      0.82104
             vs           am         gear         carb
        0.31776      2.52023      0.65541     -0.19942

Ranger

Ranger is one of my favorite R packages.

I regularly use random forests to build baseline models – especially when I’m participating in data science hackathons.

Here’s a question – how many times have you encountered slow random forest computation for huge datasets in R? It happens way too often on my old machine.

Packages like caret and randomForest take a long time to compute the results.

The ‘Ranger’ package accelerates our model building process for the random forest algorithm.

It lets you grow a large number of trees in far less time.

Let’s code a random forest model using Ranger:

    install.packages("ranger")

    # Load the Ranger package
    library(ranger)

    # Load the dataset
    data(iris)

    ## Classification forest
    ranger(Species ~ ., data = iris, num.trees = 100, mtry = 3)

    ## Prediction
    train.idx <- sample(nrow(iris), 2/3 * nrow(iris))
    iris.train <- iris[train.idx, ]
    iris.test <- iris[-train.idx, ]
    rg.iris <- ranger(Species ~ ., data = iris.train)
    pred.iris <- predict(rg.iris, data = iris.test)

    # Build a confusion matrix
    table(iris.test$Species, pred.iris$predictions)

Output:

                 setosa versicolor virginica
      setosa         16          0         0
      versicolor      0         16         2
      virginica       0          0        16

Quite an impressive performance.

You should try out Ranger on more complex datasets and see how much faster your computations become.
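If you want to check the speed difference yourself, a minimal timing sketch along these lines is one way to start. It assumes both ranger and randomForest are installed. The simulated dataset, its size and the tree count are made-up values for illustration, and the timings you see will depend on your machine and data.

```r
library(ranger)
library(randomForest)

set.seed(42)  # arbitrary seed for reproducibility

# Simulated data (illustrative only): 5000 rows, 20 numeric predictors
n   <- 5000
sim <- data.frame(matrix(rnorm(n * 20), ncol = 20))
sim$y <- factor(ifelse(sim$X1 + sim$X2 + rnorm(n) > 0, "a", "b"))

# Time the same task in both packages: a 200-tree classification forest
t_rf <- system.time(randomForest(y ~ ., data = sim, ntree = 200))
t_rg <- system.time(ranger(y ~ ., data = sim, num.trees = 200))

# Elapsed wall-clock seconds for each package
timings <- c(randomForest = t_rf[["elapsed"]],
             ranger       = t_rg[["elapsed"]])
print(timings)
```

Increase n or the number of trees to see how the gap changes as the problem grows.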

Other Awesome R Packages

Let’s look at some other packages that don’t necessarily fall under the ‘machine learning’ umbrella.

I have found these useful in terms of working with R in general.

rtweet

Sentiment analysis is one of the most popular applications of machine learning.

It’s an inescapable reality in today’s digital world.

And Twitter is a prime target for extracting Tweets and building models to understand and predict sentiment.

Now, there are a few R packages for extracting/scraping Tweets and performing sentiment analysis.

The ‘rtweet’ package does the same.

So how is it different from the other packages out there? ‘rtweet’ also helps you post or delete a bunch of tweets from R itself. Awesome!

    # install rtweet from CRAN
    install.packages("rtweet")

    # load rtweet package
    library(rtweet)

You don’t even need to obtain a developer account to access Twitter’s API.

You can search for tweets with certain hashtags using a single line of code.

Let’s try and search for all the tweets with the hashtag #avengers since Infinity War is all set for release.

    # 1000 tweets with hashtag avengers
    tweets <- search_tweets("#avengers", n = 1000, include_rts = FALSE)

You can even access the user IDs of people following a certain page. Let’s see an example:

    ## get user IDs of accounts following marvel
    marvel_flw <- get_followers("marvel", n = 20000)

You can do a whole lot more with this package.

Try it out and do not forget to update the community if you find something exciting.

Installr

Do you update your R packages individually? It can be a tedious task, especially when there are multiple packages at play.

The ‘installr’ package allows you to update R and all its packages using just one command! Instead of checking the latest version of every package, we can use installr to update all the packages in one go.

    # installing/loading the package:
    if (!require(installr)) {
      install.packages("installr")
      require(installr)
    } # load / install+load installr

    # using the package:
    updateR() # this will start the updating process of your R installation.
    # It will check for newer versions, and if one is available, will guide you through the decisions you'd need to make

GitHubInstall – An Easy Way to Install R Packages from GitHub

Which package do you use for installing libraries from GitHub? Most of us relied on the ‘devtools’ package for a long time.

It seemed to be the only way.

But there was a caveat – we needed to remember the developer’s name to install a package:

    install_github("DeveloperName/PackageName")

With the ‘githubinstall’ package, the developer name is no longer required.

    install.packages("githubinstall")
    library(githubinstall)

    # Install any GitHub package by supplying the name
    githubinstall("PackageName")
    # githubinstall("AnomalyDetection")

The package also provides a few helpful functions for R packages hosted on GitHub.

I suggest checking out the package documentation (linked above) for more details.

End Notes

This is by no means an exhaustive list.

There are plenty of other R packages which serve useful functions but have been overlooked by the majority.

Do you know of any packages that I have missed in this article? Or have you used any of the above-mentioned ones for your project? I would love to hear from you! Connect with me in the comments section below and let’s talk R!