A Not So Short Introduction to Object Oriented Programming using R

Koushik Khan, Apr 10

Part: I

Introduction

Object Oriented Programming, or simply OOP, is a different style of designing programs and solving problems with them.

When designing the solution to a problem, OOP helps us relate the various components of our program to real-world entities.

It makes code more reusable and easier to maintain. Classes also make it natural to build composite data structures, such as linked lists and tries, out of primitive types like integers, floats and strings.

I will start this post with some theoretical basics of OOP, followed by their implementation and an example from statistical distributions, so that people like me who come from a quantitative background (and are not so smart at coding!) can easily relate OOP to their fundamentals.

Class – The Template of your imagination: Let’s assume you want to build your own house and the house plan is ready.

Based on the plan, the builder will build the house.

Here, the actual house you are going to live in is the realization of that plan.

In the world of programming, people developing any kind of software project have their plans as well.

These are generally plans for receiving various types of data as inputs, functions to manipulate that data, and ways to let people interact with the software.

Such plans are called templates or classes.

Object – Meet your imagination: You finally meet your imagination when the builder hands the house over to you.

Similarly, in the programming world, such a realization of a class is called an object.

Objects are essential: a class on its own only describes things, and there is nothing to actually work with until you create an object from it.
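To make the analogy concrete, here is a minimal sketch using the R6 package (introduced later in this post; it assumes R6 is installed with install.packages("R6")). The House class and its fields are hypothetical, chosen only to mirror the house example: the class is the plan, and each call to $new() builds one house (object) from it.

```r
library(R6)

# A hypothetical "plan": a class describing what every house has
House <- R6Class("House",
  public = list(
    rooms = NA,
    colour = NA,
    initialize = function(rooms, colour) {
      self$rooms <- rooms
      self$colour <- colour
    }
  )
)

# Two "realizations" (objects) built from the same plan
my_house <- House$new(rooms = 3, colour = "blue")
holiday_home <- House$new(rooms = 2, colour = "white")

my_house$rooms       # 3
holiday_home$colour  # "white"
```

Both objects follow the same plan, but each keeps its own values.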

Encapsulation – Wrapping up the units of your realization: Your house, the realization, contains various small units like rooms, doors and windows. You can think of the house as wrapping up those units; in other words, the units are related to each other by belonging to the same house.

Now, any moderately complex program contains many variables holding data, as well as functions that perform computations with those variables.

A class ties these variables and functions together, just like the doors and windows of a house, and they all become parts of the class.

This is known as Encapsulation.

Variables and functions defined within a class are called attributes (properties) and methods.

When you create an object from the class, all of those attributes and methods are available to it.
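A small sketch of encapsulation in R6, using a hypothetical Room class: the data (length, width) and the function that works on it (area()) live together in one class, and every object created from it carries both.

```r
library(R6)

# Hypothetical class bundling attributes (data) with a method (behaviour)
Room <- R6Class("Room",
  public = list(
    length = NA,  # attribute, in metres
    width = NA,   # attribute, in metres
    initialize = function(length, width) {
      self$length <- length
      self$width <- width
    },
    area = function() {
      # method: works on the object's own attributes
      self$length * self$width
    }
  )
)

kitchen <- Room$new(length = 4, width = 3)
kitchen$area()  # 12
```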

Inheritance: In real life you may have multiple houses, and many of them can share design elements, like the same bedroom colour (your favourite colour, of course) or the same kitchen floor area.

This actually means that you are following (rather mimicking) some (or all) of the designs of your past house into future ones.

In other words, you can say, the houses you have built later on, are inheriting some (or all) of the designs from the initial house.

When creating multiple classes, programmers try to avoid writing the same functions or declaring the same variables again and again.

Inheritance allows a programmer to make all the attributes and methods of a parent class available in a child class as well.
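In R6, inheritance is declared with the inherit argument of R6Class(). In this sketch (class names hypothetical), NewHouse reuses the paint_bedroom() method of House without rewriting it and adds one method of its own.

```r
library(R6)

House <- R6Class("House",
  public = list(
    bedroom_colour = "green",  # your favourite colour, of course
    paint_bedroom = function() {
      paste("Bedroom painted", self$bedroom_colour)
    }
  )
)

# The child class inherits every public field and method of House
NewHouse <- R6Class("NewHouse",
  inherit = House,
  public = list(
    add_garage = function() "Garage added"
  )
)

h <- NewHouse$new()
h$paint_bedroom()  # inherited from House: "Bedroom painted green"
h$add_garage()     # defined only in NewHouse
class(h)           # "NewHouse" "House" "R6"
```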

Polymorphism: The literal meaning of the word is something that exists in various forms under the same name.

For example, your house could have multiple washrooms; they might differ in their internal design, but they are all washrooms.

Similarly, two or more classes can contain functions with the same name that perform different activities.

This idea is known as Polymorphism.
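One way to sketch this in R6 is method overriding: each child class defines its own describe(), and calling the same method name on different objects gives different results. The washroom classes are hypothetical; super$describe() calls the parent's version.

```r
library(R6)

Washroom <- R6Class("Washroom",
  public = list(
    describe = function() "A washroom"
  )
)

# Each child overrides describe() with its own behaviour
MasterWashroom <- R6Class("MasterWashroom", inherit = Washroom,
  public = list(
    describe = function() paste(super$describe(), "with a bathtub")
  )
)

GuestWashroom <- R6Class("GuestWashroom", inherit = Washroom,
  public = list(
    describe = function() paste(super$describe(), "with a shower")
  )
)

washrooms <- list(MasterWashroom$new(), GuestWashroom$new())

# Same method name, different behaviour depending on the object's class
descriptions <- vapply(washrooms, function(w) w$describe(), character(1))
descriptions  # "A washroom with a bathtub" "A washroom with a shower"
```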

Implementation

As the title suggests, I will be covering the aforesaid concepts using R.

R is not a language in which analytics people generally implement OOP, mostly because of a lack of awareness or simply to avoid the hard way.

But R has several systems for implementing OOP (such as S3, S4, Reference Classes and R6) and, eventually, for designing analytics solutions.

One of the popular OOP systems in R is R6.

It’s a third-party package (not part of base R) developed by Winston Chang of the RStudio team.

It provides a way to create classes in R.

Before the implementation, let’s describe the problem: “Suppose, out of curiosity, you want to generate random samples from various Gaussian distributions parametrized by mean (μ) and standard deviation (σ). For example, if you fix specific values of μ and σ, you have a specific Gaussian population, and you want to draw a random sample from it.”

Let’s design the class to solve the problem:

```r
library(R6)
library(ggplot2)

Simulator <- R6Class("SimulatorClass",
  public = list(
    mu = NA,
    sigma = NA,
    n_sample = NA,
    sample = NA,

    initialize = function(mu, sigma, n_sample) {
      # Object initializer
      # Args:
      #   mu: mean of the distribution
      #   sigma: std. dev. of the distribution
      #   n_sample: size of the sample
      self$mu <- mu
      self$sigma <- sigma
      self$n_sample <- n_sample
    },

    get_sample_size = function() {
      # getter method
      return(self$n_sample)
    },

    set_sample_size = function(size) {
      # setter method
      self$n_sample <- size
      invisible(self)
    },

    generate_sample = function() {
      # Generates the random sample
      self$sample <- rnorm(self$n_sample, self$mu, self$sigma)
      return(self$sample)
    },

    compute_stats = function() {
      # Computes basic statistics of the sample
      r <- list(sample.mean = mean(self$sample),
                sample.sd = sd(self$sample))
      return(r)
    },

    plot_histogram = function(binwidth) {
      # Histogram of the sample with the Gaussian density overlaid,
      # scaled to counts (density * n * binwidth).
      # Breaks follow the sample range so no values fall outside the bins.
      breaks <- seq(floor(min(self$sample)),
                    ceiling(max(self$sample)) + binwidth,
                    by = binwidth)
      p <- ggplot(data.frame(x = self$sample), aes(x = x)) +
        geom_histogram(breaks = breaks, colour = "black", fill = "white") +
        stat_function(
          fun = function(x, mean, sd, n, bw) {
            dnorm(x = x, mean = mean, sd = sd) * n * bw
          },
          # use the actual sample length so the curve matches the bars
          args = list(mean = self$mu, sd = self$sigma,
                      n = length(self$sample), bw = binwidth)
        ) +
        labs(x = "X", y = "Count")
      return(p)
    }
  )
)
```

Here I have the class ready.

Now let’s go through each unit (variables and functions) within it.

Class name: The first thing to note is the function R6Class(), which creates the class and defines its units.

The name of the class is “SimulatorClass”; this name becomes the class attribute of every object created from it. The variable “Simulator” stores the class generator returned by R6Class(), the object whose $new() method builds new instances.
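A quick way to see the difference between the generator stored in Simulator and the class name is to inspect class(). The sketch below uses a stripped-down stand-in (same class name, a single field) so it runs on its own.

```r
library(R6)

# Stripped-down stand-in with the same class name as in the post
Simulator <- R6Class("SimulatorClass", public = list(mu = NA))

class(Simulator)  # "R6ClassGenerator": the object that builds instances
sm <- Simulator$new()
class(sm)         # "SimulatorClass" "R6": the name given to R6Class()
```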

‘public’ argument: The public list holds the components the user is allowed to interact with, like mu, sigma, n_sample, sample and all the functions.

If we had put them in the private list instead, the user would not be able to access them from outside the object; only the class’s own methods can reach them, through private$.

It is customary to keep internal components private to hide them from the user.

This prevents users from accidentally changing internal state that the methods rely on, which could otherwise cause errors during execution.
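A minimal sketch of public versus private members, using a hypothetical DrawCounter class: the running total is private, so users can read it through a public method but cannot reach it directly as a field of the object.

```r
library(R6)

# Hypothetical class: the draw count is private and only changes via draw()
DrawCounter <- R6Class("DrawCounter",
  public = list(
    draw = function(n) {
      private$total <- private$total + n  # methods reach private data via private$
      rnorm(n)
    },
    get_total = function() private$total
  ),
  private = list(
    total = 0
  )
)

dc <- DrawCounter$new()
x <- dc$draw(10)
dc$get_total()                                 # 10
exists("total", envir = dc, inherits = FALSE)  # FALSE: not in the public interface
```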

Attributes: We have used four attributes in our class and these are mu, sigma, n_sample and sample.

The Constructor: In R6, a class’s constructor is the method named initialize.

The constructor is a function that runs automatically when an object of the class is created with $new().

So if you provide some values while creating the object, $new() passes them to initialize, which stores them in the internal attributes.

For example, we provide mu, sigma and n_sample through our constructor to set up a specific Gaussian distribution.

Note that the constructor is optional: if you don’t write one, $new() still creates an object with the default values of the fields, but it then accepts no arguments.
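The sketch below contrasts a class with an initialize method and one without it. Both class names are hypothetical; the last lines show that R6 raises an error when $new() receives arguments but there is no initialize to receive them.

```r
library(R6)

# With initialize(): $new() passes its arguments to the constructor
Gaussian <- R6Class("Gaussian",
  public = list(
    mu = NA, sigma = NA,
    initialize = function(mu, sigma) {
      self$mu <- mu
      self$sigma <- sigma
    }
  )
)
g <- Gaussian$new(mu = 165, sigma = 6.6)
g$mu  # 165

# Without initialize(): $new() still works, fields keep their defaults
Plain <- R6Class("Plain", public = list(mu = 0, sigma = 1))
p <- Plain$new()
p$sigma  # 1

# ...but passing arguments fails, since nothing is there to receive them
res <- tryCatch(Plain$new(mu = 165), error = function(e) "error")
res  # "error"
```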

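‘self’ keyword: Inside a class’s methods, self refers to the object on which the method is called, which lets us write the class before any object exists.

When you create an object and call one of its methods, self points to that very object, so for an object named sm, self$mu inside its methods is the same field as sm$mu.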
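A small sketch, with a hypothetical Probe class, showing that self is the object itself rather than a copy: a method returning self gives back the same object, changes made through self affect only that object, and returning invisible(self) allows method chaining.

```r
library(R6)

Probe <- R6Class("Probe",
  public = list(
    value = 1,
    me = function() self,           # returns the object the method was called on
    double_value = function() {
      self$value <- self$value * 2  # modifies this object's own field
      invisible(self)               # returning self allows chaining
    }
  )
)

a <- Probe$new()
b <- Probe$new()
identical(a$me(), a)  # TRUE: self is the object itself
a$double_value()$double_value()
a$value  # 4
b$value  # 1: b is a separate object with its own fields
```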

Getter and Setter methods: These utility methods let the user read internal data (attributes) or change them from outside the object.

For example, after creating an object, we can read the current n_sample with get_sample_size() or change it with set_sample_size().
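A sketch of two styles. The first uses explicit getter and setter methods like those in Simulator, with a validity check added in the setter (an assumption, not part of the original class). The second uses an R6 active binding, which lets obj$sample_size be read and assigned like a field while running code behind the scenes. Both class names are hypothetical.

```r
library(R6)

# Style 1: explicit getter/setter methods, with a check in the setter
SizeHolder <- R6Class("SizeHolder",
  public = list(
    n_sample = 100,
    get_sample_size = function() self$n_sample,
    set_sample_size = function(size) {
      stopifnot(is.numeric(size), length(size) == 1, size >= 1)
      self$n_sample <- size
      invisible(self)
    }
  )
)

s <- SizeHolder$new()
s$set_sample_size(500)
s$get_sample_size()  # 500

# Style 2: an active binding; reading calls the function without an
# argument, assigning calls it with the new value
SizeActive <- R6Class("SizeActive",
  private = list(n = 100),
  active = list(
    sample_size = function(value) {
      if (missing(value)) return(private$n)
      stopifnot(is.numeric(value), length(value) == 1, value >= 1)
      private$n <- value
    }
  )
)

a <- SizeActive$new()
a$sample_size <- 250
a$sample_size  # 250
```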

‘generate_sample’ method: This is the method that does the main job of our problem.

It generates the random sample of a specified size from a specified Gaussian Distribution.
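generate_sample() relies on rnorm(n, mean, sd), which takes the standard deviation (not the variance). Because each call draws new random values, the sketch below shows how set.seed() can make a sample reproducible; the seed value 42 is arbitrary.

```r
# rnorm(n, mean, sd) draws n values from a Gaussian distribution;
# set.seed() fixes the random number stream so the draws can be repeated
set.seed(42)
first <- rnorm(5, mean = 165, sd = 6.6)

set.seed(42)
second <- rnorm(5, mean = 165, sd = 6.6)

identical(first, second)  # TRUE: same seed, same sample
```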

‘compute_stats’ method: This function is used to compute the sample mean and sample standard deviation.
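One detail worth knowing: sd() computes the sample standard deviation, dividing by n − 1 rather than n. A small hand-checkable vector shows the difference.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

mean(x)                               # 5
sd(x)                                 # sqrt(32 / 7), about 2.138
sqrt(sum((x - mean(x))^2) / (n - 1))  # same value: sd() divides by n - 1
sqrt(sum((x - mean(x))^2) / n)        # 2: population formula, divides by n
```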

‘plot_histogram’ method: This method creates a frequency (count) histogram of the generated sample and overlays the Gaussian density curve, scaled to counts by multiplying it by the sample size and the bin width.

Its parameter, binwidth, sets the width of the bins in the histogram.
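Why multiply dnorm() by n and the bin width? For a bin of width bw centred at x, the expected count is n times the probability of landing in that bin, n * (pnorm(x + bw/2) - pnorm(x - bw/2)), which for a narrow bin is approximately n * bw * dnorm(x). A quick numerical check with the parameters used in this post:

```r
n <- 1000; mu <- 165; sigma <- 6.6; bw <- 2
x <- 165  # centre of one bin

# Exact expected count: n times the probability mass in [x - bw/2, x + bw/2]
exact <- n * (pnorm(x + bw / 2, mu, sigma) - pnorm(x - bw / 2, mu, sigma))

# Approximation used for the overlaid curve
approx <- dnorm(x, mu, sigma) * n * bw

c(exact = exact, approx = approx)  # both roughly 120
```

The two values agree to well under one percent here, which is why the scaled curve lines up with the bar heights.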

Usage of the class

So finally we are in a position to use our class.

```r
sm <- Simulator$new(mu = 165, sigma = 6.6, n_sample = 1000)  # creates a new object of the class
r <- sm$generate_sample()             # generates a random sample
basic.stats <- sm$compute_stats()     # computes basic statistics of the sample
p <- sm$plot_histogram(binwidth = 2)  # creates the histogram of the sample
print(p)                              # displays the plot
```

The resulting histogram shows the shape of the sample we have just generated, with the theoretical Gaussian curve drawn on top for comparison.

That’s all.

We have solved our problem using OOP, and I hope you can now relate the steps of solving a problem to the real world.

I have kept the R6 implementation of inheritance and polymorphism separate; I will share it in the next post.

Please share your thoughts through the comments section.

It will also help me add to this post or do more research if I am missing something.

Happy Coding!
