how about if the distribution is Poisson? How do you calculate the inter-quartile range of a series of data points? How do you generate a few random numbers following a Student’s t-distribution? The R programming environment allows you to do just that.
On the other hand, Python’s scripting ability allows an analyst to use those statistics in a wide variety of analytics pipelines with limitless sophistication and creativity.
To combine the advantages of both worlds, one needs a simple Python-based wrapper library that defines the most commonly used functions for probability distributions and descriptive statistics in R style. Users can then call those functions quickly, without having to go to the full Python statistical libraries and work through the whole list of methods and arguments.
A Python wrapper script for most convenient R-functions

I wrote a Python script that defines the most convenient and widely used R functions for simple statistical analysis, in Python.
After importing this script, you will be able to use those R functions naturally, just like in an R programming environment.
The goal of this script is to provide simple Python subroutines mimicking R-style statistical functions for quickly calculating density/point estimates, cumulative distributions, and quantiles, and for generating random variates for various important probability distributions.
To maintain the spirit of R styling, no class hierarchy is used; the file defines only plain functions, so that a user can import this one Python script and call any function by a single name whenever needed.
Note that I use the word mimic.
Under no circumstances am I claiming to emulate the true functional programming paradigm of R, which consists of a deep environmental setup and complex inter-relationships between those environments and objects.
This script just allows me (and I hope countless other Python users too) to quickly fire up a Python program or Jupyter notebook, import the script, and start doing simple descriptive statistics in no time.
That’s the goal, nothing more, nothing less.
Or, you may have coded in R in grad school and are just starting out to learn and use Python for data analysis.
You will be happy to see and use some of the same well-known functions in your Jupyter notebook, in a manner similar to how you used them in the R environment.
Whatever the reason may be, it is fun :-)

Simple Examples

To start, just import the script and start working with lists of numbers as if they were data vectors in R.

    from R_functions import *
    lst=[20,12,16,32,27,65,44,45,22,18]
    <more code, more statistics.>

For example, you want to calculate the Tukey five-number summary from a vector of data points.
You just call one simple function, fivenum, and pass in the vector.
It will return the five-number summary in a NumPy array.
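To give a feel for what such a function can look like, here is a minimal sketch of a five-number summary built on NumPy. It is an assumption about one possible approach, not the code from the script, and the names `fivenum_percentile` and `fivenum_hinges` are hypothetical. The first version uses linearly interpolated percentiles; the second follows the Tukey-hinge rule that R's own `fivenum` uses. The two can differ slightly at the quartiles.

```python
import numpy as np

def fivenum_percentile(x):
    """Min, quartiles, and max using NumPy's default linear interpolation."""
    return np.percentile(np.asarray(x, dtype=float), [0, 25, 50, 75, 100])

def fivenum_hinges(x):
    """Min, lower hinge, median, upper hinge, max (Tukey hinges, as in R)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    n4 = ((n + 3) // 2) / 2.0                      # 1-based depth of the lower hinge
    d = np.array([1.0, n4, (n + 1) / 2.0, n + 1 - n4, float(n)])
    lo = np.floor(d).astype(int) - 1               # convert to 0-based indices
    hi = np.ceil(d).astype(int) - 1
    return 0.5 * (xs[lo] + xs[hi])                 # average the two neighbours

data = [20, 12, 16, 32, 27, 65, 44, 45, 22, 18]
by_percentile = fivenum_percentile(data)   # quartiles come out as 18.5 and 41.0
by_hinges = fivenum_hinges(data)           # hinges come out as 18.0 and 44.0
print(by_percentile)
print(by_hinges)
```

Both variants agree on the minimum, median, and maximum; only the quartile rule differs, which is worth knowing when comparing results against R.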
    lst=[20,12,16,32,27,65,44,45,22,18]
    fivenum(lst)
    > array([12. , 18.5, 24.5, 41. , 65. ])

Or, you want to know the answer to the following question.
Suppose a machine outputs 10 finished goods per hour on average with a standard deviation of 2.
The output pattern follows a near normal distribution.
What is the probability that the machine will output at least 7 but no more than 12 units in the next hour?

The answer is essentially this:

    P(7 ≤ X ≤ 12) = P(X ≤ 12) - P(X ≤ 7), where X is normally distributed with mean 10 and standard deviation 2

You can obtain the answer with just one line of code using pnorm…

    pnorm(12,10,2)-pnorm(7,10,2)
    > 0.7745375447996848

Or, the following.

Suppose you have a loaded coin with a 60% probability of turning heads up every time you toss it.
You are playing a game of 10 tosses.
How do you plot and map out the chances of all the possible numbers of wins (from 0 to 10) with this coin?

You can obtain a nice bar chart with just a few lines of code and using just one function, dbinom…

    import matplotlib.pyplot as plt

    # Probability of each possible number of heads (0 to 10) in 10 tosses
    probs=[]
    for i in range(11):
        probs.append(dbinom(i,10,0.6))
    plt.bar(range(11),height=probs)
    plt.grid(True)
    plt.show()

Simple interface for probability calculations

R offers an extremely simplified and intuitive interface for quick calculations from essential probability distributions.
The interface goes like this…

d{distribution}: gives the density function value at a point x
p{distribution}: gives the cumulative value at a point x
q{distribution}: gives the quantile function value at a probability p
r{distribution}: generates one or multiple random variates

In our implementation, we stick to this interface and its associated argument list so that you can execute these functions exactly like in an R environment.
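To make the d/p/q/r naming concrete, here is a minimal sketch for the normal distribution plus a binomial density, using only the Python standard library. These are hypothetical stand-ins written for illustration, not the script's own code, and the backend the script actually uses may well be different. The `seed` argument on `rnorm` is an addition for reproducibility and is not part of R's signature.

```python
import math
from statistics import NormalDist

# Hypothetical stand-ins following R's d/p/q/r naming convention.
# They are NOT the script's implementations.

def dnorm(x, mean=0.0, sd=1.0):
    """Density of Normal(mean, sd) at the point x."""
    return NormalDist(mean, sd).pdf(x)

def pnorm(q, mean=0.0, sd=1.0):
    """Cumulative probability P(X <= q) for Normal(mean, sd)."""
    return NormalDist(mean, sd).cdf(q)

def qnorm(p, mean=0.0, sd=1.0):
    """Quantile (inverse CDF) of Normal(mean, sd) at probability p."""
    return NormalDist(mean, sd).inv_cdf(p)

def rnorm(n, mean=0.0, sd=1.0, seed=None):
    """List of n random variates; `seed` is an extra, non-R argument."""
    return NormalDist(mean, sd).samples(n, seed=seed)

def dbinom(x, size, prob):
    """Probability of exactly x successes in `size` Bernoulli(prob) trials."""
    return math.comb(size, x) * prob**x * (1.0 - prob) ** (size - x)

# The machine-output question: P(7 <= X <= 12) for mean 10, sd 2
prob_7_to_12 = pnorm(12, 10, 2) - pnorm(7, 10, 2)   # about 0.7745

# The loaded-coin question: chances of 0..10 heads in 10 tosses
coin_probs = [dbinom(k, 10, 0.6) for k in range(11)]

print(round(prob_7_to_12, 4))
print(round(sum(coin_probs), 6))
```

Keeping the argument order the same as in R (value first, then the distribution parameters) is what lets a call such as `pnorm(12, 10, 2)` read identically in both languages.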
Currently implemented functions

Currently, the following R-style functions are implemented in the script for fast calling.

Mean, median, variance, standard deviation
Tukey five-number summary, IQR
Covariance of a matrix or between two vectors
Density, cumulative probability, quantile function, and random variate generation for the following distributions: normal, uniform, binomial, Poisson, F, Student’s t, Chi-square, Beta, and Gamma.
Work in progress…

Obviously, this is a work in progress, and I plan to add some more convenient R functions to this script.
For example, in R a single-line command, lm, can get you an ordinary least-squares fitted model for a numerical data set with all the necessary inferential statistics (P-values, standard errors, etc.).
This is powerfully brief and compact! On the other hand, standard linear regression problems in Python are often tackled using Scikit-learn, which needs a bit more scripting to accomplish this.
I plan to incorporate this single function linear model fitting feature using Python’s statsmodels backend.
If you like this script and find a use for it in your work, please star/fork my GitHub repo and spread the news.
If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.com.
Also, you can check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources.
If you are, like me, passionate about machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.
If you liked this article, please don’t forget to leave a clap :-).