Statistics is the Grammar of Data Science — Part 2

To visualise the probability, we plot the dataset as a curve.

The area under the curve between two points corresponds to the probability that the variable falls between those two values.

Courtesy: scipy.stats.gennorm

✏️ The ‘infamous’ bell-shaped Standard Normal Distribution

To understand this a bit more, we will examine a special case of PDF:

Courtesy: Princeton University

Between the mean and one standard deviation (1σ) there is a 34.1% probability of a value landing in that range. So for a given value there is a 68.2% chance that it falls between -1σ and 1σ, which is very likely!

What this means is that values are concentrated near the mean, and as we move beyond one standard deviation in either direction, the probability gets smaller and smaller.
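To make the 'area under the curve' idea concrete, here is a minimal sketch that uses scipy.stats.norm to compute the two areas quoted above. The variable names are my own, and the numerical integration with scipy.integrate.quad is only there as a cross-check, not something the plots rely on.

```python
from scipy import stats
from scipy.integrate import quad

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = stats.norm(loc=0.0, scale=1.0)

# Area between the mean and +1 sigma, via the cumulative distribution function.
p_mean_to_one_sigma = std_normal.cdf(1.0) - std_normal.cdf(0.0)

# Area between -1 sigma and +1 sigma.
p_within_one_sigma = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

# Cross-check: integrate the PDF numerically over the same interval.
area, _abs_error = quad(std_normal.pdf, -1.0, 1.0)

print(f"P(0 < X < 1)  = {p_mean_to_one_sigma:.4f}")
print(f"P(-1 < X < 1) = {p_within_one_sigma:.4f}")
print(f"Integrated area under the PDF = {area:.4f}")
```

The first value comes out at roughly 0.341 and the second at roughly 0.683, which matches the percentages above up to rounding.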

Probability Mass Function (PMF)

When it comes to discrete data, the PMF is the measure that gives us the probability of a given value occurring.

To visualise the probability, we plot the dataset as a histogram.
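As a small illustration of a PMF, the sketch below builds a discrete distribution with scipy.stats.rv_discrete. The fair six-sided die is my own hypothetical example, not data from this article, and the sample size and seed are arbitrary.

```python
import numpy as np
from scipy import stats

# Hypothetical example: a fair six-sided die.
faces = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)

die = stats.rv_discrete(name="fair_die", values=(faces, probabilities))

# The PMF gives the probability of each individual value.
pmf_values = die.pmf(faces)

# Simulated rolls: the relative frequencies are what a histogram would show.
rolls = die.rvs(size=10_000, random_state=42)
observed_freq = np.array([(rolls == face).mean() for face in faces])

for face, p, freq in zip(faces, pmf_values, observed_freq):
    print(f"face {face}: pmf = {p:.3f}, observed = {freq:.3f}")
```

The observed frequencies should hover around the PMF values, which is exactly what the bars of the histogram represent.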

Courtesy: scipy.stats.rv_discrete

Continuous Data Distributions

Now that we understand the distinction between PDF and PMF, we will look at the most frequently occurring distribution types, starting with the continuous ones.

#PDF-1: Uniform / Rectangular Distribution

A uniform distribution means there is a flat constant probability of a value occurring within a given range, and is concerned with events that are equally likely to occur.

Courtesy: scipy.stats.uniform

In the plot above, we do not expect to see any values below 0.0 or above 1.0.

But within this range, we have a flat line because there is a constant probability of any one of those values occurring.
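The flat line can be checked numerically with scipy.stats.uniform. This is a minimal sketch assuming the same 0.0 to 1.0 range as the plot; the evaluation points and the interval are arbitrary choices of mine.

```python
import numpy as np
from scipy import stats

# Uniform distribution on [loc, loc + scale] = [0.0, 1.0].
unit_uniform = stats.uniform(loc=0.0, scale=1.0)

# The density is zero outside the range and constant inside it.
points = np.array([-0.5, 0.1, 0.5, 0.9, 1.5])
densities = unit_uniform.pdf(points)

# Probability of landing in a sub-interval is just its width here.
p_between = unit_uniform.cdf(0.5) - unit_uniform.cdf(0.2)

for x, d in zip(points, densities):
    print(f"pdf({x:+.1f}) = {d:.1f}")
print(f"P(0.2 < X < 0.5) = {p_between:.2f}")
```

Because the density is constant, the probability of any sub-interval depends only on its width, not on where it sits within the range.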

#PDF-2: Normal / Gaussian Distribution

We saw a standard normal distribution when we explored what a PDF is.

If we introduce the ‘random’ element, a normal distribution does not look like a perfect curve, but more like the following example.

Courtesy: scipy.stats.norm

The mean for the standard normal distribution is zero, and the standard deviation is one.
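To see the 'random' element at work, the sketch below draws samples from a standard normal distribution and compares their histogram with the ideal curve. The sample size, seed and number of bins are arbitrary assumptions of mine.

```python
import numpy as np
from scipy import stats

# Draw random samples from the standard normal distribution.
samples = stats.norm.rvs(loc=0.0, scale=1.0, size=5_000, random_state=0)

# Normalised histogram: bar heights are comparable to the PDF.
heights, bin_edges = np.histogram(samples, bins=30, density=True)
bin_centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# Ideal bell curve evaluated at the centre of each bar.
ideal_curve = stats.norm.pdf(bin_centres)
largest_gap = np.max(np.abs(heights - ideal_curve))

sample_mean = samples.mean()
sample_std = samples.std(ddof=1)

print(f"sample mean = {sample_mean:.3f}, sample std = {sample_std:.3f}")
print(f"largest gap between histogram and ideal PDF = {largest_gap:.3f}")
```

The sample mean and standard deviation land close to zero and one without hitting them exactly, and the histogram bars wobble around the ideal curve in the same way.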

#PDF-3: Exponential Probability Distribution

Another distribution function that is often encountered is the exponential probability distribution function, where things fall off in an exponential manner.

Courtesy: scipy.stats.expon

There are fewer large values and more small values, i.e. it is very likely for something to happen near zero, but as we get further away from it, the probability drops off dramatically.

An everyday example is the amount of money customers spend in one trip to the supermarket: there are more people who spend small amounts of money and fewer people who spend large amounts of money.

It is widely used to model the time elapsed between events, as well as reliability, which deals with the amount of time a product lasts, e.g. the amount of time in months (beginning now) that a car battery lasts.
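The car battery example can be sketched with scipy.stats.expon. The mean lifetime of 48 months is a made-up figure purely for illustration; only the shape of the result matters.

```python
import math
from scipy import stats

# Hypothetical assumption: batteries last 48 months on average.
mean_lifetime_months = 48.0

# For the exponential distribution, scipy's `scale` is the mean (1 / rate).
battery_life = stats.expon(scale=mean_lifetime_months)

# Probability that the battery fails within the first year.
p_fail_first_year = battery_life.cdf(12.0)

# Probability that it lasts longer than five years (survival function).
p_last_beyond_5y = battery_life.sf(60.0)

# The density keeps falling as we move away from zero.
densities = battery_life.pdf([1.0, 12.0, 48.0, 96.0])

print(f"P(fails within 12 months) = {p_fail_first_year:.3f}")
print(f"P(lasts beyond 60 months) = {p_last_beyond_5y:.3f}")
print("density at 1, 12, 48, 96 months:", densities.round(4))
```

Under this assumed mean, short lifetimes are the most likely and long ones become progressively rarer, which is the 'fall off' behaviour described above.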

Discrete Data Distributions

As for the discrete probability distributions, the two main categories include:

#PMF-1: Binomial Distribution

Let’s consider an experiment having two possible outcomes: either success or failure.

Suppose the experiment is repeated several times and the repetitions are independent of each other.

The total number of experiments where the outcome turns out to be a success is a random variable whose distribution is called the binomial distribution.
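A minimal sketch of this setup with scipy.stats.binom, assuming a hypothetical experiment of 10 independent coin flips where 'heads' counts as a success with probability 0.5 (both numbers are my own choices):

```python
from scipy import stats

# Hypothetical experiment: 10 independent flips of a fair coin.
n_trials = 10
p_success = 0.5

coin_flips = stats.binom(n=n_trials, p=p_success)

# PMF: probability of each possible number of successes.
pmf_by_successes = {k: coin_flips.pmf(k) for k in range(n_trials + 1)}

p_exactly_five = coin_flips.pmf(5)
p_at_most_two = coin_flips.cdf(2)

for k, prob in pmf_by_successes.items():
    print(f"P(X = {k:2d}) = {prob:.4f}")
print(f"P(X <= 2) = {p_at_most_two:.4f}")
```

Five successes is the single most likely outcome here, yet it still happens only about a quarter of the time, because the probability is spread over all eleven possible counts.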

Courtesy: scipy.stats.binom

#PMF-2: Poisson Distribution

The Poisson distribution gives the probability of a number of events occurring in a fixed interval of time, if these events happen with a known average rate and independently of the time since the last event.

Courtesy: scipy.stats.poisson

A classic example here is the number of phone calls received by a call centre.

Alternatively, if we know the average number of things that happen in a given time period, we can use it to predict the odds of seeing a different value in a future period of the same length. E.g. my Medium posts get an average of 1,000 views per day; I can use the Poisson probability mass function to estimate the probability of having 1,500 views in a day.
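The Medium example can be sketched with scipy.stats.poisson. This assumes daily views really do follow a Poisson distribution with a mean of 1,000, which is a simplification; the 1,050 threshold is a hypothetical extra comparison of mine.

```python
from scipy import stats

# Average rate taken from the example above: 1,000 views per day.
average_views_per_day = 1_000
daily_views = stats.poisson(mu=average_views_per_day)

# PMF: probability of exactly 1,500 views on a given day.
p_exactly_1500 = daily_views.pmf(1_500)

# Survival function: sf(k) = P(X > k), so sf(1_499) = P(X >= 1_500).
p_at_least_1500 = daily_views.sf(1_499)

# Hypothetical milder target for comparison: at least 1,050 views.
p_at_least_1050 = daily_views.sf(1_049)

print(f"P(X = 1500)  = {p_exactly_1500:.3e}")
print(f"P(X >= 1500) = {p_at_least_1500:.3e}")
print(f"P(X >= 1050) = {p_at_least_1050:.3f}")
```

Under these assumptions, 1,500 views in a day is astronomically unlikely, while a modest bump to 1,050 is merely uncommon. A day that far above the average would suggest the 'known average rate' assumption no longer holds.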

This brings today’s article to an end! I hope this was useful for understanding what the various types of distributions are and what they look like, even if I ‘hid’ all the mathematical formulas!

Thanks for reading! Part 3 is coming soon…

I regularly write about Technology & Data on Medium. If you would like to read my future posts then please ‘Follow’ me!
