The probability for a continuous random variable can be summarized with a continuous probability distribution.
Continuous probability distributions are encountered in machine learning, most notably in the distribution of numerical input and output variables for models and in the distribution of errors made by models.
Knowledge of the normal continuous probability distribution is also required more generally in the density and parameter estimation performed by many machine learning models.
As such, continuous probability distributions play an important role in applied machine learning and there are a few distributions that a practitioner must know about.
In this tutorial, you will discover continuous probability distributions used in machine learning.
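After completing this tutorial, you will know how continuous probability distributions summarize the probabilities for a continuous random variable, how to define and randomly sample from common continuous distributions, and how to calculate their probability density and cumulative distribution functions.
Let’s get started.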
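This tutorial is divided into four parts, covering continuous probability distributions in general, the normal distribution, the exponential distribution, and the Pareto distribution.
A random variable is a quantity produced by a random process.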
A continuous random variable is a random variable that has a real numerical value.
Each numerical outcome of a continuous random variable can be assigned a probability.
The relationship between the events for a continuous random variable and their probabilities is called the continuous probability distribution and is summarized by a probability density function, or PDF for short.
Unlike a discrete random variable, the probability for a given outcome of a continuous random variable cannot be specified directly; instead, it is calculated as an integral (area under the curve) of the density over a tiny interval around the specific outcome.
The probability of an event equal to or less than a given value is defined by the cumulative distribution function, or CDF for short.
The inverse of the CDF is called the percentage-point function, or PPF for short, and gives the outcome whose cumulative probability is equal to a given probability.
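The relationship between these three functions can be checked numerically. The listing below is a minimal sketch, assuming SciPy's norm and quad functions and a standard normal distribution chosen purely for illustration; the interval around 0.5 is also an arbitrary choice.

```python
# relate the PDF, CDF and PPF of a continuous distribution
from scipy.integrate import quad
from scipy.stats import norm

# illustrative distribution (an assumption of this sketch): standard normal
dist = norm(loc=0.0, scale=1.0)

# probability of an outcome in a small interval around 0.5,
# computed as the area under the PDF over that interval
low, high = 0.49, 0.51
area, _ = quad(dist.pdf, low, high)

# the same probability as a difference of two CDF values
cdf_diff = dist.cdf(high) - dist.cdf(low)

# the PPF inverts the CDF: it maps a probability back to an outcome
p = dist.cdf(0.5)
outcome = dist.ppf(p)

print('area under PDF: %.6f' % area)
print('CDF difference: %.6f' % cdf_diff)
print('ppf(cdf(0.5)):  %.6f' % outcome)
```

The two probabilities should agree, and the PPF should return the outcome that was passed to the CDF.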
There are many common continuous probability distributions.
The most common is the normal probability distribution.
Many of the continuous probability distributions of interest belong to the so-called exponential family of distributions, which is a collection of parameterized probability distributions (i.e. distributions that change based on the values of their parameters).
Continuous probability distributions play an important role in machine learning from the distribution of input variables to the models, the distribution of errors made by models, and in the models themselves when estimating the mapping between inputs and outputs.
In the following sections, we will take a closer look at some of the more common continuous probability distributions.
The normal distribution is also called the Gaussian distribution (named for Carl Friedrich Gauss) or the bell curve distribution.
The distribution covers the probability of real-valued events from many different problem domains, making it a common and well-known distribution, hence the name “normal.” A continuous random variable that has a normal distribution is said to be “normal” or “normally distributed.”
The distribution can be defined using two parameters: the mean (mu), which is the expected value, and the variance (sigma^2), which is the spread from the mean.
Often, the standard deviation (sigma) is used instead of the variance; it is calculated as the square root of the variance, which puts the spread in the same units as the variable.
A distribution with a mean of zero and a standard deviation of 1 is called a standard normal distribution, and often data is reduced or “standardized” to this for analysis for ease of interpretation and comparison.
We can define a distribution with a mean of 50 and a standard deviation of 5 and sample random numbers from this distribution.
We can achieve this using the normal() NumPy function.
The example below samples and prints 10 numbers from this distribution.
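A minimal sketch of such an example, assuming the normal() function from numpy.random; the fixed seed is an assumption added here so that the numbers are repeatable.

```python
# sample a normal distribution
from numpy.random import normal, seed

# fixed seed for repeatability (an assumption of this sketch)
seed(1)

# define the distribution: mean of 50, standard deviation of 5
mu, sigma = 50, 5

# draw 10 random numbers from the distribution
sample = normal(mu, sigma, 10)
print(sample)
```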
Running the example prints 10 numbers randomly sampled from the defined normal distribution.
A sample of data can be checked to see if it is normal by plotting it and checking for the familiar bell shape, or by using statistical tests.
If the samples of observations of a random variable are normally distributed, then they can be summarized by just the mean and variance, calculated directly on the samples.
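As a rough illustration, the sketch below draws a larger sample, summarizes it with the sample mean and variance, and applies one possible normality test. The sample size of 1,000, the fixed seed, and the choice of the Shapiro-Wilk test (SciPy's shapiro() function) are assumptions made for this sketch, not choices taken from the text above.

```python
# summarize a sample and check it for normality
from numpy import mean, std, var
from numpy.random import normal, seed
from scipy.stats import shapiro

# fixed seed for repeatability (an assumption of this sketch)
seed(1)

# draw a sample from a normal distribution with mean 50 and std 5
sample = normal(loc=50, scale=5, size=1000)

# summarize the sample; ddof=1 gives the usual sample estimates
sample_mean = mean(sample)
sample_var = var(sample, ddof=1)
sample_std = std(sample, ddof=1)
print('mean=%.3f var=%.3f std=%.3f' % (sample_mean, sample_var, sample_std))

# Shapiro-Wilk test: a small p-value is evidence against normality
stat, p_value = shapiro(sample)
print('shapiro statistic=%.3f p-value=%.3f' % (stat, p_value))
```

The estimated mean and standard deviation should land close to the 50 and 5 used to generate the sample.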
We can calculate the probability density of each observation using the probability density function.
A plot of these values would give us the tell-tale bell shape.
We can define a normal distribution using the norm() SciPy function and then calculate properties such as the moments, PDF, CDF, and more.
The example below calculates the probability for integer values between 30 and 70 in our distribution and plots the result, then does the same for the cumulative probability.
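A minimal sketch of such an example, assuming SciPy's norm() for the distribution. The plot_line() helper is a hypothetical name introduced here; it assumes Matplotlib is installed and imports it only when called.

```python
# pdf and cdf for a normal distribution
from scipy.stats import norm

# define the distribution: mean of 50, standard deviation of 5
mu, sigma = 50, 5
dist = norm(mu, sigma)

# integer outcomes from 30 to 70 inclusive
values = list(range(30, 71))

# density and cumulative probability for each outcome
densities = [dist.pdf(v) for v in values]
cumulative = [dist.cdf(v) for v in values]

def plot_line(x, y, title):
    """Hypothetical helper: line plot of y against x using Matplotlib."""
    from matplotlib import pyplot
    pyplot.plot(x, y)
    pyplot.title(title)
    pyplot.show()

print('pdf(50)=%.4f cdf(50)=%.2f cdf(65)=%.4f' % (dist.pdf(50), dist.cdf(50), dist.cdf(65)))
```

Calling plot_line(values, densities, 'PDF') and then plot_line(values, cumulative, 'CDF') would draw the two line plots.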
Running the example first calculates the probability for integers in the range [30, 70] and creates a line plot of values and probabilities.
The plot shows the Gaussian or bell shape, with the peak of highest probability density around the expected value or mean of 50, where the density is about 0.08.
Line Plot of Events vs. Probability or the Probability Density Function for the Normal Distribution
The cumulative probabilities are then calculated for observations over the same range, showing that at the mean, we have covered about 50% of the expected values and very close to 100% after the value of about 65 or 3 standard deviations from the mean (50 + (3 * 5)).
Line Plot of Events vs. Cumulative Probability or the Cumulative Distribution Function for the Normal Distribution
In fact, the normal distribution has a heuristic or rule of thumb that defines the percentage of data covered by a given range by the number of standard deviations from the mean.
It is called the 68-95-99.7 rule, which is the approximate percentage of the data covered by ranges defined by 1, 2, and 3 standard deviations from the mean.
For example, in our distribution with a mean of 50 and a standard deviation of 5, we would expect 95% of the data to be covered by values that are 2 standard deviations from the mean, or 50 – (2 * 5) and 50 + (2 * 5) or between 40 and 60.
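The rule of thumb itself can be checked from the CDF. The sketch below assumes SciPy's norm() and computes the probability covered within 1, 2, and 3 standard deviations of the mean for the same distribution.

```python
# check the 68-95-99.7 rule using the cdf
from scipy.stats import norm

# the distribution used in the text: mean 50, standard deviation 5
mu, sigma = 50, 5
dist = norm(mu, sigma)

# probability covered within k standard deviations of the mean
coverage = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    coverage[k] = dist.cdf(high) - dist.cdf(low)
    print('within %d std (%d to %d): %.2f%%' % (k, low, high, coverage[k] * 100))
```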
We can confirm this by calculating the exact values using the percentage-point function.
The middle 95% would be defined by the percentage point function value for 2.5% at the low end and 97.5% at the high end, where 97.5 – 2.5 gives the middle 95%.
The complete example is listed below.
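A minimal sketch of such an example, assuming the ppf() method of SciPy's norm() distribution object.

```python
# calculate the values that define the middle 95%
from scipy.stats import norm

# define the distribution: mean of 50, standard deviation of 5
mu, sigma = 50, 5
dist = norm(mu, sigma)

# outcomes at the 2.5% and 97.5% cumulative probabilities
low_end = dist.ppf(0.025)
high_end = dist.ppf(0.975)
print('Middle 95%% between %.1f and %.1f' % (low_end, high_end))
```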
Running the example gives the exact outcomes that define the middle 95% of expected outcomes that are very close to our standard-deviation-based heuristics of 40 and 60.
An important related distribution is the Log-Normal probability distribution.
The exponential distribution is a continuous probability distribution where a few outcomes are the most likely with a rapid decrease in probability to all other outcomes.
It is the continuous random variable equivalent to the geometric probability distribution for discrete random variables.
Events in a number of domains follow an exponential distribution.
The distribution can be defined using one parameter: the scale (beta), which is the mean of the distribution.
Sometimes the distribution is defined more formally with a parameter lambda or rate. The beta parameter is defined as the reciprocal of the lambda parameter (beta = 1/lambda).
We can define a distribution with a mean of 50 and sample random numbers from this distribution.
We can achieve this using the exponential() NumPy function.
The example below samples and prints 10 numbers from this distribution.
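A minimal sketch of such an example, assuming the exponential() function from numpy.random, which takes the scale (beta) as its first argument; the fixed seed is an assumption added for repeatability.

```python
# sample an exponential distribution
from numpy.random import exponential, seed

# fixed seed for repeatability (an assumption of this sketch)
seed(1)

# define the distribution: scale (mean) of 50
beta = 50

# draw 10 random numbers from the distribution
sample = exponential(beta, 10)
print(sample)
```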
Running the example prints 10 numbers randomly sampled from the defined distribution.
We can define an exponential distribution using the expon() SciPy function and then calculate properties such as the moments, PDF, CDF, and more.
The example below defines a range of observations between 50 and 70 and calculates the probability and cumulative probability for each and plots the result.
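The sketch below is one way such an example might look, assuming SciPy's expon(). Note that expon() treats its first positional argument as a location (loc), not a scale. A range of observations starting at 50 is consistent with expon(50), a unit-scale exponential shifted to start at 50; this is an inference from the described range, not something stated in the text. A distribution with a mean of 50 would instead be expon(scale=50), which is included for comparison.

```python
# pdf and cdf for an exponential distribution
from scipy.stats import expon

# assumption: 50 passed positionally is the location (loc), giving a
# unit-scale exponential that starts at 50 (its mean is 51, not 50)
shifted = expon(50)

# for comparison: an exponential distribution whose mean is 50
mean_50 = expon(scale=50)

# integer observations from 50 up to, but not including, 70
values = list(range(50, 70))

# density and cumulative probability for each observation
pdf_shifted = [shifted.pdf(v) for v in values]
cdf_shifted = [shifted.cdf(v) for v in values]

print('shifted: mean=%.1f cdf(55)=%.4f' % (shifted.mean(), shifted.cdf(55)))
print('mean 50: mean=%.1f cdf(55)=%.4f' % (mean_50.mean(), mean_50.cdf(55)))
```

The values and either list can then be passed to a plotting library such as Matplotlib to draw the line plots. The two parameterizations give very different curves over this range, so it is worth passing loc and scale by keyword.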
Running the example first creates a line plot of outcomes versus probabilities, showing a familiar exponential probability distribution shape.
Line Plot of Events vs. Probability or the Probability Density Function for the Exponential Distribution
Next, the cumulative probabilities for each outcome are calculated and graphed as a line plot, showing that after a value of about 55, almost 100% of the expected values will be observed.
Line Plot of Events vs. Cumulative Probability or the Cumulative Distribution Function for the Exponential Distribution
An important related distribution is the double exponential distribution, also called the Laplace distribution.
A Pareto distribution is named after Vilfredo Pareto and may be referred to as a power-law distribution.
It is also related to the Pareto principle (or 80/20 rule), which is a heuristic for continuous random variables that follow a Pareto distribution, where 80% of the events are covered by 20% of the range of outcomes, i.e. most events are drawn from just 20% of the range of the continuous variable.
The Pareto principle is just a heuristic for a specific Pareto distribution, specifically the Pareto Type II distribution, that is perhaps most interesting and on which we will focus.
Events in a number of domains follow a Pareto distribution.
The distribution can be defined using one parameter: the shape (alpha), which controls the steepness of the decrease in probability.
Values for the shape parameter are often small, such as between 1 and 3, with the Pareto principle given when alpha is set to 1.161.
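The value of about 1.161 can be reproduced with a little arithmetic. The sketch below assumes the classical (Type I) form of the Pareto distribution with alpha greater than 1, for which the top fraction p of outcomes accounts for a share p^(1 - 1/alpha) of the total. The top_share() helper is a hypothetical name introduced here for illustration.

```python
# shape parameter that gives the 80/20 rule
from math import log

def top_share(p, alpha):
    """Share of the total held by the top fraction p (assumes Type I Pareto, alpha > 1)."""
    return p ** (1.0 - 1.0 / alpha)

# solving top_share(0.2, alpha) = 0.8 for alpha gives log(5) / log(4)
alpha_80_20 = log(5) / log(4)

print('alpha for 80/20: %.3f' % alpha_80_20)
print('share held by top 20%%: %.3f' % top_share(0.2, alpha_80_20))
```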
We can define a distribution with a shape of 1.1 and sample random numbers from this distribution.
We can achieve this using the pareto() NumPy function.
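A minimal sketch of such an example, assuming the pareto() function from numpy.random. NumPy documents this function as drawing from a Pareto II (Lomax) distribution, so samples start at zero; the fixed seed is an assumption added for repeatability.

```python
# sample a pareto distribution
from numpy.random import pareto, seed

# fixed seed for repeatability (an assumption of this sketch)
seed(1)

# define the distribution: shape of 1.1
alpha = 1.1

# draw 10 random numbers from the distribution
sample = pareto(alpha, 10)
print(sample)
```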
Running the example prints 10 numbers randomly sampled from the defined distribution.
We can define a Pareto distribution using the pareto() SciPy function and then calculate properties, such as the moments, PDF, CDF, and more.
The example below defines a range of observations between 1 and about 10 and calculates the probability and cumulative probability for each and plots the result.
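A minimal sketch of such an example, assuming the pareto() function from scipy.stats, which takes the shape as its first argument and, with default location and scale, is defined for outcomes of 1 and above. The step of 0.1 between observations is an assumption of this sketch.

```python
# pdf and cdf for a pareto distribution
from scipy.stats import pareto

# define the distribution: shape of 1.1
alpha = 1.1
dist = pareto(alpha)

# observations from 1.0 to 9.9 in steps of 0.1
values = [v / 10.0 for v in range(10, 100)]

# density and cumulative probability for each observation
probabilities = [dist.pdf(v) for v in values]
cumulative = [dist.cdf(v) for v in values]

print('pdf(1.0)=%.3f' % probabilities[0])
print('cdf(9.9)=%.3f' % cumulative[-1])
```

The values and either list can then be passed to a plotting library such as Matplotlib to draw the line plots.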
Running the example first creates a line plot of outcomes versus probabilities, showing a familiar Pareto probability distribution shape.
Line Plot of Events vs. Probability or the Probability Density Function for the Pareto Distribution
Next, the cumulative probabilities for each outcome are calculated and graphed as a line plot, showing a rise that is less steep than the exponential distribution seen in the previous section.
Line Plot of Events vs. Cumulative Probability or the Cumulative Distribution Function for the Pareto Distribution
This section provides more resources on the topic if you are looking to go deeper.
In this tutorial, you discovered continuous probability distributions used in machine learning.
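Specifically, you learned how continuous probability distributions summarize the probabilities for a continuous random variable, how to define and randomly sample from the normal, exponential, and Pareto distributions, and how to calculate and plot their probability density and cumulative distribution functions.
Do you have any questions? Ask your questions in the comments below and I will do my best to answer.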