Arithmetic, Geometric, and Harmonic Means for Machine Learning

Calculating the average of a variable or a list of numbers is a common operation in machine learning.

It is an operation you may use every day either directly, such as when summarizing data, or indirectly, such as a smaller step in a larger procedure when fitting a model.

The average is a synonym for the mean, a number that summarizes the center of a probability distribution; for a symmetric, single-peaked distribution such as a Gaussian, it coincides with the most likely value.

As such, there are multiple different ways to calculate the mean based on the type of data that you’re working with.

This can trip you up if you use the wrong mean for your data.

You may also encounter some of these more exotic calculations of mean values when using performance metrics to evaluate your model, such as the G-mean or the F-Measure.

In this tutorial, you will discover the difference between the arithmetic mean, the geometric mean, and the harmonic mean.

Let’s get started.


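This tutorial is divided into five parts: what the average is, the arithmetic mean, the geometric mean, the harmonic mean, and how to choose the correct mean.

The central tendency is a single number that represents a typical value for a list of numbers.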

More technically, it describes where the probability distribution over all possible values of a variable is centered; for a symmetric, single-peaked distribution, this is also the value with the highest probability.

There are many ways to calculate the central tendency for a data sample, such as the mean, which is calculated from the values; the mode, which is the most common value in the data distribution; or the median, which is the middle value when all values in the data sample are ordered.
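The three measures can be compared side by side. The following is a minimal sketch using Python's standard statistics module; the sample values and variable names are illustrative assumptions, not data from this tutorial.

```python
# compare three measures of central tendency on one small sample
from statistics import mean, median, mode

# illustrative sample (an assumption chosen for this sketch)
sample = [2, 3, 3, 4, 5, 5, 5, 6, 9]

sample_mean = mean(sample)      # sum of the values divided by their count
sample_median = median(sample)  # middle value once the sample is sorted
sample_mode = mode(sample)      # most frequently occurring value

print('Mean: %.3f' % sample_mean)
print('Median: %.3f' % sample_median)
print('Mode: %d' % sample_mode)
```

For this sample the median and mode agree, while the mean is pulled slightly lower by the smaller values.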

The average is the common term for the mean.

They can be used interchangeably.

The mean is different from the median and the mode in that it is calculated arithmetically from every value in the data, rather than found by ordering or counting the values.

As such, there are different ways to calculate the mean based on the type of data.

Three common types of mean calculations that you may encounter are the arithmetic mean, the geometric mean, and the harmonic mean.

There are other means, and many more central tendency measures, but these three means are perhaps the most common (they are the so-called Pythagorean means).

Let’s take a closer look at each calculation of the mean in turn.

The arithmetic mean is calculated as the sum of the values divided by the total number of values, referred to as N.

A more convenient way to calculate the arithmetic mean is to calculate the sum of the values and multiply it by the reciprocal of the number of values (1 over N); for example:

Arithmetic Mean = (1/N) * (x1 + x2 + … + xN)

The arithmetic mean is appropriate when all values in the data sample have the same units of measure, e.g. all numbers are heights, or dollars, or miles.

When calculating the arithmetic mean, the values can be positive, negative, or zero.

The arithmetic mean can be easily distorted if the sample of observations contains outliers (a few values far away in feature space from all other values), or for data that has a non-Gaussian distribution (e.g. multiple peaks, a so-called multi-modal probability distribution).
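The effect of an outlier can be seen in a small experiment. The following is a minimal sketch, assuming NumPy is installed; both lists are illustrative assumptions that differ only in their last value.

```python
# show how a single outlier shifts the arithmetic mean but not the median
from numpy import mean, median

# illustrative samples (assumptions): identical except for the last value
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]

clean_mean = mean(clean)
outlier_mean = mean(with_outlier)
clean_median = median(clean)
outlier_median = median(with_outlier)

print('Mean without outlier: %.3f' % clean_mean)
print('Mean with outlier: %.3f' % outlier_mean)
print('Median without outlier: %.3f' % clean_median)
print('Median with outlier: %.3f' % outlier_median)
```

The mean moves from 12.0 to 37.2 when one value changes, while the median stays at 12.0.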

The arithmetic mean is useful in machine learning when summarizing a variable, e.g. reporting the most likely value.

This is more meaningful when a variable has a Gaussian or Gaussian-like data distribution.

The arithmetic mean can be calculated using the mean() NumPy function.

The example below demonstrates how to calculate the arithmetic mean for a list of 10 numbers.
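A minimal sketch of such an example, assuming NumPy is installed. The list of the integers 1 to 10 is an illustrative assumption.

```python
# example of calculating the arithmetic mean
from numpy import mean

# define the dataset (illustrative values, an assumption)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# calculate the arithmetic mean
result = mean(data)

# report the result; for this list the mean is 5.5
print('Arithmetic Mean: %.3f' % result)
```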

Running the example calculates the arithmetic mean and reports the result.

The geometric mean is calculated as the N-th root of the product of all values, where N is the number of values.

For example, if the data contains only two values, the square root of the product of the two values is the geometric mean.

For three values, the cube-root is used, and so on.
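The definition can be checked by hand. The following is a minimal sketch using only the standard library (math.prod requires Python 3.8 or later). The three values are an illustrative assumption, and the log-based form is an equivalent calculation included for comparison, not something described in this tutorial.

```python
# calculate the geometric mean directly from its definition
from math import exp, log, prod

# illustrative values (an assumption); their product is 27
values = [1.0, 3.0, 9.0]
n = len(values)

# N-th root of the product of all values
root_of_product = prod(values) ** (1.0 / n)

# equivalent form: exponential of the arithmetic mean of the logs,
# which avoids overflow when the product of many values is very large
via_logs = exp(sum(log(v) for v in values) / n)

print('Root of product: %.3f' % root_of_product)
print('Via logarithms: %.3f' % via_logs)
```

Both calculations give approximately 3, the cube root of 27.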

The geometric mean is appropriate when the data contains values with different units of measure, e.g. some measures are heights, some are dollars, some are miles, etc.

The geometric mean does not accept negative or zero values, i.e. all values must be positive.

One common example of the geometric mean in machine learning is the so-called G-Mean (geometric mean) metric, a model evaluation metric calculated as the geometric mean of the sensitivity and specificity metrics.
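As a small worked example, the G-Mean of two scores is the square root of their product. The following is a minimal sketch; the sensitivity and specificity values are hypothetical and do not come from a real model.

```python
# calculate the G-Mean from sensitivity and specificity
from math import sqrt

# hypothetical scores (assumptions for illustration only)
sensitivity = 0.9
specificity = 0.4

# geometric mean of two values: square root of their product
g_mean = sqrt(sensitivity * specificity)

# arithmetic mean of the same two scores, for comparison
arithmetic = (sensitivity + specificity) / 2.0

print('G-Mean: %.3f' % g_mean)
print('Arithmetic Mean: %.3f' % arithmetic)
```

Here the G-Mean (0.6) is lower than the arithmetic mean (0.65), because the geometric mean penalizes a large gap between the two scores.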

The geometric mean can be calculated using the gmean() SciPy function.

The example below demonstrates how to calculate the geometric mean for a list of 10 numbers.
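A minimal sketch of such an example, assuming SciPy is installed. The list of the integers 1 to 10 is an illustrative assumption.

```python
# example of calculating the geometric mean
from scipy.stats import gmean

# define the dataset (illustrative values, an assumption; all positive)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# calculate the geometric mean
result = gmean(data)

# report the result
print('Geometric Mean: %.3f' % result)
```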

Running the example calculates the geometric mean and reports the result.

The harmonic mean is calculated as the number of values N divided by the sum of the reciprocal of the values (1 over each value).

If there are just two values (x1 and x2), the harmonic mean simplifies to:

Harmonic Mean = (2 * x1 * x2) / (x1 + x2)

The harmonic mean is the appropriate mean if the data consists of rates.

Recall that a rate is the ratio between two quantities with different measures, e.g. speed, acceleration, frequency, etc.
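Average speed over a round trip is a classic case where the harmonic mean gives the right answer. The following is a minimal sketch; the speeds and the distance are hypothetical values chosen for illustration.

```python
# average speed over two legs of equal distance

# hypothetical trip (assumptions): same distance out and back
speed_out = 60.0   # km/h on the way out
speed_back = 40.0  # km/h on the way back
distance = 120.0   # km for each leg

# ground truth: total distance divided by total time
total_time = distance / speed_out + distance / speed_back
true_average = (2 * distance) / total_time

# harmonic mean: N divided by the sum of the reciprocals
harmonic = 2 / (1 / speed_out + 1 / speed_back)

# arithmetic mean of the two speeds, for comparison
arithmetic = (speed_out + speed_back) / 2

print('True average speed: %.1f' % true_average)
print('Harmonic mean: %.1f' % harmonic)
print('Arithmetic mean: %.1f' % arithmetic)
```

The harmonic mean matches the true average speed of 48 km/h, whereas the arithmetic mean overstates it at 50 km/h.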

In machine learning, we have rates when evaluating models, such as the true positive rate or the false positive rate in predictions.

The harmonic mean does not take rates with a negative or zero value, i.e. all rates must be positive.

One common example of the use of the harmonic mean in machine learning is the F-Measure (also called the F1-Measure; the Fbeta-Measure is its weighted generalization), a model evaluation metric calculated as the harmonic mean of the precision and recall metrics.
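The F-Measure calculation can be sketched with the two-value form of the harmonic mean. The precision and recall values below are hypothetical and do not come from a real model.

```python
# calculate the F-Measure (F1) as the harmonic mean of precision and recall

# hypothetical scores (assumptions for illustration only)
precision = 0.5
recall = 1.0

# harmonic mean of two values: (2 * x1 * x2) / (x1 + x2)
f1 = (2 * precision * recall) / (precision + recall)

# arithmetic mean of the same two scores, for comparison
arithmetic = (precision + recall) / 2

print('F-Measure: %.3f' % f1)
print('Arithmetic Mean: %.3f' % arithmetic)
```

The F-Measure (about 0.667) sits closer to the weaker of the two scores than the arithmetic mean (0.75) does.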

The harmonic mean can be calculated using the hmean() SciPy function.

The example below demonstrates how to calculate the harmonic mean for a list of nine numbers.
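A minimal sketch of such an example, assuming SciPy is installed. The nine values are illustrative assumptions standing in for rates.

```python
# example of calculating the harmonic mean
from scipy.stats import hmean

# define the dataset (illustrative values, an assumption; all positive)
data = [0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, 0.99]

# calculate the harmonic mean
result = hmean(data)

# report the result
print('Harmonic Mean: %.3f' % result)
```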

Running the example calculates the harmonic mean and reports the result.

We have reviewed three different ways of calculating the average or mean of a variable or dataset.

The arithmetic mean is the most commonly used mean, although it may not be appropriate in some cases.

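Each mean is appropriate for different types of data; for example:

If the values have the same units, use the arithmetic mean.
If the values have differing units, use the geometric mean.
If the values are rates, use the harmonic mean.

The exception is data that contains negative or zero values, for which the geometric and harmonic means cannot be used directly.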


In this tutorial, you discovered the difference between the arithmetic mean, the geometric mean, and the harmonic mean.

