Supervised machine learning is often described as the problem of approximating a target function that maps inputs to outputs.

This description can be characterized as searching through, and evaluating, candidate hypotheses from a hypothesis space.

The discussion of hypotheses in machine learning can be confusing for a beginner, especially when “hypothesis” has a distinct, but related meaning in statistics (e.g. statistical hypothesis testing) and more broadly in science (e.g. scientific hypothesis).

In this post, you will discover the difference between a hypothesis in science, in statistics, and in machine learning.

Let’s get started.

A Gentle Introduction to Hypotheses in Machine Learning. Photo by Bernd Thaller, some rights reserved.

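This tutorial is divided into four parts: what a hypothesis is in science, in statistics, and in machine learning, followed by a review of the three.

A hypothesis is an explanation for something.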

It is a provisional idea, an educated guess that requires some evaluation.

A good hypothesis is testable; it can be either true or false.

In science, a hypothesis must be falsifiable, meaning that there exists a test whose outcome could mean that the hypothesis is not true.

The hypothesis must also be framed before the outcome of the test is known.

… not any hypothesis will do.

There is one fundamental condition that any hypothesis or system of hypotheses must satisfy if it is to be granted the status of a scientific law or theory.

If it is to form part of science, an hypothesis must be falsifiable.

— Pages 61-62, What Is This Thing Called Science?, Third Edition, 1999.

A good hypothesis fits the evidence and can be used to make predictions about new observations or new situations.

The hypothesis that best fits the evidence and can be used to make predictions is called a theory, or is part of a theory.

Much of statistics is concerned with the relationship between observations.

Statistical hypothesis tests are techniques used to calculate a statistic that quantifies an observed “effect.”

The statistic can then be interpreted in order to determine how likely it is to observe such an effect if a relationship does not exist.

If the likelihood is very small, then it suggests that the effect is probably real.

If the likelihood is large, then we may have observed a statistical fluctuation, and the effect is probably not real.

For example, we may be interested in evaluating the relationship between the means of two samples, e.g. whether the samples were drawn from the same distribution or not, whether there is a difference between them.

One hypothesis is that there is no difference between the population means, based on the data samples.

This is a hypothesis of no effect and is called the null hypothesis.

We can use a statistical hypothesis test to either reject this hypothesis or fail to reject (retain) it.

We don’t say “accept” because the outcome is probabilistic and could still be wrong, just with a very low probability.

… we develop a hypothesis and establish a criterion that we will use when deciding whether to retain or reject our hypothesis.

The primary hypothesis of interest in social science research is the null hypothesis.

— Pages 64-65, Statistics In Plain English, Third Edition, 2010.

If the null hypothesis is rejected, then we assume the alternative hypothesis that there exists some difference between the means.

Statistical hypothesis tests don’t comment on the size of the effect, only the likelihood of the presence or absence of the effect in the population, based on the observed samples of data.
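The reject or fail-to-reject procedure described above can be sketched in a few lines of code. This is a minimal illustration that uses a permutation test on the difference in means, which is only one of many possible tests and is not one named in this post. The sample data, the helper name `permutation_test`, and the significance level of 0.05 are all assumptions made for the example.

```python
import random


def average(values):
    return sum(values) / len(values)


def permutation_test(sample_a, sample_b, n_permutations=5000, seed=1):
    """Estimate a two-sided p-value for a difference in means.

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)
    observed = abs(average(sample_a) - average(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffling them shows what differences arise by chance alone.
        rng.shuffle(pooled)
        diff = abs(average(pooled[:n_a]) - average(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up samples, for illustration only.
data_rng = random.Random(42)
group_a = [data_rng.gauss(50, 5) for _ in range(30)]
group_b = [data_rng.gauss(55, 5) for _ in range(30)]  # shifted mean
group_c = [data_rng.gauss(50, 5) for _ in range(30)]  # same mean as group_a

alpha = 0.05  # assumed significance level
for name, other in [("b", group_b), ("c", group_c)]:
    p_value = permutation_test(group_a, other)
    decision = "reject" if p_value < alpha else "fail to reject"
    print(f"a vs {name}: p={p_value:.4f} -> {decision} the null hypothesis")
```

Note that the code only ever rejects or fails to reject the null hypothesis, and the p-value says nothing about how large the difference is.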

Machine learning, specifically supervised learning, can be described as the desire to use available data to learn a function that best maps inputs to outputs.

Technically, this is a problem called function approximation, where we are approximating an unknown target function (that we assume exists) that can best map inputs to outputs on all possible observations from the problem domain.

An example of a model that approximates the target function and performs mappings of inputs to outputs is called a hypothesis in machine learning.

The choice of algorithm (e.g. neural network) and the configuration of the algorithm (e.g. network topology and hyperparameters) define the space of possible hypotheses that the model may represent.

Learning for a machine learning algorithm involves navigating the chosen space of hypotheses toward the best, or a good enough, hypothesis for approximating the target function.

Learning is a search through the space of possible hypotheses for one that will perform well, even on new examples beyond the training set.

— Page 695, Artificial Intelligence: A Modern Approach, Second Edition, 2009.
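To make the idea of learning as search concrete, here is a small sketch. It assumes a made-up target function, a hypothesis space that contains only straight lines whose slope and intercept sit on a coarse grid, and mean squared error as the measure of fit. None of these choices come from the post; real algorithms search far larger spaces with far smarter strategies than exhaustive enumeration.

```python
import random


def target(x):
    # The target function is unknown in practice.
    # It is defined here only so that we can generate example data.
    return 3.0 * x + 2.0


rng = random.Random(0)
xs = [rng.uniform(-5, 5) for _ in range(200)]
data = [(x, target(x) + rng.gauss(0, 1.0)) for x in xs]
train, test = data[:150], data[150:]


def make_hypothesis(w, b):
    """One specific hypothesis h: a line with a fixed slope and intercept."""
    return lambda x: w * x + b


def mse(h, examples):
    """Mean squared error of hypothesis h on a list of (x, y) examples."""
    return sum((h(x) - y) ** 2 for x, y in examples) / len(examples)


# Hypothesis space H: every line whose slope and intercept lie on the grid.
grid = [i * 0.5 for i in range(-10, 11)]  # -5.0, -4.5, ..., 5.0
H = [(w, b) for w in grid for b in grid]

# "Learning" here is an exhaustive search of H for the lowest training error.
best_w, best_b = min(H, key=lambda params: mse(make_hypothesis(*params), train))
h = make_hypothesis(best_w, best_b)

print(f"searched {len(H)} hypotheses")
print(f"selected h(x) = {best_w} * x + {best_b}")
print(f"train MSE = {mse(h, train):.3f}, test MSE = {mse(h, test):.3f}")
```

The test error is the part that matters: it estimates how well the selected hypothesis performs on new examples beyond the training set.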

This framing of machine learning is common and helps to understand the choice of algorithm, the problem of learning and generalization, and even the bias-variance trade-off.

For example, the training dataset is used to learn a hypothesis and the test dataset is used to evaluate it.

A common notation is used where lowercase-h (h) represents a given specific hypothesis and uppercase-h (H) represents the hypothesis space that is being searched.
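Using this notation, the learning problem can be written compactly. The following is a sketch of one common formalization rather than notation taken from this post: the target function f, the input and output spaces X and Y, the loss function L, and the n training examples (x_i, y_i) are names assumed here for illustration.

```latex
\[
  h \in H, \qquad h : X \to Y
\]
\[
  h^{*} = \arg\min_{h \in H} \frac{1}{n} \sum_{i=1}^{n} L\bigl(h(x_i),\, y_i\bigr),
  \qquad h^{*}(x) \approx f(x)
\]
```

In words: among all hypotheses h in the space H, learning selects the one with the lowest average loss on the training examples, in the hope that it approximates the target function f on new inputs.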

The choice of algorithm and algorithm configuration involves choosing a hypothesis space that is believed to contain a hypothesis that is a good or best approximation for the target function.

This is very challenging, and it is often more efficient to spot-check a range of different hypothesis spaces.
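Spot-checking can be sketched as fitting one hypothesis from each of several hypothesis spaces and comparing them on held-out data. The three spaces below (constants, straight lines, and a nearest-neighbor lookup), the made-up target function, and the dataset sizes are assumptions chosen to keep the example small.

```python
import random

rng = random.Random(7)


def target(x):
    # Assumed target function, used only to generate example data.
    return 0.5 * x + 1.0


def sample(n):
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, target(x) + rng.gauss(0, 1.0)))
    return points


train, test = sample(100), sample(300)


def fit_constant(examples):
    # Hypothesis space: all constant functions.
    c = sum(y for _, y in examples) / len(examples)
    return lambda x: c


def fit_linear(examples):
    # Hypothesis space: all straight lines (closed-form least squares).
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def fit_nearest_neighbor(examples):
    # Hypothesis space: lookup functions that memorize the training data.
    return lambda x: min(examples, key=lambda e: abs(e[0] - x))[1]


def mse(h, examples):
    return sum((h(x) - y) ** 2 for x, y in examples) / len(examples)


candidates = {
    "constant": fit_constant,
    "linear": fit_linear,
    "nearest neighbor": fit_nearest_neighbor,
}

results = {}
for name, fit in candidates.items():
    h = fit(train)  # learn a hypothesis from the training dataset
    results[name] = (mse(h, train), mse(h, test))  # evaluate on both
    print(f"{name:>16}: train MSE={results[name][0]:.3f}, "
          f"test MSE={results[name][1]:.3f}")
```

A space that is too restrictive cannot fit the data well, while one that simply memorizes the training set fits it perfectly yet tends to do worse on the test set.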

We say that a learning problem is realizable if the hypothesis space contains the true function.

Unfortunately, we cannot always tell whether a given learning problem is realizable, because the true function is not known.

— Page 697, Artificial Intelligence: A Modern Approach, Second Edition, 2009.

It is a hard problem and we choose to constrain the hypothesis space both in terms of size and in terms of the complexity of the hypotheses that are evaluated in order to make the search process tractable.

There is a tradeoff between the expressiveness of a hypothesis space and the complexity of finding a good hypothesis within that space.

— Page 697, Artificial Intelligence: A Modern Approach, Second Edition, 2009.
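A quick count shows why constraining the hypothesis space matters for tractability. The sketch below assumes a problem with n Boolean input attributes and compares two spaces: every possible Boolean function of those inputs, and a restricted space of conjunctions in which each attribute appears as-is, negated, or not at all. The restricted space is an example chosen here for illustration, not one discussed in this post.

```python
from itertools import product


def count_boolean_functions(n):
    # A truth table over n inputs has 2**n rows, and each row can be
    # labeled True or False independently.
    return 2 ** (2 ** n)


def count_conjunctions(n):
    # Each attribute is either required true, required false, or ignored.
    return 3 ** n


def enumerate_boolean_functions(n):
    """Brute-force listing of every truth table, feasible only for tiny n."""
    rows = list(product([False, True], repeat=n))
    return [dict(zip(rows, outputs))
            for outputs in product([False, True], repeat=len(rows))]


for n in range(1, 7):
    print(f"n={n}: all functions={count_boolean_functions(n)}, "
          f"conjunctions={count_conjunctions(n)}")
```

The unrestricted space can express any mapping but quickly becomes far too large to search exhaustively, while the restricted space stays small at the cost of being unable to represent many functions.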

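We can summarize the three definitions again as follows: in science, a hypothesis is a provisional, falsifiable explanation that fits the evidence; in statistics, it is a statement about the relationship between observations that a test can reject or fail to reject; in machine learning, it is a candidate model that approximates a target function mapping inputs to outputs.

We can see that a hypothesis in machine learning draws upon the definition of a hypothesis more broadly in science.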

Just as a hypothesis in science is an explanation that covers the available evidence, is falsifiable, and can be used to make predictions about new situations in the future, a hypothesis in machine learning has similar properties.

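A hypothesis in machine learning covers the available evidence (the training dataset), can be evaluated on data held back for that purpose (the test dataset), and can be used to make predictions on new data.

Did this post clear up your questions about what a hypothesis is in machine learning?

Let me know in the comments below.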


In this post, you discovered the difference between a hypothesis in science, in statistics, and in machine learning.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.
