Such a likelihood is impossible to measure in frequentist statistics, so when the question (central in science) is posed, “How likely is it that this hypothesis is true, given these data?”, a theoretically weak technique is used instead: the p-value.

A graphical explanation of the p-value (Credit: Repapetilto & Chen-Pan Liao @ Wikipedia)

The p-value can be (inaccurately) thought of as the answer to the question “How likely would the data I collected be, given that my hypothesis was wrong?”, the idea being that if that probability is very small, then maybe the hypothesis is true. To explain the exact nature of the error here, an important theorem in probability is needed, which I will discuss soon.

A different approach can be taken, one that assumes little to nothing about the nature of uncertainty and probability, and instead focuses its effort on producing the best possible prediction for a given task. This is the focus of supervised learning (SL), a type of machine learning (ML) that aims to predict a response variable y given a set of input variables (AKA features) x, observed in a dataset. Mathematically, SL algorithms try to estimate the expected value of the response variable given the input variables, as a function of them, by adjusting parameters through observations of these variables. Many powerful methods have been devised for this task, and one must choose among them depending on the nature of the variables, the dimensionality, and the complexity of the phenomenon that produces the data, among other things.

An example of an SL task, solved by Linear Regression (Credit: Sewaqu)

Because they are designed to do well on this problem, SL algorithms typically can’t answer other types of questions. For example, one might wish to ask, given the input variables, how likely it is that the response rises above a given threshold.
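To make the last point concrete, here is a minimal sketch in Python (using numpy): a linear regression fit on synthetic data gives a point estimate of E[y | x], but answering “how likely is y to exceed a threshold?” requires an extra distributional assumption that the plain predictor does not provide. The data, the linear model, and the Gaussian-residual assumption are all illustrative, not taken from the original text.

```python
import numpy as np
from math import erf, sqrt

# Synthetic data: y = 2x + 1 plus Gaussian noise (hypothetical example)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.5, size=200)

# Least-squares fit: the SL estimate of E[y | x] as a linear function
X = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef

def predict_mean(x_new):
    """Point prediction: an estimate of E[y | x], and nothing more."""
    return slope * x_new + intercept

# A question the point predictor alone cannot answer: P(y > t | x).
# One way out is to ADD an assumption, e.g. Gaussian residuals
# with a standard deviation estimated from the training data:
resid_std = np.std(y - X @ coef)

def prob_above(x_new, threshold):
    """P(y > threshold | x) under the assumed Gaussian residual model."""
    z = (threshold - predict_mean(x_new)) / resid_std
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))
```

Note that `prob_above` is only as good as the residual assumption bolted on top; the regression itself delivers a single expected value, which is exactly the limitation described above.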