Machine learning is a field of computer science concerned with developing systems that can learn from data.
Like statistics and linear algebra, probability is another foundational field that supports machine learning.
Probability is a field of mathematics concerned with quantifying uncertainty.
Many aspects of machine learning are uncertain, including, most critically, observations from the problem domain and the relationships learned by models from that data.
As such, a machine learning practitioner needs some understanding of probability, and of the tools and methods used in the field, to be effective.
Perhaps not initially, but certainly in the long run.
In this post, you will discover some of the key resources that you can use to learn about the parts of probability required for machine learning.
After reading this post, you will know:
Let’s get started.
Resources for Getting Started With Probability in Machine Learning
Photo by dragonseye, some rights reserved.
This tutorial is divided into three parts; they are:
Probability is a large field of mathematics with many fascinating findings and useful tools.
Although much of the field of probability may be interesting to a machine learning practitioner, not all of it is directly relevant.
Therefore, it is important to narrow the scope of the field of probability to the aspects that can directly help a practitioner.
One approach might be to review the topics in probability and select those that might be helpful or relevant.
Wikipedia has many good overview articles on the field that could be used as a starting point.
For example:
Another source of topics might be those covered by top textbooks on probability written for advanced undergraduates and graduate students.
For example:
This is a good start, but it is challenging: how can the wealth of interesting topics be effectively filtered down to those most relevant to applied machine learning?
The risk of this approach is that too much time would be spent learning probability and developing too broad a foundation in the field.
An approach that I prefer is to review the coverage of the field of probability by top machine learning books.
The authors of these books are experts in machine learning and have used this expertise to filter the field of probability down to the points most salient to machine learning.
There are many excellent machine learning textbooks, but in this post, we will review some of the more popular books that you may own or have access to, so that you can reference the relevant sections.
They are:
Let’s take a closer look at each in turn.
“Machine Learning” is Tom Mitchell’s seminal 1997 book that defined the field for many practitioners and books that followed.
Probability is the focus of the following chapters of this book:
This chapter is dedicated to Bayesian methods relevant to machine learning, including:
“Pattern Recognition and Machine Learning” is Christopher Bishop’s masterpiece on machine learning, building on and broadening his prior book, “Neural Networks for Pattern Recognition.”
It is very likely the book used by many modern practitioners who came out of a graduate degree program on machine learning.
Probability is the focus of the following chapters of this book:
The second chapter is dedicated to the topic; it focuses on probability distributions and sets up density estimation, covering the following topics:
“Data Mining: Practical Machine Learning Tools and Techniques” by Witten and Frank (and others) has had many editions and, because of its practical nature and the Weka platform, has been many practitioners’ entry point into the field.
Probability is the focus of the following chapters of this book:
Section 4.2 provides an introduction, but Chapter 9 goes into depth and covers the following topics:
“Machine Learning: A Probabilistic Perspective” by Kevin Murphy from 2013 is a textbook that focuses on teaching machine learning through the lens of probability.
Probability is the focus of the following chapters of this book:
Chapters 5 and 6 really focus on machine learning methods that build on Bayesian and frequentist methods, e.g. distribution estimation.
Chapter 2 is more focused on the required foundations in probability, including the subsections:
“Deep Learning” is the seminal 2016 textbook by Ian Goodfellow, et al. on the emerging field of deep learning.
Part I of this book is titled “Applied Math and Machine Learning Basics” and covers a range of important foundation topics required to become productive with deep learning neural networks, including probability.
Probability is the focus of the following chapters of this book:
This chapter is divided into the following subsections:
Reviewing the chapters and sections covered in the top machine learning books, it is clear that there are two main aspects to probability in machine learning.
There are the foundational topics that a practitioner should be familiar with in order to be effective at machine learning generally.
We might call this “probability theory for machine learning.”
Then there are machine learning methods that are explicitly constructed from tools and techniques from the field of probability.
We might call this “probabilistic methods for machine learning.”
The division is not clean, as there is a lot of overlap, but it is a good basis for organizing the topics.
The foundational topics are those covered in books like “Deep Learning.”
They are also the basis for cheat sheets and refreshers for machine learning courses, like the “Probabilities and Statistics refresher” from Stanford.
Some of the topics in probability theory for machine learning might include: probability axioms, probability distributions, moments of distributions, Bayes’ theorem, and joint, marginal, and conditional probability.
This might also include more advanced and related topics such as: likelihood functions, maximum likelihood estimation, entropy from information theory, Monte Carlo and Gibbs sampling for distributions, and parameter estimation.
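To make a few of these foundational topics concrete, here is a minimal sketch in plain Python. It is not taken from any of the books above. The joint distribution, the coin-flip data, and all function names are made up for illustration. It derives marginal and conditional distributions from a joint table, checks Bayes’ theorem numerically, computes an entropy, and compares the closed-form maximum likelihood estimate for a Bernoulli parameter against a simple grid search over the log-likelihood.

```python
import math

# Hypothetical joint distribution P(weather, commute); the numbers are made up.
joint = {
    ("rain", "late"): 0.15,
    ("rain", "on_time"): 0.15,
    ("dry", "late"): 0.07,
    ("dry", "on_time"): 0.63,
}


def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` (0 or 1)."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out


def conditional(joint, given_axis, given_value):
    """Distribution of the other variable, given the variable at `given_axis`."""
    other_axis = 1 - given_axis
    norm = marginal(joint, given_axis)[given_value]
    return {
        outcome[other_axis]: p / norm
        for outcome, p in joint.items()
        if outcome[given_axis] == given_value
    }


def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def bernoulli_log_likelihood(theta, data):
    """Log-likelihood of 0/1 observations under a Bernoulli(theta) model."""
    return sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in data)


p_weather = marginal(joint, 0)
p_commute = marginal(joint, 1)
p_commute_given_rain = conditional(joint, 0, "rain")
p_weather_given_late = conditional(joint, 1, "late")

# Bayes' theorem: P(rain | late) = P(late | rain) * P(rain) / P(late).
bayes_rain_given_late = (
    p_commute_given_rain["late"] * p_weather["rain"] / p_commute["late"]
)

# Maximum likelihood for a coin: closed form (sample mean) vs. grid search.
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # made-up data
theta_closed_form = sum(flips) / len(flips)
grid = [i / 100 for i in range(1, 100)]
theta_grid = max(grid, key=lambda t: bernoulli_log_likelihood(t, flips))

print("P(weather):", p_weather)
print("P(rain | late) from the table:", p_weather_given_late["rain"])
print("P(rain | late) via Bayes' theorem:", bayes_rain_given_late)
print("Entropy of P(weather) in bits:", entropy_bits(p_weather))
print("MLE (closed form, grid):", theta_closed_form, theta_grid)
```

The two routes to P(rain | late) should agree up to floating-point error, and the grid search should land on the sample mean. In practice you would likely reach for NumPy or SciPy rather than dictionaries, but the plain version keeps each definition visible.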
The probabilistic methods are the topics covered in the later chapters of “Machine Learning: A Probabilistic Perspective.”
Some topics in probabilistic methods for machine learning might include: density estimation, kernel density estimation, divergence estimation, etc.
This would also include techniques such as Naive Bayes and graphical models such as Bayesian belief networks.
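As one illustration of a probabilistic method, the sketch below implements a basic Gaussian kernel density estimate in plain Python. It is a hedged, minimal version for intuition only: the sample values, the bandwidth of 0.5, and the function name `gaussian_kde_1d` are assumptions chosen for the example, and no bandwidth selection is attempted.

```python
import math


def gaussian_kde_1d(samples, bandwidth):
    """Return a density function built from one Gaussian kernel per sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered on each observed sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up one-dimensional data with two loose clusters.
samples = [1.1, 1.9, 2.0, 2.3, 2.8, 5.9, 6.1, 6.4]
density = gaussian_kde_1d(samples, bandwidth=0.5)

# Sanity check: a density should integrate to roughly 1 (Riemann sum here).
step = 0.01
xs = [i * step for i in range(-500, 1500)]
area = sum(density(x) for x in xs) * step

print("Estimated density at x=2.0:", density(2.0))
print("Estimated density at x=4.0:", density(4.0))
print("Approximate area under the curve:", area)
```

The estimate should be higher near the clusters than in the gap between them. A smaller bandwidth gives a spikier estimate and a larger one smooths the clusters together, which is the main choice a practitioner has to make with this method.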
What do you think?
What topics would you place on either side of this split?
This section provides more resources on the topic if you are looking to go deeper.
In this post, you discovered some of the key resources that you can use to learn about the parts of probability required for machine learning.
Specifically, you learned:
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.