By thinking of conditioning as a restriction on the size of the event space, we can measure the conditional probability of A given B as

ℙ(A | B) = size(A ∩ B) / size(B).

We can make this even more intuitive by remembering that the probability of any event is given by the size of that event relative to the whole sample space, namely:

ℙ(B) = size(B) / size(Ω).

A slight rearrangement gives:

size(B) = size(Ω) ℙ(B),

which we can do since size(Ω) is always positive. We can get a similar formula for the set A ∩ B. Plugging both formulas back into our expression for the conditional probability gives

ℙ(A | B) = size(Ω) ℙ(A ∩ B) / (size(Ω) ℙ(B)),

which simplifies to

ℙ(A | B) = ℙ(A ∩ B) / ℙ(B).

## Tying Back to Intuition

Intuitively, this says that the probability of A given B is the probability of A and B divided by the probability of just B. By deriving this formula in the example, we understand this division as the shrinking of the event space. Mathematically, this shrinking action is referred to as the projection of the event space onto the conditioning space.

Immediately, we know from elementary mathematics that the probability of B cannot be zero, to avoid division by zero. Indeed, this reinforces our intuition, since we cannot condition on an impossible event!

Finally, we can view the intersection as a restriction of our event of interest to the conditioning event. Indeed, the intersection of A and B can be thought of as the projection of the set A onto the set B, which is the conditioning space.

All in all, we can think of conditional probability as probability projected onto some (smaller) conditioning space.

## Bayes' Theorem: The Fundamental Property of Probability

It turns out that the intersection is symmetric: the projection of A onto B is identical to the projection of B onto A. Indeed, the diagrams from earlier reinforce this. Following the same procedure, we could just as easily have derived the conditional probability

ℙ(B | A) = ℙ(A ∩ B) / ℙ(A).

Once again, since ℙ(A) is positive and cannot be zero, we can
use a mathematical sleight of hand to derive an expression for the probability of the intersection:

ℙ(A ∩ B) = ℙ(B | A) ℙ(A).

This says that the probability of A and B is the conditional probability of B given A times the probability of just A. Intuitively, if we take the conditional probability of B given that A has already happened, and we factor in the probability of A, we must be left with the probability of both.

But we know from earlier that if we have the probability of the intersection, we can project onto the conditioning space of B to calculate the inverse probability: the conditional probability of A given B, which is given by:

ℙ(A | B) = ℙ(A ∩ B) / ℙ(B) = ℙ(B | A) ℙ(A) / ℙ(B).

This final result is known as Bayes' Theorem.

In this formulation, if A is our event of interest and B is the conditioning event, then the quantity ℙ(A | B) is known as the posterior probability of A given B. The quantity ℙ(B | A) is known as the likelihood of A given B. Finally, the quantity ℙ(A) is simply known as the prior probability of A.

Intuitively, the prior of A is the raw probability of A before any other event is taken into consideration. The likelihood, although written as a conditional probability of B, can be thought of as a measure of how strongly A depends on B. To understand why, recall that we derived the likelihood's place in the formula by considering the symmetry of the intersection. Finally, the posterior is the final probability of A after conditioning on B.

## The Importance of Bayes' Theorem

Fundamentally, Bayes' Theorem gives us a way of measuring conditional probability by taking into consideration our total uncertainty. This is the point I promised to get to earlier in this post.
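The size-counting derivation above can be checked directly on a small finite sample space. The sketch below uses a fair six-sided die; the specific events A ("even roll") and B ("roll greater than 3") are illustrative choices, not from the post.

```python
from fractions import Fraction

# Sample space: all outcomes of rolling a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

# Hypothetical events for illustration:
A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is greater than 3"

def prob(event):
    """P(event) = size(event) / size(omega)."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(a | b) = size(a intersect b) / size(b) -- the restricted event space."""
    return Fraction(len(a & b), len(b))

# The counting definition...
print(cond_prob(A, B))        # 2/3
# ...agrees with the ratio-of-probabilities form P(A and B) / P(B).
print(prob(A & B) / prob(B))  # 2/3
```

Using `Fraction` keeps the arithmetic exact, so the two forms agree identically rather than merely to floating-point precision.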
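Bayes' Theorem itself can be verified the same way: compute the prior, likelihood, and evidence separately and confirm that combining them reproduces the direct conditional probability. Again, the die events A and B here are illustrative assumptions.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event of interest: "roll is even"
B = {4, 5, 6}   # conditioning event: "roll is greater than 3"

def prob(event):
    """P(event) = size(event) / size(omega)."""
    return Fraction(len(event), len(omega))

prior = prob(A)                     # P(A)
likelihood = prob(A & B) / prob(A)  # P(B | A), from the symmetry of the intersection
evidence = prob(B)                  # P(B)

# Bayes' Theorem: P(A | B) = P(B | A) P(A) / P(B)
posterior = likelihood * prior / evidence
print(posterior)              # 2/3
# Matches the direct computation P(A and B) / P(B):
print(prob(A & B) / prob(B))  # 2/3
```

Note how the likelihood is itself a conditional probability in the other direction, exactly as the symmetry argument in the text suggests.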