As we flip the coin, we will observe a roughly equal number of heads and tails, and the more we flip, the more confident we become that the coin is fair.

So the probability density function grows sharper and sharper at p = 50%.
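The fair-coin update can be sketched in a few lines. This is a hypothetical re-creation (not the post's original code): starting from a uniform Beta(1, 1) prior, each heads increments ⍺ and each tails increments β, and the posterior variance shrinks as flips accumulate.

```python
import random

# Conjugate Beta-Bernoulli update for a fair coin, starting from a
# uniform prior Beta(1, 1). A sketch, not the post's original simulation.
random.seed(0)
alpha, beta = 1.0, 1.0
for _ in range(10_000):
    if random.random() < 0.5:   # true P(heads) = 0.5
        alpha += 1              # heads observed
    else:
        beta += 1               # tails observed

mean = alpha / (alpha + beta)
# Variance of Beta(a, b) = ab / ((a+b)^2 (a+b+1)) -- shrinks with every flip,
# which is exactly the "sharper and sharper" density described above.
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(mean, var)
```

After 10,000 flips the posterior mean sits very close to 0.5 and the variance is tiny, matching the sharpening density in the animation.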

The Effect of Prior

Now consider the case where the coin is biased 20% towards heads, and we start with an uninformative prior.

The posterior quickly peaks and centers around p = 20%.

This is not surprising because we are not biased in our prior belief.

Note that I also simulated the complement event (p = 80%).

This illustrates an important property of the Beta distribution: each trial has only two possible outcomes, so the density of Beta(⍺, β) for heads mirrors that of Beta(β, ⍺) for tails.

If we start with a prior belief that the coin is fair, we’ll see that the peak (mode) of Beta distribution converges more slowly to the true values.

The stronger our prior belief, the more slowly we accept the truth when the two differ.

In the case below, the prior is strong enough (⍺ = β = 100) that we are unable to converge to the true values in 1000 iterations.
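A quick sketch of that strong-prior case (the prior strength and seed here are my own assumptions): a Beta(100, 100) prior acts like 200 imaginary fair-coin flips, so 1000 real flips of a p = 0.2 coin still leave the posterior mode pulled well above the truth.

```python
import random

# Strong symmetric prior Beta(100, 100) -- equivalent to having already
# seen ~100 heads and ~100 tails before any real data arrives.
random.seed(1)
alpha, beta = 100.0, 100.0
for _ in range(1000):
    if random.random() < 0.2:   # true P(heads) = 0.2
        alpha += 1
    else:
        beta += 1

# Mode of Beta(a, b) for a, b > 1 is (a - 1) / (a + b - 2).
mode = (alpha - 1) / (alpha + beta - 2)
print(mode)  # noticeably above the true value of 0.2
```

Roughly 200 of the 1000 flips land heads, so the mode sits near (100 + 200 − 1) / (1200 − 2) ≈ 0.25 rather than 0.2: the 200 phantom prior flips still outweigh the evidence.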

If the prior is not symmetric, we see that the Beta whose true value is closer to the prior converges faster (and its peak grows higher) than the one further from the prior.

This is intuitive: if the reality is consistent with what we believe, we gladly accept it and become more confident with what we believe.

In contrast, we are slower to accept the truth if it contradicts what we believe.

Baseball Batting Statistics

For a concrete example of a real-world application, let's consider the baseball batting average (adapted from this post).

The national batting average is 0.27.

If some new player joins the game with no records on prior performance, we may compare him to the national average to see if he is any good.

The prior is formulated as Beta(⍺=81, β=219) to give the 0.27 expectation.

As he swings his bat, we update ⍺ and β along the way.

After 1000 at-bats, we observe 300 hits and 700 misses.

The posterior becomes Beta(⍺=81 + 300, β=219 + 700), with expectation 381 / (381 + 919) ≈ 0.293.
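Spelled out in code, the whole update is two additions and a division (using the numbers from the post):

```python
# Prior encoding the 0.27 national average: 81 / (81 + 219) = 0.27.
prior_a, prior_b = 81, 219
hits, misses = 300, 700        # observed over 1000 at-bats

# Conjugate update: hits add to alpha, misses add to beta.
post_a = prior_a + hits        # 381
post_b = prior_b + misses      # 919
posterior_mean = post_a / (post_a + post_b)
print(round(posterior_mean, 3))  # 0.293
```

Note the posterior mean (0.293) sits between the prior (0.27) and the raw batting rate (0.30), weighted by how much evidence each side carries.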

Summary

This post introduces the Beta distribution and demystifies its basic properties using simulation and visualization.

You should now have some intuition for what it means to describe a probability with a probability distribution, how a prior interacts with evidence, and how this applies to real-world scenarios.

Bayesian updating is a very powerful concept and has a wide range of applications in business intelligence, signal filtering, and stochastic process modeling.

I will study some of those applications in the near future.

The next post will be a close inspection of Google Analytics' multi-armed bandit experiment (the first animation in this post actually comes from an 8-armed bandit experiment).

Stay tuned!