A Gentle Introduction to Bayesian Belief Networks

Probabilistic models can define relationships between variables and be used to calculate probabilities.

For example, fully conditional models may require an enormous amount of data to cover all possible cases, and probabilities may be intractable to calculate in practice.

Simplifying assumptions such as the conditional independence of all random variables can be effective, such as in the case of Naive Bayes, although it is a drastically simplifying step.

An alternative is to develop a model that preserves known conditional dependence between random variables and conditional independence in all other cases.

Bayesian networks are a probabilistic graphical model that explicitly capture the known conditional dependence with directed edges in a graph model.

All missing connections define the conditional independencies in the model.

As such Bayesian Networks provide a useful tool to visualize the probabilistic model for a domain, review all of the relationships between the random variables, and reason about causal probabilities for scenarios given available evidence.

In this post, you will discover a gentle introduction to Bayesian Networks.

After reading this post, you will know:Discover bayes opimization, naive bayes, maximum likelihood, distributions, cross entropy, and much more in my new book, with 28 step-by-step tutorials and full Python source code.

Let’s get started.

A Gentle Introduction to Bayesian Belief NetworksPhoto by Armin S Kowalski, some rights reserved.

This tutorial is divided into five parts; they are:Probabilistic models can be challenging to design and use.

Most often, the problem is the lack of information about the domain required to fully specify the conditional dependence between random variables.

If available, calculating the full conditional probability for an event can be impractical.

A common approach to addressing this challenge is to add some simplifying assumptions, such as assuming that all random variables in the model are conditionally independent.

This is a drastic assumption, although it proves useful in practice, providing the basis for the Naive Bayes classification algorithm.

An alternative approach is to develop a probabilistic model of a problem with some conditional independence assumptions.

This provides an intermediate approach between a fully conditional model and a fully conditionally independent model.

Bayesian belief networks are one example of a probabilistic model where some variables are conditionally independent.

Thus, Bayesian belief networks provide an intermediate approach that is less constraining than the global assumption of conditional independence made by the naive Bayes classifier, but more tractable than avoiding conditional independence assumptions altogether.

— Page 184, Machine Learning, 1997.

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-CourseA Bayesian belief network is a type of probabilistic graphical model.

A probabilistic graphical model (PGM), or simply “graphical model” for short, is a way of representing a probabilistic model with a graph structure.

The nodes in the graph represent random variables and the edges that connect the nodes represent the relationships between the random variables.

A graph comprises nodes (also called vertices) connected by links (also known as edges or arcs).

In a probabilistic graphical model, each node represents a random variable (or group of random variables), and the links express probabilistic relationships between these variables.

— Page 360, Pattern Recognition and Machine Learning, 2006.

There are many different types of graphical models, although the two most commonly described are the Hidden Markov Model and the Bayesian Network.

The Hidden Markov Model (HMM) is a graphical model where the edges of the graph are undirected, meaning the graph contains cycles.

Bayesian Networks are more restrictive, where the edges of the graph are directed, meaning they can only be navigated in one direction.

This means that cycles are not possible, and the structure can be more generally referred to as a directed acyclic graph (DAG).

Directed graphs are useful for expressing causal relationships between random variables, whereas undirected graphs are better suited to expressing soft constraints between random variables.

— Page 360, Pattern Recognition and Machine Learning, 2006.

A Bayesian Belief Network, or simply “Bayesian Network,” provides a simple way of applying Bayes Theorem to complex problems.

The networks are not exactly Bayesian by definition, although given that both the probability distributions for the random variables (nodes) and the relationships between the random variables (edges) are specified subjectively, the model can be thought to capture the “belief” about a complex domain.

Bayesian probability is the study of subjective probabilities or belief in an outcome, compared to the frequentist approach where probabilities are based purely on the past occurrence of the event.

A Bayesian Network captures the joint probabilities of the events represented by the model.

A Bayesian belief network describes the joint probability distribution for a set of variables.

— Page 185, Machine Learning, 1997.

Central to the Bayesian network is the notion of conditional independence.

Independence refers to a random variable that is unaffected by all other variables.

A dependent variable is a random variable whose probability is conditional on one or more other random variables.

Conditional independence describes the relationship among multiple random variables, where a given variable may be conditionally independent of one or more other random variables.

This does not mean that the variable is independent per se; instead, it is a clear definition that the variable is independent of specific other known random variables.

A probabilistic graphical model, such as a Bayesian Network, provides a way of defining a probabilistic model for a complex problem by stating all of the conditional independence assumptions for the known variables, whilst allowing the presence of unknown (latent) variables.

As such, both the presence and the absence of edges in the graphical model are important in the interpretation of the model.

A graphical model (GM) is a way to represent a joint distribution by making [Conditional Independence] CI assumptions.

In particular, the nodes in the graph represent random variables, and the (lack of) edges represent CI assumptions.

(A better name for these models would in fact be “independence diagrams” …— Page 308, Machine Learning: A Probabilistic Perspective, 2012.

Bayesian networks provide useful benefits as a probabilistic model.

For example:Designing a Bayesian Network requires defining at least three things:It may be possible for an expert in the problem domain to specify some or all of these aspects in the design of the model.

In many cases, the architecture or topology of the graphical model can be specified by an expert, but the probability distributions must be estimated from data from the domain.

Both the probability distributions and the graph structure itself can be estimated from data, although it can be a challenging process.

As such, it is common to use learning algorithms for this purpose; for example, assuming a Gaussian distribution for continuous random variables gradient ascent for estimating the distribution parameters.

Once a Bayesian Network has been prepared for a domain, it can be used for reasoning, e.

g.

making decisions.

Reasoning is achieved via inference with the model for a given situation.

For example, the outcome for some events is known and plugged into the random variables.

The model can be used to estimate the probability of causes for the events or possible further outcomes.

Reasoning (inference) is then performed by introducing evidence that sets variables in known states, and subsequently computing probabilities of interest, conditioned on this evidence.

— Page 13, Bayesian Reasoning and Machine Learning, 2012.

Practical examples of using Bayesian Networks in practice include medicine (symptoms and diseases), bioinformatics (traits and genes), and speech recognition (utterances and time).

We can make Bayesian Networks concrete with a small example.

Consider a problem with three random variables: A, B, and C.

A is dependent upon B, and C is dependent upon B.

We can state the conditional dependencies as follows:We know that C and A have no effect on each other.

We can also state the conditional independencies as follows:Notice that the conditional dependence is stated in the presence of the conditional independence.

That is, A is conditionally independent of C, or A is conditionally dependent upon B in the presence of C.

We might also state the conditional independence of A given C as the conditional dependence of A given B, as A is unaffected by C and can be calculated from A given B alone.

We can see that B is unaffected by A and C and has no parents; we can simply state the conditional independence of B from A and C as P(B, P(A|B), P(C|B)) or P(B).

We can also write the joint probability of A and C given B or conditioned on B as the product of two conditional probabilities; for example:The model summarizes the joint probability of P(A, B, C), calculated as:We can draw the graph as follows:Example of a Simple Bayesian NetworkNotice that the random variables are each assigned a node, and the conditional probabilities are stated as directed connections between the nodes.

Also notice that it is not possible to navigate the graph in a cycle, e.

g.

no loops are possible when navigating from node to node via the edges.

Also notice that the graph is useful even at this point where we don’t know the probability distributions for the variables.

You might want to extend this example by using contrived probabilities for discrete events for each random variable and practice some simple inference for different scenarios.

Bayesian Networks can be developed and used for inference in Python.

A popular library for this is called PyMC and provides a range of tools for Bayesian modeling, including graphical models like Bayesian Networks.

The most recent version of the library is called PyMC3, named for Python version 3, and was developed on top of the Theano mathematical computation library that offers fast automatic differentiation.

PyMC3 is a new open source probabilistic programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed.

— Probabilistic programming in Python using PyMC3, 2016.

More generally, the use of probabilistic graphical models in computer software used for inference is referred to “probabilistic programming“.

This type of programming is called probabilistic programming, […] it is probabilistic in the sense that we create probability models using programming variables as the model’s components.

Model components are first-class primitives within the PyMC framework.

— Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference, 2015.

For an excellent primer on Bayesian methods generally with PyMC, see the free book by Cameron Davidson-Pilon titled “Bayesian Methods for Hackers.

”This section provides more resources on the topic if you are looking to go deeper.

In this post, you discovered a gentle introduction to Bayesian Networks.

Specifically, you learned:Do you have any questions?.Ask your questions in the comments below and I will do my best to answer.

Develop Your Understanding of Probability .

with just a few lines of python codeDiscover how in my new Ebook: Probability for Machine LearningIt provides self-study tutorials and end-to-end projects on: Bayes Theorem, Bayesian Optimization, Distributions, Maximum Likelihood, Cross-Entropy, Calibrating Models and much more.

Finally Harness Uncertainty in Your Projects Skip the Academics.

Just Results.

.. More details

Leave a Reply