Bayesian Networks

Module by: Jared Flatow.

Summary: This module introduces Bayesian networks through a worked example of computing a Bayesian network from a joint probability distribution (JPD).

A Bayesian network is a compression of the joint probability distribution (JPD) of a set of random variables. To illustrate the connection between Bayesian networks and classical JPDs, consider the following example.

Suppose Dr. Foo is an expert in diagnosing two different diseases, call them C and D. Suppose also that there are two different major symptoms, A and B, that Dr. Foo looks for when diagnosing C or D, which he uses to help tell the difference between them.

Figure 1

Dr. Foo has been collecting data (anonymously) on his patients since he began practicing medicine, in order to keep track of how often each disease occurs with each of the different symptoms. For each patient he sees, he notes the presence or absence of each of the four variables A, B, C and D. From this he can easily come up with a JPD for P(A, B, C, D), knowing full well that these are only the relative frequencies of his past observations:

Table 1: Relative Frequency of Dr. Foo's Observations

| | No disease | Disease C only | Disease D only | Both diseases |
|---|---|---|---|---|
| No symptoms | 0.4192 | 0.00041958 | 0.00041958 | 0.00000042 |
| Symptom A only | 0.0891 | 0.0891 | 0.0009 | 0.0009 |
| Symptom B only | 0.0277 | 0.00028 | 0.2495 | 0.0025 |
| Both symptoms | 0.0324 | 0.0756 | 0.0036 | 0.0084 |
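To make the calculations that follow easy to reproduce, Table 1 can be stored as a lookup table keyed by the truth values of (A, B, C, D). The Python sketch below is one way to do this; the names `ROWS`, `COLS` and `JPD` are illustrative and not part of the original module. It also confirms that the relative frequencies sum to 1, up to the rounding in the table.

```python
# Table 1 stored as a dictionary keyed by (a, b, c, d),
# where True means the symptom or disease is present.

# Rows of Table 1: (A present?, B present?) -> entries in column order.
ROWS = {
    (False, False): [0.4192, 0.00041958, 0.00041958, 0.00000042],
    (True, False): [0.0891, 0.0891, 0.0009, 0.0009],
    (False, True): [0.0277, 0.00028, 0.2495, 0.0025],
    (True, True): [0.0324, 0.0756, 0.0036, 0.0084],
}
# Columns of Table 1: (C present?, D present?)
COLS = [(False, False), (True, False), (False, True), (True, True)]

JPD = {
    (a, b, c, d): p
    for (a, b), row in ROWS.items()
    for (c, d), p in zip(COLS, row)
}

total = sum(JPD.values())
print(len(JPD), round(total, 4))  # prints: 16 1.0
```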

Dr. Foo, being a clever and experienced doctor, suspects that he should be able to independently infer the probability of having either disease C or D only from the presence or absence of symptoms A and B. In order to confirm his suspicions, he does some quick calculations at his desk:

Since he believes that the probability of having disease C depends only on symptoms A and B, he first checks that P(C | A, B, D) = P(C | A, B, ~D). He remembers from his class in Bayesian inference that

$$P(C \mid A, B, D) = \frac{P(A, B, C, D)}{P(A, B, C, D) + P(A, B, \neg C, D)} = \frac{.0084}{.0084 + .0036} = .7$$

$$P(C \mid A, B, \neg D) = \frac{P(A, B, C, \neg D)}{P(A, B, C, \neg D) + P(A, B, \neg C, \neg D)} = \frac{.0756}{.0756 + .0324} = .7$$
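The same ratio can be applied mechanically to every combination of symptoms. The following Python sketch assumes the dictionary encoding of Table 1 described earlier (repeated here so the block stands alone); `prob` and `p_c_given` are hypothetical helper names. For each of the four symptom combinations it prints P(C | A, B, D) next to P(C | A, B, ~D), and the two values agree to within the rounding of Table 1.

```python
# Table 1 as a dictionary keyed by (a, b, c, d); True means "present".
ROWS = {
    (False, False): [0.4192, 0.00041958, 0.00041958, 0.00000042],
    (True, False): [0.0891, 0.0891, 0.0009, 0.0009],
    (False, True): [0.0277, 0.00028, 0.2495, 0.0025],
    (True, True): [0.0324, 0.0756, 0.0036, 0.0084],
}
COLS = [(False, False), (True, False), (False, True), (True, True)]
JPD = {(a, b, c, d): p for (a, b), row in ROWS.items()
       for (c, d), p in zip(COLS, row)}
NAMES = ("a", "b", "c", "d")


def prob(**fixed):
    """Sum the JPD over every entry consistent with the fixed values."""
    return sum(
        p for key, p in JPD.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )


def p_c_given(a, b, d):
    """P(C | A=a, B=b, D=d) = P(a, b, C, d) / [P(a, b, C, d) + P(a, b, ~C, d)]."""
    return prob(a=a, b=b, c=True, d=d) / prob(a=a, b=b, d=d)


# Compare P(C | A, B, D) with P(C | A, B, ~D) for each symptom combination.
for a in (True, False):
    for b in (True, False):
        print(a, b, round(p_c_given(a, b, True), 4), round(p_c_given(a, b, False), 4))
```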

Excited to see that his suspicion is holding up so far, he immediately checks the same equality for the other three combinations of symptoms and finds that he was in fact statistically justified in claiming that diseases C and D are conditionally independent given symptoms A and B. Seeing that he is on a roll, he decides to test another suspicion: that the presence or absence of each symptom does not influence the presence or absence of the other. He does indeed confirm that

$$P(A \mid B) = P(A \mid \neg B) = .3$$

$$P(B \mid A) = P(B \mid \neg A) = .4$$
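These two checks can also be reproduced from Table 1. The Python sketch below assumes the same hypothetical dictionary encoding and `prob` helper as before (repeated so the block is self-contained) and computes each conditional as a ratio of marginals, for example P(A | B) = P(A, B) / P(B).

```python
# Table 1 as a dictionary keyed by (a, b, c, d); True means "present".
ROWS = {
    (False, False): [0.4192, 0.00041958, 0.00041958, 0.00000042],
    (True, False): [0.0891, 0.0891, 0.0009, 0.0009],
    (False, True): [0.0277, 0.00028, 0.2495, 0.0025],
    (True, True): [0.0324, 0.0756, 0.0036, 0.0084],
}
COLS = [(False, False), (True, False), (False, True), (True, True)]
JPD = {(a, b, c, d): p for (a, b), row in ROWS.items()
       for (c, d), p in zip(COLS, row)}
NAMES = ("a", "b", "c", "d")


def prob(**fixed):
    """Sum the JPD over every entry consistent with the fixed values."""
    return sum(
        p for key, p in JPD.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )


# Each conditional is a joint marginal divided by a single marginal.
p_a_given_b = prob(a=True, b=True) / prob(b=True)
p_a_given_not_b = prob(a=True, b=False) / prob(b=False)
p_b_given_a = prob(a=True, b=True) / prob(a=True)
p_b_given_not_a = prob(a=False, b=True) / prob(a=False)

print(round(p_a_given_b, 3), round(p_a_given_not_b, 3))  # prints: 0.3 0.3
print(round(p_b_given_a, 3), round(p_b_given_not_a, 3))  # prints: 0.4 0.4
```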

Thus, he removes all of these redundancies from his model and represents each variable only in terms of its conditional probabilities. He has reduced the model without losing any of the information he started with. This is the Bayesian network paradigm: the JPD is compressed by means of conditional independence assumptions, and each variable is described only by its conditional probability given its 'parents'. Here is Dr. Foo's new Bayesian network representation of his data:

Figure 2
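The compression claim can be checked numerically. The Python sketch below reuses the same hypothetical encoding of Table 1. It reads the tables P(A), P(B), P(C | A, B) and P(D | A, B) off the JPD, rebuilds all 16 joint probabilities from the product P(A) P(B) P(C | A, B) P(D | A, B), and compares them with Table 1. The structure assumed here (each disease depending on both symptoms, and the two symptoms independent of each other) is taken from the text above, since the image for Figure 2 is not reproduced. The full JPD over four binary variables needs 2^4 - 1 = 15 free numbers, while this network needs 1 + 1 + 4 + 4 = 10.

```python
# Table 1 as a dictionary keyed by (a, b, c, d); True means "present".
ROWS = {
    (False, False): [0.4192, 0.00041958, 0.00041958, 0.00000042],
    (True, False): [0.0891, 0.0891, 0.0009, 0.0009],
    (False, True): [0.0277, 0.00028, 0.2495, 0.0025],
    (True, True): [0.0324, 0.0756, 0.0036, 0.0084],
}
COLS = [(False, False), (True, False), (False, True), (True, True)]
JPD = {(a, b, c, d): p for (a, b), row in ROWS.items()
       for (c, d), p in zip(COLS, row)}
NAMES = ("a", "b", "c", "d")
BOOLS = (True, False)


def prob(**fixed):
    """Sum the JPD over every entry consistent with the fixed values."""
    return sum(
        p for key, p in JPD.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )


# Conditional probability tables: P(variable = True | its parents).
p_a = prob(a=True)
p_b = prob(b=True)
p_c = {(a, b): prob(a=a, b=b, c=True) / prob(a=a, b=b) for a in BOOLS for b in BOOLS}
p_d = {(a, b): prob(a=a, b=b, d=True) / prob(a=a, b=b) for a in BOOLS for b in BOOLS}


def bern(p_true, value):
    """Probability that a binary variable with P(True) = p_true equals value."""
    return p_true if value else 1.0 - p_true


def factored(a, b, c, d):
    """The network's factorization P(A) P(B) P(C | A, B) P(D | A, B)."""
    return (bern(p_a, a) * bern(p_b, b)
            * bern(p_c[(a, b)], c) * bern(p_d[(a, b)], d))


max_error = max(abs(factored(*key) - p) for key, p in JPD.items())
full_params = 2 ** 4 - 1                      # free numbers in the full JPD
network_params = 1 + 1 + len(p_c) + len(p_d)  # P(A), P(B) and two 4-row tables
print(f"largest difference from Table 1: {max_error:.1e}")
print(full_params, network_params)  # prints: 15 10
```

The small residual differences come from the rounding in Table 1, whose entries sum to about 1.00002 rather than exactly 1.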

It is important to note that this representation is not unique. In particular, the orientation of the arrows connecting the variables cannot, in general, be uniquely determined from the data alone, since Bayes' rule states:

$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$
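As a worked instance of Bayes' rule with Dr. Foo's numbers: summing the relevant entries of Table 1 gives P(A, C) = .174, P(A) = .3 and P(C) = .17721. The conditional in one direction then yields the conditional in the other:

```latex
\begin{aligned}
P(C \mid A) &= \frac{P(A, C)}{P(A)} = \frac{.174}{.3} = .58 \\
P(A \mid C) &= \frac{P(C \mid A)\,P(A)}{P(C)} = \frac{.58 \times .3}{.17721} \approx .982
\end{aligned}
```

Knowing either conditional together with the two marginals determines the other.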

The only restriction on the orientation of arcs in a Bayesian network is that there be no cycles: if you pick any node in the network and follow any path along the directions of the arrows, it is impossible to end up back at the node you started from. In this case, Dr. Foo has chosen the orientations above because of his knowledge of medicine. For diagnosing patients, it is most useful to see the probability of each disease given the observed symptoms directly, although with Bayes' rule and a little more effort he could recover the same probabilities even if the network specified them as P(symptoms | diseases). For more discussion of inferring causality from data, please refer to Judea Pearl's online text "Causality".
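The no-cycle condition can be tested mechanically. Below is a minimal Python sketch using Kahn's algorithm, a standard topological-sort method that the module itself does not specify: it repeatedly removes nodes that have no incoming arcs, and the graph is acyclic exactly when every node is eventually removed. The edge list is an assumption reconstructed from the text, since the image for Figure 2 is not reproduced: each disease node receives arcs from both symptom nodes.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Start from nodes with no incoming arcs and remove them one at a time.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach in-degree 0, so they are never removed.
    return removed == len(nodes)


NODES = ["A", "B", "C", "D"]
# Hypothetical edge list for Figure 2: both symptoms point to each disease.
FOO_EDGES = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D")]

print(is_acyclic(NODES, FOO_EDGES))                 # prints: True
print(is_acyclic(NODES, FOO_EDGES + [("C", "A")]))  # prints: False (A -> C -> A)
```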
