Summary: This module offers an introduction to Bayesian networks by means of a worked example of computing a bayesian network from a joint probability distribution (JPD).
A Bayesian network is a compression of the joint probability distribution (JPD) of a set of random variables. To illustrate the connection between Bayesian networks and classical JPDs, consider the following example.
Suppose Dr. Foo is an expert in diagnosing two different diseases, call them C and D. Suppose also that there are two different major symptoms, A and B, that Dr. Foo looks for when diagnosing C or D, which he uses to help tell the difference between them.
![]() |
Dr. Foo has been collecting data (anonymously) on his patients with diseases
C and D since he begun practicing medicine, in order to help him keep track of
the number of times each disease occurs with each of the different symptoms. For
each patient he sees with disease C or D, he makes a note of the presence or
absence of each of the four variables, A, B, C and D. From this he is easily
able to come up with a JPD for
| No Diseases | Disease C | Disease D | Both Diseases | |
| No Symptoms | 0.4192 | 0.00041958 | 0.00041958 | 0.00000042 |
| Symptom A | 0.0891 | 0.0891 | 0.0009 | 0.0009 |
| Symptom B | 0.0277 | 0.00028 | 0.2495 | 0.0025 |
| Both Symptoms | 0.0324 | 0.0756 | 0.0036 | 0.0084 |
Dr. Foo, being a clever and experienced doctor, suspects that he should be able to independently infer the probability of having either disease C or D only from the presence or absence of symptoms A and B. In order to confirm his suspicions, he does some quick calculations at his desk:
Since he believes that the probability of having disease C only depends on
symptoms A and B, he first checks that
Excited to see that his suspicion so far is holding up, he immediately checks the same thing for all the other possible combinations of the symptoms and finds that he was in fact statistically justified in claiming that diseases C and D were independent. Seeing that he is on a roll, he decides to test another suspicion that he has, namely that the presence or absence of each symptom does not seem to influence the presence or absence of the other symptom. He does indeed confirm that
and
Thus, he removes all of these redundancies from his model, and represents each of the variables only in terms of their conditional probabilites. He has performed a reduction of the model without losing any of the information he started with. This is the Bayesian network paradigm, which is to say it is the compression of the JPD through the use of conditional independence assumptions and conditional probabilities of each variable given only it's 'parents'. Here is Dr. Foo's new Bayesian network representation of his data:
![]() |
It is important to note that this representation is not unique. Namely, the orientation of the arrows connecting the variables cannot be uniquely determined from the data, since Bayes' rule states:
The only restriction on the orientation of arcs in a Bayesian network is that there be no cycles, which means that if you pick any node in the network, and follow any path along the directions of the arrows, it is not possible to end up back at the node you started at. In this case, Dr. Foo has chosen the above orientations for the arcs because of his knowledge of medicine. It will be most useful in diagnosing patients if he is able to immediately see the probability of each disease given the observed symptoms, though it is clear that using Bayes' rule he could with a little more effort determine the desired probability even if the network specified them as P(symptoms | diseases). For more of a discussion of inferring causality from data, please refer to Judea Pearl's online text "Causality".