

# Probabilistic Boolean and Bayesian Networks

Module by: Ewa Paszek

Summary: This course is a short series of lectures on Statistical Bioinformatics. Topics covered are listed in the Table of Contents. The notes were prepared by Ewa Paszek, Lukasz Wita, and Marek Kimmel. The development of this course has been supported by NSF grant 0203396.

## Probabilistic Boolean Networks

In a Boolean network, each (target) gene is ‘predicted’ by several other genes by means of a Boolean function (predictor). Thus, after inferring such a function from gene expression data, one could conclude that if we observe the values of the predictive genes, we know, with full certainty, the value of the target gene. Conceptually, such inherent determinism seems problematic, as it assumes an environment with no uncertainty. However, the data used for the inference exhibit uncertainty on several levels.
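To make the determinism concrete, here is a minimal sketch of one Boolean network update step. The three genes and their predictor functions are hypothetical, chosen only to illustrate that every state has exactly one successor:

```python
# A toy deterministic Boolean network with three hypothetical genes;
# the predictor functions below are illustrative, not from real data.
def step(state):
    """Compute the next state (x1, x2, x3) from the current one."""
    x1, x2, x3 = state
    return (
        int(x2 and not x3),  # x1 is predicted by x2 and x3
        int(x1 or x3),       # x2 is predicted by x1 and x3
        int(x1),             # x3 simply copies x1
    )

state = (1, 0, 1)
for _ in range(4):
    state = step(state)
    print(state)
```

Because each predictor is a fixed Boolean function, repeating `step` from the same state always traces the same trajectory; it is exactly this certainty that PBNs relax.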

Another model class, Probabilistic Boolean Networks (PBNs) (Shmulevich et al., 2002), shares the appealing properties of Boolean networks but is able to cope with uncertainty, both in the data and in model selection. A model incorporates only a partial description of a physical system, so a Boolean function giving the next state of a variable is likely to be only partially accurate.

The basic idea is to extend the Boolean network to accommodate more than one possible function for each node. Thus, to every node x_i there corresponds a set F_i = {f_j}, j = 1, ..., l(i), where each f_j is a possible function determining the value of gene x_i and l(i) is the number of possible functions for gene x_i. A realization of the PBN at a given instant of time is determined by a vector of Boolean functions, where the ith element of that vector contains the predictor selected at that instant for gene x_i. In other words, the vector function f_k: {0,1}^n → {0,1}^n acts as a transition function (mapping) representing a possible realization of the entire PBN. Such functions are commonly referred to as multiple-output Boolean functions. Each of the N possible realizations can be thought of as a standard Boolean network operating for one time step. In other words, at every state x(t) ∈ {0,1}^n, one of the N Boolean networks is chosen and used to make the transition to the next state x(t+1) ∈ {0,1}^n. The probability P_i that the ith (Boolean) network, or realization, is selected can be easily expressed in terms of the individual selection probabilities c_j; see (Shmulevich et al., 2002). The dynamics of the PBN are essentially the same as for Boolean networks, but at any given point in time the value of each node is determined by one of the possible predictors, chosen according to its corresponding probability. This can be interpreted as saying that at any point in time we have one out of N possible networks. The basic building block of a PBN is shown in Figure 1.
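A single PBN time step can be sketched as follows. The two-gene network, the candidate predictors, and the selection probabilities c_j are invented for illustration; at each step every gene draws one predictor from its set F_i, so the transition is random even though each individual predictor is deterministic:

```python
import random

# Hypothetical two-gene PBN: F maps each gene index i to its set F_i
# of (predictor, selection probability c_j) pairs. Each predictor
# takes the full state (x1, x2) and returns the gene's next value.
F = {
    0: [(lambda s: s[1],          0.7),   # x1 copies x2 ...
        (lambda s: 1 - s[1],      0.3)],  # ... or negates it
    1: [(lambda s: s[0] and s[1], 0.6),
        (lambda s: s[0] or s[1],  0.4)],
}

def pbn_step(state):
    """Pick one predictor per gene (a realization) and apply it."""
    next_state = []
    for i in range(len(state)):
        funcs, probs = zip(*F[i])
        f = random.choices(funcs, weights=probs, k=1)[0]
        next_state.append(int(f(state)))
    return tuple(next_state)

random.seed(0)
print(pbn_step((1, 0)))
```

With l(1) = l(2) = 2 candidate functions there are N = l(1) · l(2) = 4 possible realizations, each acting as a standard Boolean network for one time step.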

## Bayesian Networks

The well-studied statistical tool of Bayesian networks (Friedman et al., 2000; Pearl, 1988) represents the dependence structure between multiple interacting quantities (e.g., expression levels of different genes). Bayesian networks are a promising tool for analyzing gene expression patterns. First, they are particularly useful for describing processes composed of locally interacting components; that is, the value of each component directly depends on the values of a relatively small number of components. Second, statistical foundations for learning Bayesian networks from observations, and computational algorithms to do so, are well understood and have been used successfully in many applications. Finally, Bayesian networks provide models of causal influence: although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence (Heckerman et al., 1999; Pearl and Verma, 1991; Spirtes et al., 1993). Although this connection depends on several assumptions that do not necessarily hold in gene expression data, the conclusions of Bayesian network analysis might be indicative of some causal connections in the data.

A Bayesian network (also known as a causal probabilistic network) is an annotated directed acyclic graph that encodes a joint probability distribution over a set of random variables X. Formally, a Bayesian network for X is a pair B = (G, Q). The first component, G, is a directed acyclic graph (DAG) whose vertices correspond to the random variables x1, ..., xn and whose edges represent direct dependencies between the variables. The graph G encodes the Markov assumption: each variable x_i is independent of its nondescendants, given its parents in G. The second component, Q, is the set of parameters that quantifies the network; it describes a conditional distribution for each variable, given its parents in G. Together, these two components specify a unique distribution on x1, ..., xn. The conditional independence assumptions represented by G allow the joint distribution to be decomposed, economizing on the number of parameters. Given a Bayesian network, we might want to answer many types of questions that involve the joint probability (e.g., what is the probability of X = x given observations of some of the other variables?) or independencies in the domain (e.g., are X and Y independent once we observe Z?). The literature contains a suite of algorithms that can answer such queries efficiently by exploiting the explicit representation of structure (Jensen, 1996; Pearl, 1988).
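The Markov assumption is what lets the joint distribution factor as a product of local conditionals, P(x1, ..., xn) = Π_i P(x_i | parents(x_i)). A minimal sketch, assuming a toy chain DAG x1 → x2 → x3 with invented conditional probability tables playing the role of Q:

```python
# Toy DAG x1 -> x2 -> x3 over binary variables; the probabilities
# below are invented for illustration only.
parents = {"x1": [], "x2": ["x1"], "x3": ["x2"]}

# Q: conditional probability tables, storing P(var = 1 | parent values).
Q = {
    "x1": {(): 0.6},
    "x2": {(0,): 0.2, (1,): 0.9},
    "x3": {(0,): 0.5, (1,): 0.3},
}

def cond_prob(var, value, parent_values):
    """P(var = value | parents), read from the table Q."""
    p1 = Q[var][tuple(parent_values)]
    return p1 if value == 1 else 1.0 - p1

def joint(assignment):
    """Multiply the local conditionals along the DAG."""
    p = 1.0
    for var in ("x1", "x2", "x3"):
        pv = [assignment[q] for q in parents[var]]
        p *= cond_prob(var, assignment[var], pv)
    return p

# P(x1=1, x2=1, x3=0) = P(x1=1) * P(x2=1|x1=1) * P(x3=0|x2=1)
print(joint({"x1": 1, "x2": 1, "x3": 0}))  # 0.6 * 0.9 * 0.7
```

Summing `joint` over all eight assignments gives 1, confirming that the factorization defines a proper distribution; the query algorithms cited above exploit exactly this local structure instead of enumerating the full joint table.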

## Biological Example

Let us apply the approach to the data of Spellman et al. (1998). This data set contains 76 gene expression measurements of the mRNA levels of 6177 S. cerevisiae ORFs. These experiments measure six time series under different cell-cycle synchronization methods. Spellman et al. (1998) identified 800 genes whose expression varied over the different cell-cycle stages. In learning from these data, one treats each measurement as an independent sample from a distribution and does not take into account the temporal aspect of the measurements. Since the cell cycle is clearly a temporal process, this is compensated for by introducing an additional variable denoting the cell-cycle phase. This variable is forced to be a root in all the networks learned; its presence allows one to model the dependency of expression levels on the current cell-cycle phase. Two experiments were performed, one with a discrete multinomial distribution and the other with a linear Gaussian distribution. The learned features show that intricate structure can be recovered even from such small data sets. It is important to note that the learning algorithm uses no prior biological knowledge or constraints; all learned networks and relations are based solely on the information conveyed in the measurements themselves. These results are available at the following web page: http://www.cs.huji.ac.il/labs/compbio/expression. Figure 2 illustrates a graphical display of some results from this analysis.

## References

1. Friedman, N., Linial, M., Nachman, I., Pe’er, D. (2000). Using Bayesian Networks to Analyze Expression Data. Journal of Computational Biology, 7, 601–620.
2. Heckerman, D., Meek, C., and Cooper, G. (1999). A Bayesian approach to causal discovery. in Cooper and Glymour, 141–166.
3. Jensen, F.V. (1996). An introduction to Bayesian Networks. University College London Press, London.
4. Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco.
5. Pearl, J., and Verma, T.S. (1991). A theory of inferred causation. in Principles of Knowledge Representation and Reasoning: Proc. Second International Conference (KR ’91), 441–452.
6. Shmulevich, I., Dougherty, E.R., Kim, S., Zhang, W. (2002). Probabilistic Boolean Networks: A Rule-based Uncertainty Model for Gene Regulatory Networks. Bioinformatics, 18(2), 261–274.
7. Shmulevich, I., Dougherty, E.R., Zhang, W. (2002). From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks. Proceedings of the IEEE, 90(11), 1778–1792.
8. Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273–3297.
9. Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search. Springer-Verlag, New York.
