# Random Variables and Probabilities

Module by: Paul E Pfeiffer. Part of the collection "Applied Probability".

Summary: Often, each outcome of an experiment is characterized by a number. If the outcome is observed as a physical quantity, the size of that quantity (in prescribed units) is the entity actually observed. In many nonnumerical cases, it is convenient to assign a number to each outcome. For example, in a coin flipping experiment, a “head” may be represented by a 1 and a “tail” by a 0. In a Bernoulli trial, a success may be represented by a 1 and a failure by a 0. In a sequence of trials, we may be interested in the number of successes in a sequence of n component trials. One could assign a distinct number to each card in a deck of playing cards. Observations of the result of selecting a card could be recorded in terms of individual numbers. In each case, the associated number becomes a property of the outcome. The fundamental idea of a real random variable is the assignment of a real number to each elementary outcome ω in the basic space Ω. Such an assignment amounts to determining a function X, whose domain is Ω and whose range is a subset of the real line R. Each ω is mapped into exactly one value t, although several ω may have the same image point. Except in special cases, we cannot write a formula for a random variable X. However, random variables share some important general properties of functions which play an essential role in determining their usefulness. Associated with a function X as a mapping are the inverse mapping and the inverse images it produces. By the inverse image of a set of real numbers M under the mapping X, we mean the set of all those ω ∈ Ω which are mapped into M by X. If X does not take a value in M, the inverse image is the empty set (impossible event). If M includes the range of X (the set of all possible values of X), the inverse image is the entire basic space Ω. The class of inverse images of the Borel sets on the real line plays an essential role in probability analysis.

## Introduction

Probability associates with an event a number which indicates the likelihood of the occurrence of that event on any trial. An event is modeled as the set of those possible outcomes of an experiment which satisfy a property or proposition characterizing the event.

Often, each outcome is characterized by a number. The experiment is performed. If the outcome is observed as a physical quantity, the size of that quantity (in prescribed units) is the entity actually observed. In many nonnumerical cases, it is convenient to assign a number to each outcome. For example, in a coin flipping experiment, a “head” may be represented by a 1 and a “tail” by a 0. In a Bernoulli trial, a success may be represented by a 1 and a failure by a 0. In a sequence of trials, we may be interested in the number of successes in a sequence of n component trials. One could assign a distinct number to each card in a deck of playing cards. Observations of the result of selecting a card could be recorded in terms of individual numbers. In each case, the associated number becomes a property of the outcome.

## Random variables as functions

We consider in this chapter real random variables (i.e., real-valued random variables). In the chapter "Random Vectors and Joint Distributions", we extend the notion to vector-valued random quantities. The fundamental idea of a real random variable is the assignment of a real number to each elementary outcome ω in the basic space Ω. Such an assignment amounts to determining a function X, whose domain is Ω and whose range is a subset of the real line R. Recall that a real-valued function on a domain (say an interval I on the real line) is characterized by the assignment of a real number y to each element x (argument) in the domain. For a real-valued function of a real variable, it is often possible to write a formula or otherwise state a rule describing the assignment of the value to each argument. Except in special cases, we cannot write a formula for a random variable X. However, random variables share some important general properties of functions which play an essential role in determining their usefulness.

Mappings and inverse mappings

There are various ways of characterizing a function. Probably the most useful for our purposes is as a mapping from the domain Ω to the codomain R. We find the mapping diagram of Figure 1 extremely useful in visualizing the essential patterns. Random variable X, as a mapping from basic space Ω to the real line R, assigns to each element ω a value $t = X(\omega)$. The object point ω is mapped, or carried, into the image point t. Each ω is mapped into exactly one t, although several ω may have the same image point.

Associated with a function X as a mapping are the inverse mapping $X^{-1}$ and the inverse images it produces. Let M be a set of numbers on the real line. By the inverse image of M under the mapping X, we mean the set of all those $\omega \in \Omega$ which are mapped into M by X (see Figure 2). If X does not take a value in M, the inverse image is the empty set (impossible event). If M includes the range of X (the set of all possible values of X), the inverse image is the entire basic space Ω. Formally we write

$$X^{-1}(M) = \{\omega : X(\omega) \in M\} \tag{1}$$

Now we assume the set $X^{-1}(M)$, a subset of Ω, is an event for each M. A detailed examination of that assertion is a topic in measure theory. Fortunately, the results of measure theory ensure that we may make the assumption for any X and any subset M of the real line likely to be encountered in practice. The set $X^{-1}(M)$ is the event that X takes a value in M. As an event, it may be assigned a probability.

### Example 1: Some illustrative examples.

1. $X = I_E$, where E is an event with probability p. Now X takes on only two values, 0 and 1. The event that X takes on the value 1 is the set
$$\{\omega : X(\omega) = 1\} = X^{-1}(\{1\}) = E \tag{2}$$
so that $P(\{\omega : X(\omega) = 1\}) = p$. This rather ungainly notation is shortened to $P(X = 1) = p$. Similarly, $P(X = 0) = 1 - p$. Consider any set M.
   • If neither 1 nor 0 is in M, then $X^{-1}(M) = \emptyset$.
   • If 0 is in M, but 1 is not, then $X^{-1}(M) = E^c$.
   • If 1 is in M, but 0 is not, then $X^{-1}(M) = E$.
   • If both 1 and 0 are in M, then $X^{-1}(M) = \Omega$.

   In this case the class of all events $X^{-1}(M)$ consists of event E, its complement $E^c$, the impossible event $\emptyset$, and the sure event Ω.
2. Consider a sequence of n Bernoulli trials, with probability p of success. Let $S_n$ be the random variable whose value is the number of successes in the sequence of n component trials. Then, according to the analysis in the section "Bernoulli Trials and the Binomial Distribution"
$$P(S_n = k) = C(n,k)\, p^k (1-p)^{n-k}, \quad 0 \le k \le n \tag{3}$$

Before considering further examples, we note a general property of inverse images. We state it in terms of a random variable, which maps Ω to the real line (see Figure 3).

Preservation of set operations

Let X be a mapping from Ω to the real line R. If $M$, $M_i$, $i \in J$, are sets of real numbers, with respective inverse images $E$, $E_i$, then

$$X^{-1}(M^c) = E^c, \quad X^{-1}\Big(\bigcup_{i \in J} M_i\Big) = \bigcup_{i \in J} E_i \quad \text{and} \quad X^{-1}\Big(\bigcap_{i \in J} M_i\Big) = \bigcap_{i \in J} E_i \tag{4}$$

Examination of simple graphical examples exhibits the plausibility of these patterns. Formal proofs amount to careful reading of the notation. Central to the structure are the facts that each element ω is mapped into only one image point t and that the inverse image of M is the set of all those ω which are mapped into image points in M.

An easy, but important, consequence of the general patterns is that the inverse images of disjoint $M$, $N$ are also disjoint. This implies that the inverse image of a disjoint union of the $M_i$ is a disjoint union of the separate inverse images.
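
The patterns in (4) can be checked numerically on a small finite basic space. The following is a minimal sketch, not part of the author's m-file collection: the six-point space, the values assigned to X, and the sets M1 and M2 are illustrative assumptions, and `invimg` is a hypothetical helper name. An inverse image is represented as a logical mask over the outcomes.

```matlab
% Illustrative finite basic space with six outcomes (assumed values).
X  = [0 1 1 2 2 3];            % X(omega_1), ..., X(omega_6)
M1 = [0 1];                    % a set of real numbers
M2 = [1 2];                    % a second set

% Inverse image of M as a logical mask: true where X(omega) lies in M.
invimg = @(M) ismember(X, M);  % hypothetical helper, defined here only

E1 = invimg(M1);
E2 = invimg(M2);

% Complement of M1, taken relative to the range of X (values outside
% the range have an empty inverse image and do not matter).
M1c = setdiff(unique(X), M1);

okComplement   = isequal(invimg(M1c), ~E1);
okUnion        = isequal(invimg(union(M1, M2)), E1 | E2);
okIntersection = isequal(invimg(intersect(M1, M2)), E1 & E2);
disp([okComplement okUnion okIntersection])
```

Each of the three checks evaluates to true for this example, which illustrates the pattern but is of course not a proof.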

### Example 2: Events determined by a random variable

Consider, again, the random variable $S_n$ which counts the number of successes in a sequence of n Bernoulli trials. Let $n = 10$ and $p = 0.33$. Suppose we want to determine the probability $P(2 < S_{10} \le 8)$. Let $A_k = \{\omega : S_{10}(\omega) = k\}$, which we usually shorten to $A_k = \{S_{10} = k\}$. Now the $A_k$ form a partition, since we cannot have $\omega \in A_k$ and $\omega \in A_j$, $j \ne k$ (i.e., for any ω, we cannot have two values for $S_n(\omega)$). Now,

$$\{2 < S_{10} \le 8\} = A_3 \cup A_4 \cup A_5 \cup A_6 \cup A_7 \cup A_8 \tag{5}$$

since $S_{10}$ takes on a value greater than 2 but no greater than 8 iff it takes one of the integer values from 3 to 8. By the additivity of probability,

$$P(2 < S_{10} \le 8) = \sum_{k=3}^{8} P(S_{10} = k) = 0.6927 \tag{6}$$
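
The sum in Example 2 can be reproduced directly from the binomial formula. This is a minimal sketch using only built-in MATLAB functions (`nchoosek`, `arrayfun`); the variable names are chosen here for illustration and do not come from the author's m-files.

```matlab
n = 10;  p = 0.33;
k = 0:n;

% Binomial probabilities P(S_n = k) = C(n,k) p^k (1-p)^(n-k), one per k.
pk = arrayfun(@(j) nchoosek(n, j), k) .* p.^k .* (1 - p).^(n - k);

% The event {2 < S_10 <= 8} is the disjoint union of A_3, ..., A_8,
% so its probability is the sum of the corresponding P(S_10 = k).
sel = (k > 2) & (k <= 8);
P = sum(pk(sel));
fprintf('%.4f\n', P)           % prints 0.6927
```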

## Mass transfer and induced probability distribution

Because of the abstract nature of the basic space and the class of events, we are limited in the kinds of calculations that can be performed meaningfully with the probabilities on the basic space. We represent probability as mass distributed on the basic space and visualize this with the aid of general Venn diagrams and minterm maps. We now think of the mapping from Ω to R as producing a point-by-point transfer of the probability mass to the real line. This may be done as follows:

To any set M on the real line assign probability mass $P_X(M) = P(X^{-1}(M))$.

It is apparent that $P_X(M) \ge 0$ and $P_X(R) = P(\Omega) = 1$. And because of the preservation of set operations by the inverse mapping, for any sequence of mutually exclusive sets $M_1, M_2, \ldots$ we have

$$P_X\Big(\bigcup_{i=1}^{\infty} M_i\Big) = P\Big(X^{-1}\Big(\bigcup_{i=1}^{\infty} M_i\Big)\Big) = P\Big(\bigcup_{i=1}^{\infty} X^{-1}(M_i)\Big) = \sum_{i=1}^{\infty} P(X^{-1}(M_i)) = \sum_{i=1}^{\infty} P_X(M_i) \tag{7}$$

This means that $P_X$ has the properties of a probability measure defined on the subsets of the real line. Some results of measure theory show that this probability is defined uniquely on a class of subsets of R that includes any set normally encountered in applications. We have achieved a point-by-point transfer of the probability apparatus to the real line in such a manner that we can make calculations about the random variable X. We call $P_X$ the probability measure induced by X. Its importance lies in the fact that $P(X \in M) = P_X(M)$. Thus, to determine the likelihood that random quantity X will take on a value in set M, we determine how much induced probability mass is in the set M. This transfer produces what is called the probability distribution for X. In the chapter "Distribution and Density Functions", we consider useful ways to describe the probability distribution induced by a random variable. We turn first to a special class of random variables.

## Simple random variables

We consider, in some detail, random variables which have only a finite set of possible values. These are called simple random variables. Thus the term “simple” is used in a special, technical sense. The importance of simple random variables rests on two facts. For one thing, in practice we can distinguish only a finite set of possible values for any random variable. In addition, any random variable may be approximated as closely as desired by a simple random variable. When the structure and properties of simple random variables have been examined, we turn to more general cases. Many properties of simple random variables extend to the general case via the approximation procedure.

Representation with the aid of indicator functions

In order to deal with simple random variables clearly and precisely, we must find suitable ways to express them analytically. We do this with the aid of indicator functions. Three basic forms of representation are encountered. These are not mutually exclusive representations.

1. Standard or canonical form, which displays the possible values and the corresponding events. If X takes on distinct values
$$\{t_1, t_2, \ldots, t_n\} \quad \text{with respective probabilities} \quad \{p_1, p_2, \ldots, p_n\} \tag{8}$$
and if $A_i = \{X = t_i\}$, for $1 \le i \le n$, then $\{A_1, A_2, \ldots, A_n\}$ is a partition (i.e., on any trial, exactly one of these events occurs). We call this the partition determined by (or, generated by) X. We may write
$$X = t_1 I_{A_1} + t_2 I_{A_2} + \cdots + t_n I_{A_n} = \sum_{i=1}^{n} t_i I_{A_i} \tag{9}$$
If $X(\omega) = t_i$, then $\omega \in A_i$, so that $I_{A_i}(\omega) = 1$ and all the other indicator functions have value zero. The summation expression thus picks out the correct value $t_i$. This is true for any $t_i$, so the expression represents $X(\omega)$ for all ω. The distinct set $\{t_1, t_2, \ldots, t_n\}$ of the values and the corresponding probabilities $\{p_1, p_2, \ldots, p_n\}$ constitute the distribution for X. Probability calculations for X are made in terms of its distribution. One of the advantages of the canonical form is that it displays the range (set of values), and if the probabilities $p_i = P(A_i)$ are known, the distribution is determined. Note that in canonical form, if one of the $t_i$ has value zero, we include that term. For some probability distributions it may be that $P(A_i) = 0$ for one or more of the $t_i$. In that case, we call these values null values, for they can only occur with probability zero, and hence are practically impossible. In the general formulation, we include possible null values, since they do not affect any probability calculations.

### Example 3: Successes in Bernoulli trials

As the analysis of Bernoulli trials and the binomial distribution shows (see Section 4.8), canonical form must be

$$S_n = \sum_{k=0}^{n} k\, I_{A_k} \quad \text{with} \quad P(A_k) = C(n,k)\, p^k (1-p)^{n-k}, \quad 0 \le k \le n \tag{10}$$

For many purposes, both theoretical and practical, canonical form is desirable. For one thing, it displays directly the range (i.e., set of values) of the random variable. The distribution consists of the set of values $\{t_k : 1 \le k \le n\}$ paired with the corresponding set of probabilities $\{p_k : 1 \le k \le n\}$, where $p_k = P(A_k) = P(X = t_k)$.
2. Simple random variable X may be represented by a primitive form
$$X = c_1 I_{C_1} + c_2 I_{C_2} + \cdots + c_m I_{C_m}, \quad \text{where } \{C_j : 1 \le j \le m\} \text{ is a partition} \tag{11}$$
Remarks
• If $\{C_j : 1 \le j \le m\}$ is a disjoint class, but $\bigcup_{j=1}^{m} C_j \ne \Omega$, we may append the event $C_{m+1} = \Big(\bigcup_{j=1}^{m} C_j\Big)^c$ and assign value zero to it.
• We say a primitive form, since the representation is not unique. Any of the $C_i$ may be partitioned, with the same value $c_i$ associated with each subset formed.
• Canonical form is a special primitive form. Canonical form is unique, and in many ways normative.

### Example 4: Simple random variables in primitive form

• A wheel is spun yielding, on an equally likely basis, the integers 1 through 10. Let $C_i$ be the event the wheel stops at i, $1 \le i \le 10$. Each $P(C_i) = 0.1$. If the numbers 1, 4, or 7 turn up, the player loses ten dollars; if the numbers 2, 5, or 8 turn up, the player gains nothing; if the numbers 3, 6, or 9 turn up, the player gains ten dollars; if the number 10 turns up, the player loses one dollar. The random variable expressing the results may be expressed in primitive form as
$$X = -10 I_{C_1} + 0 I_{C_2} + 10 I_{C_3} - 10 I_{C_4} + 0 I_{C_5} + 10 I_{C_6} - 10 I_{C_7} + 0 I_{C_8} + 10 I_{C_9} - I_{C_{10}} \tag{12}$$
• A store has eight items for sale. The prices (in dollars) are 3.50, 5.00, 3.50, 7.50, 5.00, 5.00, 3.50, and 7.50, respectively. A customer comes in. She purchases one of the items with probabilities 0.10, 0.15, 0.15, 0.20, 0.10, 0.05, 0.10, 0.15. The random variable expressing the amount of her purchase may be written
$$X = 3.5 I_{C_1} + 5.0 I_{C_2} + 3.5 I_{C_3} + 7.5 I_{C_4} + 5.0 I_{C_5} + 5.0 I_{C_6} + 3.5 I_{C_7} + 7.5 I_{C_8} \tag{13}$$
3. We commonly have X represented in affine form, in which the random variable is represented as an affine combination of indicator functions (i.e., a linear combination of the indicator functions plus a constant, which may be zero).
$$X = c_0 + c_1 I_{E_1} + c_2 I_{E_2} + \cdots + c_m I_{E_m} = c_0 + \sum_{j=1}^{m} c_j I_{E_j} \tag{14}$$
In this form, the class $\{E_1, E_2, \ldots, E_m\}$ is not necessarily mutually exclusive, and the coefficients do not display directly the set of possible values. In fact, the $E_i$ often form an independent class. Remark. Any primitive form is a special affine form in which $c_0 = 0$ and the $E_i$ form a partition.
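
As a worked illustration of how the forms relate, the wheel of Example 4 can be carried from primitive form to canonical form by grouping the events $C_j$ that share a common value. The labels $A_1, \ldots, A_4$ below are introduced here for the illustration; the probabilities follow from $P(C_j) = 0.1$ and additivity.

```latex
\begin{align*}
X &= -10\, I_{A_1} - I_{A_2} + 0\, I_{A_3} + 10\, I_{A_4}, \\
A_1 &= C_1 \cup C_4 \cup C_7, \quad
A_2 = C_{10}, \quad
A_3 = C_2 \cup C_5 \cup C_8, \quad
A_4 = C_3 \cup C_6 \cup C_9, \\
P(A_1) &= 0.3, \quad P(A_2) = 0.1, \quad P(A_3) = 0.3, \quad P(A_4) = 0.3 .
\end{align*}
```

The range $\{-10, -1, 0, 10\}$ is now displayed directly, and the term with value zero is retained, as canonical form requires.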

### Example 5

Consider, again, the random variable Sn which counts the number of successes in a sequence of n Bernoulli trials. If Ei is the event of a success on the ith trial, then one natural way to express the count is

$$S_n = \sum_{i=1}^{n} I_{E_i}, \quad \text{with } P(E_i) = p, \quad 1 \le i \le n \tag{15}$$

This is affine form, with $c_0 = 0$ and $c_i = 1$ for $1 \le i \le n$. In this case, the $E_i$ cannot form a mutually exclusive class, since they form an independent class.

Events generated by a simple random variable: canonical form
We may characterize the class of all inverse images formed by a simple random variable X in terms of the partition $\{A_i : 1 \le i \le n\}$ it determines. Consider any set M of real numbers. If $t_i$ in the range of X is in M, then every point $\omega \in A_i$ maps into $t_i$, hence into M. If the set J is the set of indices i such that $t_i \in M$, then
only those points ω in $A_M = \bigcup_{i \in J} A_i$ map into M.
Hence, the class of events (i.e., inverse images) determined by X consists of the impossible event $\emptyset$, the sure event Ω, and the union of any subclass of the $A_i$ in the partition determined by X.

### Example 6: Events determined by a simple random variable

Suppose simple random variable X is represented in canonical form by

$$X = -2 I_A - I_B + 0 I_C + 3 I_D \tag{16}$$

Then the class $\{A, B, C, D\}$ is the partition determined by X and the range of X is $\{-2, -1, 0, 3\}$.

1. If M is the interval $[-2, 1]$, then the values -2, -1, and 0 are in M and $X^{-1}(M) = A \cup B \cup C$.
2. If M is the set $(-2, -1] \cup [1, 5]$, then the values -1, 3 are in M and $X^{-1}(M) = B \cup D$.
3. The event $\{X \le 1\} = \{X \in (-\infty, 1]\} = X^{-1}(M)$, where $M = (-\infty, 1]$. Since values -2, -1, 0 are in M, the event $\{X \le 1\} = A \cup B \cup C$.
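
The three cases of Example 6 can be traced mechanically: test each value $t_i$ for membership in M, then take the union of the corresponding events. The sketch below is illustrative only; events are represented by their names as strings, and all variable names are assumptions made here.

```matlab
t     = [-2 -1 0 3];            % range of X in canonical form
names = {'A', 'B', 'C', 'D'};   % A_i = {X = t_i}, listed in the same order

% Each set M is expressed as a logical test applied to the values t_i.
inM1 = (t >= -2) & (t <= 1);                             % M = [-2, 1]
inM2 = ((t > -2) & (t <= -1)) | ((t >= 1) & (t <= 5));   % M = (-2,-1] U [1,5]
inM3 = (t <= 1);                                         % M = (-inf, 1]

% The inverse image of M is the union of those A_i with t_i in M.
ev1 = strjoin(names(inM1), ' U ');
ev2 = strjoin(names(inM2), ' U ');
ev3 = strjoin(names(inM3), ' U ');
fprintf('%s\n', ev1, ev2, ev3)
```

The same logical masks, applied to a matrix of probabilities for the $A_i$, would give the probabilities of these events.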

## Determination of the distribution

Determining the partition generated by a simple random variable amounts to determining the canonical form. The distribution is then completed by determining the probabilities of each event $A_k = \{X = t_k\}$.

From a primitive form

Before writing down the general pattern, we consider an illustrative example.

### Example 7: The distribution from a primitive form

Suppose one item is selected at random from a group of ten items. The values (in dollars) and respective probabilities are

| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $c_j$ | 2.00 | 1.50 | 2.00 | 2.50 | 1.50 | 1.50 | 1.00 | 2.50 | 2.00 | 1.50 |
| $P(C_j)$ | 0.08 | 0.11 | 0.07 | 0.15 | 0.10 | 0.09 | 0.14 | 0.08 | 0.08 | 0.10 |

By inspection, we find four distinct values: $t_1 = 1.00$, $t_2 = 1.50$, $t_3 = 2.00$, and $t_4 = 2.50$. The value 1.00 is taken on for $\omega \in C_7$, so that $A_1 = C_7$ and $P(A_1) = P(C_7) = 0.14$. Value 1.50 is taken on for $\omega \in C_2, C_5, C_6, C_{10}$ so that

$$A_2 = C_2 \cup C_5 \cup C_6 \cup C_{10} \quad \text{and} \quad P(A_2) = P(C_2) + P(C_5) + P(C_6) + P(C_{10}) = 0.40 \tag{17}$$

Similarly

$$P(A_3) = P(C_1) + P(C_3) + P(C_9) = 0.23 \quad \text{and} \quad P(A_4) = P(C_4) + P(C_8) = 0.23 \tag{18}$$

The distribution for X is thus

| $t_k$ | 1.00 | 1.50 | 2.00 | 2.50 |
|---|---|---|---|---|
| $P(X = t_k)$ | 0.14 | 0.40 | 0.23 | 0.23 |

The general procedure may be formulated as follows:

If $X = \sum_{j=1}^{m} c_j I_{C_j}$, we identify the set of distinct values in the set $\{c_j : 1 \le j \le m\}$. Suppose these are $t_1 < t_2 < \cdots < t_n$. For any possible value $t_i$ in the range, identify the index set $J_i$ of those j such that $c_j = t_i$. Then the terms

$$\sum_{j \in J_i} c_j I_{C_j} = t_i \sum_{j \in J_i} I_{C_j} = t_i I_{A_i}, \quad \text{where } A_i = \bigcup_{j \in J_i} C_j, \tag{19}$$

and

$$P(A_i) = P(X = t_i) = \sum_{j \in J_i} P(C_j) \tag{20}$$

Examination of this procedure shows that there are two phases:

• Select and sort the distinct values $t_1, t_2, \ldots, t_n$
• Add all probabilities associated with each value $t_i$ to determine $P(X = t_i)$

We use the m-function csort which performs these two operations (see Example 4 from "Minterms and MATLAB Calculations").

### Example 8: Use of csort on Example 7

>> C = [2.00 1.50 2.00 2.50 1.50 1.50 1.00 2.50 2.00 1.50];  % Matrix of c_j
>> pc = [0.08 0.11 0.07 0.15 0.10 0.09 0.14 0.08 0.08 0.10]; % Matrix of P(C_j)
>> [X,PX] = csort(C,pc);     % The sorting and consolidating operation
>> disp([X;PX]')             % Display of results
1.0000    0.1400
1.5000    0.4000
2.0000    0.2300
2.5000    0.2300


For a problem this small, use of a tool such as csort is not really needed. But in many problems with large sets of data the m-function csort is very useful.
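
The m-function csort belongs to the author's collection and its source is not reproduced in this module. For readers without it, the two phases can be sketched with the built-in functions `unique` and `accumarray`. This is a stand-in under that assumption, not the author's implementation; the variable names are chosen here.

```matlab
c  = [2.00 1.50 2.00 2.50 1.50 1.50 1.00 2.50 2.00 1.50];   % values c_j
pc = [0.08 0.11 0.07 0.15 0.10 0.09 0.14 0.08 0.08 0.10];   % P(C_j)

% Phase 1: select and sort the distinct values t_i.
% idx(j) is the position in t of the value c(j).
[t, ~, idx] = unique(c);

% Phase 2: add all probabilities associated with each value t_i.
p = accumarray(idx(:), pc(:)).';

disp([t; p].')                 % one row per pair (t_i, P(X = t_i))
```

The rows displayed should agree with the result of Example 8, up to display format.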

From affine form

Suppose X is in affine form,

$$X = c_0 + c_1 I_{E_1} + c_2 I_{E_2} + \cdots + c_m I_{E_m} = c_0 + \sum_{j=1}^{m} c_j I_{E_j} \tag{21}$$

We determine a particular primitive form by determining the value of X on each minterm generated by the class $\{E_j : 1 \le j \le m\}$. We do this in a systematic way by utilizing minterm vectors and properties of indicator functions.

1. Step 1. X is constant on each minterm generated by the class $\{E_1, E_2, \ldots, E_m\}$ since, as noted in the treatment of the minterm expansion, each indicator function $I_{E_i}$ is constant on each minterm. We determine the value $s_i$ of X on each minterm $M_i$. This describes X in a special primitive form
$$X = \sum_{i=0}^{2^m - 1} s_i I_{M_i}, \quad \text{with } P(M_i) = p_i, \quad 0 \le i \le 2^m - 1 \tag{22}$$
2. Step 2. We apply the csort operation to the matrices of values and minterm probabilities to determine the distribution for X.

We illustrate with a simple example. Extension to the general case should be quite evident. First, we do the problem “by hand” in tabular form. Then we use the m-procedures to carry out the desired operations.

### Example 9: Finding the distribution from affine form

A mail order house is featuring three items (limit one of each kind per customer). Let

• $E_1$ = the event the customer orders item 1, at a price of 10 dollars.
• $E_2$ = the event the customer orders item 2, at a price of 18 dollars.
• $E_3$ = the event the customer orders item 3, at a price of 10 dollars.

There is a mailing charge of 3 dollars per order.

We suppose $\{E_1, E_2, E_3\}$ is independent with probabilities 0.6, 0.3, 0.5, respectively. Let X be the amount a customer who orders the special items spends on them plus mailing cost. Then, in affine form,

$$X = 10 I_{E_1} + 18 I_{E_2} + 10 I_{E_3} + 3 \tag{23}$$

We seek first the primitive form, using the minterm probabilities, which may be calculated in this case by using the m-function minprob.

1. To obtain the value of X on each minterm we
• Multiply the minterm vector for each generating event by the coefficient for that event
• Sum the values on each minterm and add the constant

To complete the table, list the corresponding minterm probabilities.

| i | $10 I_{E_1}$ | $18 I_{E_2}$ | $10 I_{E_3}$ | c | $s_i$ | $pm_i$ |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 3 | 3 | 0.14 |
| 1 | 0 | 0 | 10 | 3 | 13 | 0.14 |
| 2 | 0 | 18 | 0 | 3 | 21 | 0.06 |
| 3 | 0 | 18 | 10 | 3 | 31 | 0.06 |
| 4 | 10 | 0 | 0 | 3 | 13 | 0.21 |
| 5 | 10 | 0 | 10 | 3 | 23 | 0.21 |
| 6 | 10 | 18 | 0 | 3 | 31 | 0.09 |
| 7 | 10 | 18 | 10 | 3 | 41 | 0.09 |

We then sort on the $s_i$, the values on the various $M_i$, to expose more clearly the primitive form for X.

| i | $s_i$ | $pm_i$ |
|---|---|---|
| 0 | 3 | 0.14 |
| 1 | 13 | 0.14 |
| 4 | 13 | 0.21 |
| 2 | 21 | 0.06 |
| 5 | 23 | 0.21 |
| 3 | 31 | 0.06 |
| 6 | 31 | 0.09 |
| 7 | 41 | 0.09 |

The primitive form of X is thus
$$X = 3 I_{M_0} + 13 I_{M_1} + 13 I_{M_4} + 21 I_{M_2} + 23 I_{M_5} + 31 I_{M_3} + 31 I_{M_6} + 41 I_{M_7} \tag{24}$$
We note that the value 13 is taken on minterms $M_1$ and $M_4$. The probability that X has the value 13 is thus $p(1) + p(4)$. Similarly, X has value 31 on minterms $M_3$ and $M_6$.
2. To complete the process of determining the distribution, we list the sorted values and consolidate by adding together the probabilities of the minterms on which each value is taken, as follows:

| k | $t_k$ | $p_k$ |
|---|---|---|
| 1 | 3 | 0.14 |
| 2 | 13 | 0.14 + 0.21 = 0.35 |
| 3 | 21 | 0.06 |
| 4 | 23 | 0.21 |
| 5 | 31 | 0.06 + 0.09 = 0.15 |
| 6 | 41 | 0.09 |

The results may be put in a matrix X of possible values and a corresponding matrix PX of probabilities that X takes on each of these values. Examination of the table shows that
$$X = [3 \;\; 13 \;\; 21 \;\; 23 \;\; 31 \;\; 41] \quad \text{and} \quad PX = [0.14 \;\; 0.35 \;\; 0.06 \;\; 0.21 \;\; 0.15 \;\; 0.09] \tag{25}$$
Matrices X and PX describe the distribution for X.

## An m-procedure for determining the distribution from affine form

We now consider suitable MATLAB steps in determining the distribution from affine form, then incorporate these in the m-procedure canonic for carrying out the transformation. We start with the random variable in affine form, and suppose we have available, or can calculate, the minterm probabilities.

1. The procedure uses mintable to set the basic minterm vector patterns, then uses a matrix of coefficients, including the constant term (set to zero if absent), to obtain the values on each minterm. The minterm probabilities are included in a row matrix.
2. Having obtained the values on each minterm, the procedure performs the desired consolidation by using the m-function csort.

### Example 10: Steps in determining the distribution for X in Example 9

>> c = [10 18 10 3];                 % Constant term is listed last
>> pm = minprob(0.1*[6 3 5]);
>> M  = mintable(3)                  % Minterm vector pattern
M =
0     0     0     0     1     1     1     1
0     0     1     1     0     0     1     1
0     1     0     1     0     1     0     1
% - - - - - - - - - - - - - -        % An approach mimicking "hand" calculation
>> C = colcopy(c(1:3),8)             % Coefficients in position
C =
10    10    10    10    10    10    10    10
18    18    18    18    18    18    18    18
10    10    10    10    10    10    10    10
>> CM = C.*M                         % Minterm vector values
CM =
0     0     0     0    10    10    10    10
0     0    18    18     0     0    18    18
0    10     0    10     0    10     0    10
>> cM = sum(CM) + c(4)               % Values on minterms
cM =
3    13    21    31    13    23    31    41
% - - - - - - - - - - - -  -         % Practical MATLAB procedure
>> s = c(1:3)*M + c(4)
s =
3    13    21    31    13    23    31    41
>> pm                                % Minterm probabilities (extra zeros deleted)
pm =
0.14  0.14  0.06  0.06  0.21  0.21  0.09  0.09
>> const = c(4)*ones(1,8);

>> disp([CM;const;s;pm]')            % Display of primitive form
0     0     0   3    3    0.14  % MATLAB gives four decimals
0     0    10   3   13    0.14
0    18     0   3   21    0.06
0    18    10   3   31    0.06
10     0     0   3   13    0.21
10     0    10   3   23    0.21
10    18     0   3   31    0.09
10    18    10   3   41    0.09
>> [X,PX] = csort(s,pm);              % Sorting on s, consolidation of  pm
>> disp([X;PX]')                      % Display of final result
3    0.14
13    0.35
21    0.06
23    0.21
31    0.15
41    0.09
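
The m-functions mintable, minprob, colcopy, and csort used above come from the author's collection. As a rough, self-contained sketch of the same steps, the minterm pattern can be generated with the built-in `dec2bin`, and the minterm probabilities computed directly from independence of the generating events. The code below is an approximation of what those m-functions produce for this example, under the stated independence assumption; it is not their actual source.

```matlab
c = [10 18 10 3];              % coefficients, constant term last
P = [0.6 0.3 0.5];             % P(E_1), P(E_2), P(E_3), assumed independent
m = numel(P);

% Minterm vectors: column i+1 holds the binary digits of i, with E_1 in
% the top row, matching the pattern displayed for mintable(3).
M = (dec2bin(0:2^m - 1, m) - '0').';

% Value of X on each minterm.
s = c(1:m) * M + c(m + 1);

% Minterm probabilities under independence: for each minterm, multiply
% P(E_j) where the digit is 1 and 1 - P(E_j) where it is 0.
Pcol = P(:) * ones(1, 2^m);
pm = prod(M .* Pcol + (1 - M) .* (1 - Pcol), 1);

% Sort the distinct values and consolidate the probabilities.
[X, ~, idx] = unique(s);
PX = accumarray(idx(:), pm(:)).';
disp([X; PX].')
```

The values and probabilities obtained this way should match the final display of Example 10.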


The two basic steps are combined in the m-procedure canonic, which we use to solve the previous problem.

### Example 11: Use of canonic on the variables of Example 10

>> c = [10 18 10 3]; % Note that the constant term 3 must be included last
>> pm = minprob([0.6 0.3 0.5]);
>> canonic
Enter row vector of coefficients  c
Enter row vector of minterm probabilities  pm
Use row matrices X and PX for calculations
Call for XDBN to view the distribution
>> disp(XDBN)
3.0000    0.1400
13.0000    0.3500
21.0000    0.0600
23.0000    0.2100
31.0000    0.1500
41.0000    0.0900


With the distribution available in the matrices X (set of values) and PX (set of probabilities), we may calculate a wide variety of quantities associated with the random variable.

We use two key devices:

1. Use relational and logical operations on the matrix of values X to determine a matrix M which has ones for those values which meet a prescribed condition. Then $P(X \in M)$ is obtained as PM = M*PX'.
2. Determine $G = g(X) = [g(X_1) \; g(X_2) \; \cdots \; g(X_n)]$ by using array operations on matrix X. We have two alternatives:
   1. Use the matrix G, which has values $g(t_i)$ for each possible value $t_i$ for X, or,
   2. Apply csort to the pair $(G, PX)$ to get the distribution for $Z = g(X)$. This distribution (in value and probability matrices) may be used in exactly the same manner as that for the original random variable X.

### Example 12: Continuation of Example 11

Suppose for the random variable X in Example 11 it is desired to determine the probabilities

$P(15 \le X \le 35)$, $P(|X - 20| \le 7)$, and $P((X - 10)(X - 25) > 0)$.

>> M = (X>=15)&(X<=35)
M = 0   0    1    1    1    0    % Ones for values t_i with 15 <= t_i <= 35
>> PM = M*PX'                    % Picks out and sums the corresponding probabilities
PM =  0.4200
>> N = abs(X-20)<=7
N = 0    1    1    1    0    0   % Ones for values t_i with |t_i - 20| <= 7
>> PN = N*PX'                    % Picks out and sums the corresponding probabilities
PN =  0.6200
>> G = (X - 10).*(X - 25)
G = 154 -36 -44 -26 126 496      % Value of g(t_i) for each possible value
>> P1 = (G>0)*PX'                % Total probability for those t_i such that
P1 =  0.3800                     % g(t_i) > 0
>> [Z,PZ] = csort(G,PX)          % Distribution for Z = g(X)
Z =  -44   -36   -26   126   154   496
PZ =  0.0600    0.3500    0.2100    0.1500    0.1400    0.0900
>> P2 = (Z>0)*PZ'                % Calculation using distribution for Z
P2 =  0.3800


### Example 13: Alternate formulation of Example 3 from "Composite Trials"

Ten race cars are involved in time trials to determine pole positions for an upcoming race. To qualify, they must post an average speed of 125 mph or more on a trial run. Let $E_i$ be the event the ith car makes qualifying speed. It seems reasonable to suppose the class $\{E_i : 1 \le i \le 10\}$ is independent. If the respective probabilities for success are 0.90, 0.88, 0.93, 0.77, 0.85, 0.96, 0.72, 0.83, 0.91, 0.84, what is the probability that k or more will qualify ($k = 6, 7, 8, 9, 10$)?

#### SOLUTION

Let $X = \sum_{i=1}^{10} I_{E_i}$.

>> c = [ones(1,10) 0];
>> P = [0.90, 0.88, 0.93, 0.77, 0.85, 0.96, 0.72, 0.83, 0.91, 0.84];
>> canonic
Enter row vector of coefficients  c
Enter row vector of minterm probabilities  minprob(P)
Use row matrices X and PX for calculations
Call for XDBN to view the distribution
>> k = 6:10;
>> for i = 1:length(k)
Pk(i) = (X>=k(i))*PX';
end
>> disp(Pk)
0.9938    0.9628    0.8472    0.5756    0.2114


This solution is not as convenient to write out. However, with the distribution for X as defined, a great many other probabilities can be determined. This is particularly the case when it is desired to compare the results of two independent races or “heats.” We consider such problems in the study of Independent Classes of Random Variables.
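
As a cross-check on the numbers above, the distribution for the count X can be built one event at a time instead of through minterms. This is a different technique from the canonic procedure, sketched here in Python under the same independence assumption. The function name `count_distribution` is hypothetical.

```python
P = [0.90, 0.88, 0.93, 0.77, 0.85, 0.96, 0.72, 0.83, 0.91, 0.84]

def count_distribution(success_probs):
    """Return [P(X = 0), ..., P(X = n)] for X = number of successes
    among independent events with the given success probabilities."""
    dist = [1.0]                      # before any event: X = 0 with probability 1
    for p in success_probs:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1 - p)  # this event fails: count stays at j
            new[j + 1] += mass * p    # this event succeeds: count becomes j + 1
        dist = new
    return dist

PX = count_distribution(P)
Pk = [sum(PX[k:]) for k in range(6, 11)]   # P(X >= k) for k = 6, ..., 10
print([round(v, 4) for v in Pk])
```

Rounded to four places, the values of `Pk` should agree with the row displayed by `disp(Pk)` in the MATLAB session.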

A function form for canonic

One disadvantage of the procedure canonic is that it always names the output X and PX. While these can easily be renamed, frequently it is desirable to use some other name for the random variable from the start. A function form, which we call canonicf, is useful in this case.

### Example 14: Alternate solution of Example 11, using canonicf

>> c = [10 18 10 3];
>> pm = minprob(0.1*[6 3 5]);
>> [Z,PZ] = canonicf(c,pm);
>> disp([Z;PZ]')                % Numbers as before, but the distribution
3.0000    0.1400            % matrices are now named Z and PZ
13.0000    0.3500
21.0000    0.0600
23.0000    0.2100
31.0000    0.1500
41.0000    0.0900
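
To make the computation behind this output concrete, here is a minimal Python sketch. It is not the canonicf m-function itself. It assumes that minprob produces minterm probabilities for an independent class, that the last entry of c is the constant term, and that the remaining entries multiply the indicator functions of the events. The name `canonic_sketch` is hypothetical.

```python
from itertools import product

def canonic_sketch(c, p):
    """Distribution of X = c[0]*I_1 + ... + c[n-1]*I_n + c[n],
    assuming the n events are independent with probabilities p."""
    *coeffs, const = c
    total = {}
    for pattern in product([0, 1], repeat=len(p)):   # one 0/1 pattern per minterm
        value = const + sum(ci for ci, on in zip(coeffs, pattern) if on)
        prob = 1.0
        for pi, on in zip(p, pattern):
            prob *= pi if on else 1 - pi             # independence: multiply
        total[value] = total.get(value, 0.0) + prob  # merge minterms with equal value
    Z = sorted(total)
    return Z, [total[z] for z in Z]

Z, PZ = canonic_sketch([10, 18, 10, 3], [0.6, 0.3, 0.5])
for z, pz in zip(Z, PZ):
    print(z, round(pz, 4))
```

Eight minterms collapse to six distinct values because two pairs of minterms share a value (13 and 31), which is why the probabilities for those values are sums of two minterm probabilities.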


## General random variables

The distribution for a simple random variable is easily visualized as point mass concentrations at the various values in the range, and the class of events determined by a simple random variable is described in terms of the partition generated by X (i.e., the class of those events of the form $A_i = \{X = t_i\}$ for each $t_i$ in the range). The situation is conceptually the same for the general case, but the details are more complicated. If the random variable takes on a continuum of values, then the probability mass distribution may be spread smoothly on the line. Or, the distribution may be a mixture of point mass concentrations and smooth distributions on some intervals.

The class of events determined by X is the set of all inverse images $X^{-1}(M)$ for M any member of a general class of subsets of the real line known in the mathematical literature as the Borel sets. There are technical mathematical reasons for not saying M is any subset, but the class of Borel sets is general enough to include any set likely to be encountered in applications, certainly at the level of this treatment. The Borel sets include any interval and any set that can be formed by complements, countable unions, and countable intersections of Borel sets. This is a type of class known as a sigma algebra of events. Because of the preservation of set operations by the inverse image, the class of events determined by random variable X is also a sigma algebra, and is often designated $\sigma(X)$.

There are some technical questions concerning the probability measure $P_X$ induced by X, hence the distribution. These also are settled in such a manner that there is no need for concern at this level of analysis. However, some of these questions become important in dealing with random processes and other advanced notions increasingly used in applications. Two facts provide the freedom we need to proceed with little concern for the technical details.

1. $X^{-1}(M)$ is an event for every Borel set M iff for every semi-infinite interval $(-\infty, t]$ on the real line $X^{-1}((-\infty, t])$ is an event.
2. The induced probability distribution is determined uniquely by its assignment to all intervals of the form $(-\infty, t]$.

These facts point to the importance of the distribution function introduced in the next chapter.
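
As a small illustration of the second fact, the probability assigned to a half-open interval follows directly from the assignments to semi-infinite intervals. For $a < b$, the interval $(-\infty, b]$ is the disjoint union of $(-\infty, a]$ and $(a, b]$, so additivity of $P_X$ gives

$$
P(a < X \le b) = P_X\big((a, b]\big) = P_X\big((-\infty, b]\big) - P_X\big((-\infty, a]\big).
$$

This is only the simplest case. Extending the assignment from such intervals to all Borel sets is one of the technical matters mentioned above.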

Another fact, alluded to above and discussed in some detail in the next chapter, is that any general random variable can be approximated as closely as desired by a simple random variable. We turn in the next chapter to a description of certain commonly encountered probability distributions and ways to describe them analytically.
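
One way to picture the approximation is to round each value of X down to a grid. The Python sketch below is an illustration only, under the assumption that X takes values in [0, 1), and the construction in the next chapter may differ in detail. The function name `simple_approximation` is hypothetical.

```python
import math
import random

def simple_approximation(x, n):
    """Round x down to a multiple of 1/n. For x in [0, 1) the result takes
    at most n distinct values, so the approximating variable is simple."""
    return math.floor(x * n) / n

random.seed(0)                                   # fixed seed for repeatability
samples = [random.random() for _ in range(1000)] # stand-in observations of X

worst = {}
for n in (10, 100, 1000):
    worst[n] = max(abs(x - simple_approximation(x, n)) for x in samples)
    print(n, worst[n])
```

The largest error stays below the grid width 1/n, so refining the grid makes the simple random variable as close to X as desired.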
