# Connexions

Collection: Applied Probability, by Paul E Pfeiffer

# Mathematical Expectation; General Random Variables

Module by: Paul E Pfeiffer

Summary: We extend the definition and properties of mathematical expectation to the general case, and note the relationship of mathematical expectation to the Lebesgue integral. Although we do not develop the theory, identification of this relationship provides access to a rich and powerful set of properties. In the unit on Distribution Approximations, we show that a bounded random variable X can be represented as the limit of a nondecreasing sequence of simple random variables. Also, a real random variable can be expressed as the difference of two nonnegative random variables. The extension of mathematical expectation to the general case is based on these facts and certain basic properties of simple random variables.

In this unit, we extend the definition and properties of mathematical expectation to the general case. In the process, we note the relationship of mathematical expectation to the Lebesgue integral, which is developed in abstract measure theory. Although we do not develop this theory, which lies beyond the scope of this study, identification of this relationship provides access to a rich and powerful set of properties which have far-reaching consequences in both application and theory.

## Extension to the General Case

In the unit on Distribution Approximations, we show that a bounded random variable X can be represented as the limit of a nondecreasing sequence of simple random variables. Also, a real random variable can be expressed as the difference $X = X^+ - X^-$ of two nonnegative random variables. The extension of mathematical expectation to the general case is based on these facts and certain basic properties of simple random variables, some of which are established in the unit on expectation for simple random variables. We list these properties and sketch how the extension is accomplished.

Definition. A condition on a random variable or on a relationship between random variables is said to hold almost surely, abbreviated “a.s.” iff the condition or relationship holds for all ω except possibly a set with probability zero.

Basic properties of simple random variables

• (E0): If $X = Y$ a.s., then $E[X] = E[Y]$.
• (E1): $E[a I_E] = a P(E)$.
• (E2): Linearity. $X = \sum_{i=1}^{n} a_i X_i$ implies $E[X] = \sum_{i=1}^{n} a_i E[X_i]$
• (E3): Positivity; monotonicity
1. If $X \ge 0$ a.s., then $E[X] \ge 0$, with equality iff $X = 0$ a.s.
2. If $X \ge Y$ a.s., then $E[X] \ge E[Y]$, with equality iff $X = Y$ a.s.
• (E4): Fundamental lemma. If $X \ge 0$ is bounded and $\{X_n : 1 \le n\}$ is an a.s. nonnegative, nondecreasing sequence with $\lim_n X_n(\omega) \ge X(\omega)$ for almost every ω, then $\lim_n E[X_n] \ge E[X]$.
• (E4a): If for all n, $0 \le X_n \le X_{n+1}$ a.s. and $X_n \to X$ a.s., then $E[X_n] \to E[X]$ (i.e., the expectation of the limit is the limit of the expectations).

Ideas of the proofs of the fundamental properties

• Modifying the random variable X on a set of probability zero simply modifies one or more of the $A_i$ without changing $P(A_i)$. Such a modification does not change $E[X]$.
• Properties (E1) and (E2) are established in the unit on expectation of simple random variables.
• Positivity (E3a) is a simple property of sums of real numbers. Modification of sets of probability zero cannot affect the expectation.
• Monotonicity (E3b) is a consequence of positivity and linearity.
$$X \ge Y \;\text{iff}\; X - Y \ge 0 \text{ a.s.} \quad\text{and}\quad E[X] \ge E[Y] \;\text{iff}\; E[X] - E[Y] = E[X - Y] \ge 0$$
(1)
• The fundamental lemma (E4) plays an essential role in extending the concept of expectation. It involves elementary, but somewhat sophisticated, use of linearity and monotonicity, limited to nonnegative random variables and positive coefficients. We forgo a proof.
• Monotonicity and the fundamental lemma provide a very simple proof of the monotone convergence theorem, often designated MC. Its role is essential in the extension.

Nonnegative random variables

There is a nondecreasing sequence of nonnegative simple random variables converging to X. By monotonicity, the expectations of this sequence form a nondecreasing sequence of real numbers, which must either have a limit or increase without bound (in which case we say the limit is infinite). We define $E[X] = \lim_n E[X_n]$.

Two questions arise.

1. Is the limit unique? The approximating sequences for a random variable are not unique, although their limit is the same.
2. Is the definition consistent? If the limit random variable X is simple, does the new definition coincide with the old?

The fundamental lemma and monotone convergence may be used to show that the answer to both questions is affirmative, so that the definition is reasonable. Also, the six fundamental properties survive the passage to the limit.

As a simple application of these ideas, consider discrete random variables such as the geometric $(p)$ or Poisson $(\mu)$, which are integer-valued but unbounded.

### Example 1: Unbounded, nonnegative, integer-valued random variables

The random variable X may be expressed

$$X = \sum_{k=0}^{\infty} k I_{E_k}, \quad\text{where } E_k = \{X = k\} \text{ with } P(E_k) = p_k$$
(2)

Let

$$X_n = \sum_{k=0}^{n-1} k I_{E_k} + n I_{B_n}, \quad\text{where } B_n = \{X \ge n\}$$
(3)

Then each $X_n$ is a simple random variable with $X_n \le X_{n+1}$. If $X(\omega) = k$, then $X_n(\omega) = k = X(\omega)$ for all $n \ge k + 1$. Hence, $X_n(\omega) \to X(\omega)$ for all ω. By monotone convergence, $E[X_n] \to E[X]$. Now

$$E[X_n] = \sum_{k=1}^{n-1} k P(E_k) + n P(B_n)$$
(4)

If $\sum_{k=0}^{\infty} k P(E_k) < \infty$, then

$$0 \le n P(B_n) = n \sum_{k=n}^{\infty} P(E_k) \le \sum_{k=n}^{\infty} k P(E_k) \to 0 \quad\text{as } n \to \infty$$
(5)

Hence

$$E[X] = \lim_n E[X_n] = \sum_{k=0}^{\infty} k P(E_k)$$
(6)

We may use this result to establish the expectation for the geometric and Poisson distributions.

### Example 2: $X \sim$ geometric $(p)$

We have $p_k = P(X = k) = q^k p$, $0 \le k$. By the result of Example 1,

$$E[X] = \sum_{k=0}^{\infty} k p q^k = pq \sum_{k=1}^{\infty} k q^{k-1} = \frac{pq}{(1-q)^2} = q/p$$
(7)

For $Y - 1 \sim$ geometric $(p)$, $p_k = p q^{k-1}$, so that $E[Y] = \frac{1}{q} E[X] = 1/p$.

### Example 3: $X \sim$ Poisson $(\mu)$

We have $p_k = e^{-\mu} \frac{\mu^k}{k!}$. By the result of Example 1,

$$E[X] = e^{-\mu} \sum_{k=0}^{\infty} k \frac{\mu^k}{k!} = \mu e^{-\mu} \sum_{k=1}^{\infty} \frac{\mu^{k-1}}{(k-1)!} = \mu e^{-\mu} e^{\mu} = \mu$$
(8)

The general case

We make use of the fact that $X = X^+ - X^-$, where both $X^+$ and $X^-$ are nonnegative. Then

$$E[X] = E[X^+] - E[X^-] \quad\text{provided at least one of } E[X^+],\, E[X^-] \text{ is finite}$$
(9)

Definition. If both $E[X^+]$ and $E[X^-]$ are finite, X is said to be integrable.

The term integrable comes from the relation of expectation to the abstract Lebesgue integral of measure theory.

Again, the basic properties survive the extension. The property (E0) is subsumed in a more general uniqueness property noted in the list of properties discussed below.

Theoretical note

The development of expectation sketched above is exactly the development of the Lebesgue integral of the random variable X as a measurable function on the basic probability space $(\Omega, \mathcal{F}, P)$, so that

$$E[X] = \int_\Omega X\, dP$$
(10)

As a consequence, we may utilize the properties of the general Lebesgue integral. In its abstract form, it is not particularly useful for actual calculations. A careful use of the mapping of probability mass to the real line by random variable X produces a corresponding mapping of the integral on the basic space to an integral on the real line. Although this integral is also a Lebesgue integral, it agrees with the ordinary Riemann integral of calculus when the latter exists, so that ordinary integrals may be used to compute expectations.

The fundamental properties of simple random variables which survive the extension serve as the basis of an extensive and powerful list of properties of expectation of real random variables and real functions of random vectors. Some of the more important of these are listed in the table in Appendix E. We often refer to these properties by the numbers used in that table.

Some basic forms

The mapping theorems provide a number of basic integral (or summation) forms for computation.

1. In general, if $Z = g(X)$ with distribution functions $F_X$ and $F_Z$, we have the expectation as a Stieltjes integral.
$$E[Z] = E[g(X)] = \int g(t)\, F_X(dt) = \int u\, F_Z(du)$$
(11)
2. If X and $g(X)$ are absolutely continuous, the Stieltjes integrals are replaced by
$$E[Z] = \int g(t) f_X(t)\, dt = \int u f_Z(u)\, du$$
(12)
where limits of integration are determined by $f_X$ or $f_Z$. Justification for use of the density function is provided by the Radon-Nikodym theorem, property (E19).
3. If X is simple, in a primitive form (including canonical form), then
$$E[Z] = E[g(X)] = \sum_{j=1}^{m} g(c_j) P(C_j)$$
(13)
If the distribution for $Z = g(X)$ is determined by a csort operation, then
$$E[Z] = \sum_{k=1}^{n} v_k P(Z = v_k)$$
(14)
4. The extension to unbounded, nonnegative, integer-valued random variables is shown in Example 1, above. The finite sums are replaced by infinite series (provided they converge).
5. For $Z = g(X, Y)$,
$$E[Z] = E[g(X,Y)] = \int g(t, u)\, F_{XY}(dt\, du) = \int v\, F_Z(dv)$$
(15)
6. In the absolutely continuous case
$$E[Z] = E[g(X,Y)] = \iint g(t, u) f_{XY}(t, u)\, du\, dt = \int v f_Z(v)\, dv$$
(16)
7. For joint simple X, Y (Section on Expectation for Simple Random Variables)
$$E[Z] = E[g(X,Y)] = \sum_{i=1}^{n} \sum_{j=1}^{m} g(t_i, u_j) P(X = t_i,\, Y = u_j)$$
(17)
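Form (12) reduces the expectation to an ordinary integral when densities exist. A minimal Python sketch, assuming $X \sim$ exponential (1) and $g(t) = t^2$ (so the exact value is $\Gamma(3) = 2$), approximates $\int g(t) f_X(t)\, dt$ by a midpoint sum; the helper name `expectation` is ours.

```python
import math

def expectation(g, f, a, b, n=100000):
    # E[g(X)] = integral of g(t) f_X(t) dt, midpoint rule on [a, b]
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) * f(a + (i + 0.5) * dx)
               for i in range(n)) * dx

f = lambda t: math.exp(-t)        # exponential(1) density
EZ = expectation(lambda t: t**2, f, 0.0, 40.0)
print(EZ)   # close to Gamma(3) = 2
```

The interval [0, 40] truncates a tail of mass about $e^{-40}$, which is negligible here.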

Mechanical interpretation and approximation procedures

In elementary mechanics, since the total mass is one, the quantity $E[X] = \int t f_X(t)\, dt$ is the location of the center of mass. This theoretically rigorous fact may be derived heuristically from an examination of the expectation for a simple approximating random variable. Recall the discussion of the m-procedure for discrete approximation in the unit on Distribution Approximations. The range of X is divided into equal subintervals. The values of the approximating random variable are at the midpoints of the subintervals. The associated probability is the probability mass in the subinterval, which is approximately $f_X(t_i)\, dx$, where $dx$ is the length of the subinterval. This approximation improves with an increasing number of subdivisions, with corresponding decrease in $dx$. The expectation of the approximating simple random variable $X_s$ is

$$E[X_s] = \sum_i t_i f_X(t_i)\, dx \approx \int t f_X(t)\, dt$$
(18)

The approximation improves with increasingly fine subdivisions. The center of mass of the approximating distribution approaches the center of mass of the smooth distribution.

It should be clear that a similar argument for g(X)g(X) leads to the integral expression

$$E[g(X)] = \int g(t) f_X(t)\, dt$$
(19)

This argument shows that we should be able to use tappr to set up for approximating the expectation $E[g(X)]$ as well as for approximating $P(g(X) \in M)$, etc. We return to this in Section 2.
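tappr is an m-procedure from the text's MATLAB toolbox; the following Python analog (an illustration, not the actual procedure) carries out the same setup step: values at subinterval midpoints, probabilities proportional to $f_X(t_i)\, dx$, renormalized to sum to one.

```python
import math

def tappr_like(f, a, b, n):
    # Midpoint discretization of a density, after the pattern of tappr
    dx = (b - a) / n
    X = [a + (i + 0.5) * dx for i in range(n)]
    PX = [f(t) * dx for t in X]
    total = sum(PX)
    PX = [p / total for p in PX]    # renormalize so probabilities sum to 1
    return X, PX

lam = 0.3
X, PX = tappr_like(lambda t: lam * math.exp(-lam * t), 0.0, 50.0, 1000)
EX = sum(t * p for t, p in zip(X, PX))
print(EX)   # close to 1/lam = 3.3333
```

This mirrors the exponential (0.3) setup used in Example 12 below, where truncation at 50 is safe.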

Mean values for some absolutely continuous distributions

1. Uniform on $[a, b]$: $f_X(t) = \frac{1}{b-a}$, $a \le t \le b$. The center of mass is at $(a + b)/2$. To calculate the value formally, we write
$$E[X] = \int t f_X(t)\, dt = \frac{1}{b-a} \int_a^b t\, dt = \frac{b^2 - a^2}{2(b-a)} = \frac{b+a}{2}$$
(20)
2. Symmetric triangular on $[a, b]$: The graph of the density is an isosceles triangle with base on the interval $[a, b]$. By symmetry, the center of mass, hence the expectation, is at the midpoint $(a + b)/2$.
3. Exponential $(\lambda)$: $f_X(t) = \lambda e^{-\lambda t}$, $0 \le t$. Using a well known definite integral (see Appendix B), we have
$$E[X] = \int t f_X(t)\, dt = \int_0^{\infty} \lambda t e^{-\lambda t}\, dt = 1/\lambda$$
(21)
4. Gamma $(\alpha, \lambda)$: $f_X(t) = \frac{1}{\Gamma(\alpha)} t^{\alpha-1} \lambda^{\alpha} e^{-\lambda t}$, $0 \le t$. Again we use one of the integrals in Appendix B to obtain
$$E[X] = \int t f_X(t)\, dt = \frac{1}{\Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha} t^{\alpha} e^{-\lambda t}\, dt = \frac{\Gamma(\alpha+1)}{\lambda \Gamma(\alpha)} = \alpha/\lambda$$
(22)
The last equality comes from the fact that $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$.
5. Beta $(r, s)$: $f_X(t) = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} t^{r-1} (1-t)^{s-1}$, $0 < t < 1$. We use the fact that $\int_0^1 u^{r-1} (1-u)^{s-1}\, du = \frac{\Gamma(r)\Gamma(s)}{\Gamma(r+s)}$, $r > 0$, $s > 0$.
$$E[X] = \int t f_X(t)\, dt = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} \int_0^1 t^r (1-t)^{s-1}\, dt = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} \cdot \frac{\Gamma(r+1)\Gamma(s)}{\Gamma(r+s+1)} = \frac{r}{r+s}$$
(23)
6. Weibull $(\alpha, \lambda, \nu)$: $F_X(t) = 1 - e^{-\lambda (t-\nu)^{\alpha}}$, $\alpha > 0$, $\lambda > 0$, $\nu \ge 0$, $t \ge \nu$. Differentiation shows
$$f_X(t) = \alpha \lambda (t - \nu)^{\alpha-1} e^{-\lambda (t-\nu)^{\alpha}}, \quad t \ge \nu$$
(24)
First, consider $Y \sim$ exponential $(\lambda)$. For this random variable
$$E[Y^r] = \int_0^{\infty} t^r \lambda e^{-\lambda t}\, dt = \frac{\Gamma(r+1)}{\lambda^r}$$
(25)
If Y is exponential (1), then techniques for functions of random variables show that $\frac{1}{\lambda^{1/\alpha}} Y^{1/\alpha} + \nu \sim$ Weibull $(\alpha, \lambda, \nu)$. Hence,
$$E[X] = \frac{1}{\lambda^{1/\alpha}} E[Y^{1/\alpha}] + \nu = \frac{1}{\lambda^{1/\alpha}} \Gamma\left(\frac{1}{\alpha} + 1\right) + \nu$$
(26)
7. Normal $(\mu, \sigma^2)$: The symmetry of the distribution about $t = \mu$ shows that $E[X] = \mu$. This, of course, may be verified by integration. A standard trick simplifies the work.
$$E[X] = \int_{-\infty}^{\infty} t f_X(t)\, dt = \int_{-\infty}^{\infty} (t - \mu) f_X(t)\, dt + \mu$$
(27)
We have used the fact that $\int_{-\infty}^{\infty} f_X(t)\, dt = 1$. If we make the change of variable $x = t - \mu$ in the last integral, the integrand becomes an odd function, so that the integral is zero. Thus, $E[X] = \mu$.
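The Weibull formula in item 6 is easy to verify numerically. This Python sketch (sample parameters of our choosing) integrates $t f_X(t)$ directly by a midpoint sum and compares the result with $\lambda^{-1/\alpha} \Gamma(1/\alpha + 1) + \nu$.

```python
import math

alpha, lam, nu = 2.0, 1.0, 0.0   # sample Weibull parameters (our choice)
f = lambda t: alpha * lam * (t - nu)**(alpha - 1) * math.exp(-lam * (t - nu)**alpha)

# E[X] = integral of t f_X(t) dt over [nu, inf), midpoint rule with truncation
n, b = 200000, 10.0
dx = (b - nu) / n
EX = sum((nu + (i + 0.5) * dx) * f(nu + (i + 0.5) * dx) for i in range(n)) * dx

closed_form = lam**(-1/alpha) * math.gamma(1/alpha + 1) + nu
print(EX, closed_form)   # both near Gamma(1.5) = 0.8862...
```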

## Properties and computation

The properties in the table in Appendix E constitute a powerful and convenient resource for the use of mathematical expectation. These are properties of the abstract Lebesgue integral, expressed in the notation for mathematical expectation.

$$E[g(X)] = \int g(X)\, dP$$
(28)

In the development of additional properties, the four basic properties: (E1) Expectation of indicator functions, (E2) Linearity, (E3) Positivity; monotonicity, and (E4a) Monotone convergence play a foundational role. We utilize the properties in the table, as needed, often referring to them by the numbers assigned in the table.

In this section, we include a number of examples which illustrate the use of various properties. Some are theoretical examples, deriving additional properties or displaying the basis and structure of some in the table. Others apply these properties to facilitate computation.

### Example 4: Probability as expectation

Probability may be expressed entirely in terms of expectation.

• By property (E1) and positivity (E3a), $P(A) = E[I_A] \ge 0$.
• As a special case of (E1), we have $P(\Omega) = E[I_\Omega] = 1$.
• By the countable sums property (E8),
$$A = \bigvee_i A_i \quad\text{implies}\quad P(A) = E[I_A] = E\left[\sum_i I_{A_i}\right] = \sum_i E[I_{A_i}] = \sum_i P(A_i)$$
(29)

Thus, the three defining properties for a probability measure are satisfied.

Remark. There are treatments of probability which characterize mathematical expectation with properties (E0) through (E4a), then define $P(A) = E[I_A]$. Although such a development is quite feasible, it has not been widely adopted.

### Example 5: An indicator function pattern

Suppose X is a real random variable and $E = X^{-1}(M) = \{\omega : X(\omega) \in M\}$. Then

$$I_E = I_M(X)$$
(30)

To see this, note that $X(\omega) \in M$ iff $\omega \in E$, so that $I_E(\omega) = 1$ iff $I_M(X(\omega)) = 1$.

Similarly, if $E = X^{-1}(M) \cap Y^{-1}(N)$, then $I_E = I_M(X) I_N(Y)$. We thus have, by (E1),

$$P(X \in M) = E[I_M(X)] \quad\text{and}\quad P(X \in M,\, Y \in N) = E[I_M(X) I_N(Y)]$$
(31)

### Example 6: Alternate interpretation of the mean value

$$E[(X - c)^2] \text{ is a minimum iff } c = E[X], \text{ in which case } E\left[(X - E[X])^2\right] = E[X^2] - E^2[X]$$
(32)

INTERPRETATION. If we approximate the random variable X by a constant c, then for any ω the error of approximation is $X(\omega) - c$. The probability weighted average of the square of the error (often called the mean squared error) is $E[(X - c)^2]$. This average squared error is smallest iff the approximating constant c is the mean value.

VERIFICATION

We expand $(X - c)^2$ and apply linearity to obtain

$$E[(X - c)^2] = E[X^2 - 2cX + c^2] = E[X^2] - 2E[X]c + c^2$$
(33)

The last expression is a quadratic in c (since $E[X^2]$ and $E[X]$ are constants). The usual calculus treatment shows the expression has a minimum for $c = E[X]$. Substitution of this value for c shows the expression reduces to $E[X^2] - E^2[X]$.
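A small numerical check of the minimizing property, with a discrete distribution of our choosing: scanning candidate constants c shows that the mean squared error is smallest at $c = E[X]$, where it equals $E[X^2] - E^2[X]$.

```python
# Discrete check that E[(X - c)^2] is minimized at c = E[X]
X = [0, 1, 2, 5]
PX = [0.2, 0.3, 0.4, 0.1]

EX = sum(x * p for x, p in zip(X, PX))   # mean value, here 1.6

def mse(c):
    # mean squared error E[(X - c)^2]
    return sum((x - c)**2 * p for x, p in zip(X, PX))

# scan a grid of candidate constants c in steps of 0.01
best_c = min((c / 100 for c in range(-500, 1000)), key=mse)
print(EX, best_c)   # the minimizer lands on the mean
```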

A number of inequalities are listed among the properties in the table. The basis for these inequalities is usually some standard analytical inequality on random variables to which the monotonicity property is applied. We illustrate with a derivation of the important Jensen's inequality.

### Example 7: Jensen's inequality

If X is a real random variable and g is a convex function on an interval I which includes the range of X, then

$$g(E[X]) \le E[g(X)]$$
(34)

VERIFICATION

The function g is convex on I iff for each $t_0 \in I = [a, b]$ there is a number $\lambda(t_0)$ such that

$$g(t) \ge g(t_0) + \lambda(t_0)(t - t_0)$$
(35)

This means there is a line through $(t_0, g(t_0))$ such that the graph of g lies on or above it. If $a \le X \le b$, then by monotonicity $E[a] = a \le E[X] \le E[b] = b$ (this is the mean value property (E11)). We may choose $t_0 = E[X] \in I$. If we designate the constant $\lambda(E[X])$ by c, we have

$$g(X) \ge g(E[X]) + c(X - E[X])$$
(36)

Recalling that E[X]E[X] is a constant, we take expectation of both sides, using linearity and monotonicity, to get

$$E[g(X)] \ge g(E[X]) + c(E[X] - E[X]) = g(E[X])$$
(37)

Remark. It is easy to show that the function $\lambda(\cdot)$ is nondecreasing. This fact is used in establishing Jensen's inequality for conditional expectation.
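A quick numerical illustration of Jensen's inequality, using the convex function $g(t) = e^t$ and a small discrete distribution of our choosing:

```python
import math

# Check g(E[X]) <= E[g(X)] for the convex function g(t) = e^t
X = [0.0, 1.0, 2.0]
PX = [0.25, 0.5, 0.25]

EX = sum(x * p for x, p in zip(X, PX))               # = 1.0
E_gX = sum(math.exp(x) * p for x, p in zip(X, PX))   # about 3.456
print(math.exp(EX), E_gX)   # exp(1) = 2.718... <= E[exp(X)]
```

Equality would require X to be a.s. constant (or g linear on the range of X).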

The product rule for expectations of independent random variables

### Example 8: Product rule for simple random variables

Consider an independent pair $\{X, Y\}$ of simple random variables

$$X = \sum_{i=1}^{n} t_i I_{A_i} \qquad Y = \sum_{j=1}^{m} u_j I_{B_j} \qquad \text{(both in canonical form)}$$
(38)

We know that each pair $\{A_i, B_j\}$ is independent, so that $P(A_i B_j) = P(A_i) P(B_j)$. Consider the product XY. According to the pattern described after Example 9 from "Mathematical Expectation: Simple Random Variables,"

$$XY = \sum_{i=1}^{n} t_i I_{A_i} \sum_{j=1}^{m} u_j I_{B_j} = \sum_{i=1}^{n} \sum_{j=1}^{m} t_i u_j I_{A_i B_j}$$
(39)

The latter double sum is a primitive form, so that

$$E[XY] = \sum_{i=1}^{n} \sum_{j=1}^{m} t_i u_j P(A_i B_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} t_i u_j P(A_i) P(B_j) = \sum_{i=1}^{n} t_i P(A_i) \sum_{j=1}^{m} u_j P(B_j) = E[X] E[Y]$$
(40)

Thus the product rule holds for independent simple random variables.
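The double-sum argument can be mirrored directly in code. A Python sketch with a small independent pair (values and probabilities chosen for illustration):

```python
# Product rule for an independent pair of simple random variables:
# E[XY] from the primitive-form double sum equals E[X] E[Y]
tX, pX = [1.0, 2.0], [0.4, 0.6]
uY, pY = [0.0, 3.0], [0.5, 0.5]

EX = sum(t * p for t, p in zip(tX, pX))
EY = sum(u * p for u, p in zip(uY, pY))

# double sum over the primitive form: t_i u_j P(A_i) P(B_j)
EXY = sum(t * u * pt * pu
          for t, pt in zip(tX, pX)
          for u, pu in zip(uY, pY))
print(EXY, EX * EY)   # the two agree
```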

### Example 9: Approximating simple functions for an independent pair

Suppose $\{X, Y\}$ is an independent pair, with an approximating simple pair $\{X_s, Y_s\}$. As functions of X and Y, respectively, the pair $\{X_s, Y_s\}$ is independent. According to Example 8, above, the product rule $E[X_s Y_s] = E[X_s] E[Y_s]$ must hold.

### Example 10: Product rule for an independent pair

For $X \ge 0$, $Y \ge 0$, there exist nondecreasing sequences $\{X_n : 1 \le n\}$ and $\{Y_n : 1 \le n\}$ of simple random variables increasing to X and Y, respectively. The sequence $\{X_n Y_n : 1 \le n\}$ is also a nondecreasing sequence of simple random variables, increasing to XY. By the monotone convergence theorem (MC),

$$E[X_n] \to E[X], \quad E[Y_n] \to E[Y], \quad\text{and}\quad E[X_n Y_n] \to E[XY]$$
(41)

Since $E[X_n Y_n] = E[X_n] E[Y_n]$ for each n, we conclude $E[XY] = E[X] E[Y]$.

In the general case,

$$XY = (X^+ - X^-)(Y^+ - Y^-) = X^+ Y^+ - X^+ Y^- - X^- Y^+ + X^- Y^-$$
(42)

Application of the product rule to each nonnegative pair and the use of linearity gives the product rule for the pair $\{X, Y\}$.

Remark. It should be apparent that the product rule can be extended to any finite independent class.

### Example 11: The joint distribution of three random variables

The class $\{X, Y, Z\}$ is independent, with the marginal distributions shown below. Let $W = g(X, Y, Z) = 3X^2 + 2XY - 3XYZ$. Determine $E[W]$.

X = 0:4;
Y = 1:2:7;
Z = 0:3:12;
PX = 0.1*[1 3 2 3 1];
PY = 0.1*[2 2 3 3];
PZ = 0.1*[2 2 1 3 2];

icalc3                            % Setup for joint dbn for {X,Y,Z}
Enter row matrix of X-values  X
Enter row matrix of Y-values  Y
Enter row matrix of Z-values  Z
Enter X probabilities  PX
Enter Y probabilities  PY
Enter Z probabilities  PZ
Use array operations on matrices X, Y, Z,
PX, PY, PZ, t, u, v, and P
EX = X*PX'                    % E[X]
EX =    2
EX2 = (X.^2)*PX'              % E[X^2]
EX2 =   5.4000
EY =  Y*PY'                   % E[Y]
EY =    4.4000
EZ = Z*PZ'                    % E[Z]
EZ =    6.3000
G = 3*t.^2 + 2*t.*u - 3*t.*u.*v;  % W = g(X,Y,Z) = 3X^2 + 2XY - 3XYZ
EG = total(G.*P)              % E[g(X,Y,Z)]
EG = -132.5200
[W,PW] = csort(G,P);          % Distribution for W = g(X,Y,Z)
EW = W*PW'                    % E[W]
EW = -132.5200
ew = 3*EX2 + 2*EX*EY - 3*EX*EY*EZ % Use of linearity and product rule
ew = -132.5200
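icalc3, csort, and total are m-procedures from the text's MATLAB toolbox; the same joint-grid calculation can be sketched in plain Python (for illustration) and checked against the linearity/product-rule value.

```python
# Joint grid for the independent class {X, Y, Z} of Example 11
X, PX = [0, 1, 2, 3, 4],   [0.1, 0.3, 0.2, 0.3, 0.1]
Y, PY = [1, 3, 5, 7],      [0.2, 0.2, 0.3, 0.3]
Z, PZ = [0, 3, 6, 9, 12],  [0.2, 0.2, 0.1, 0.3, 0.2]

g = lambda t, u, v: 3*t**2 + 2*t*u - 3*t*u*v

# E[W]: sum over the joint grid (independence gives product probabilities)
EW = sum(g(t, u, v) * pt * pu * pv
         for t, pt in zip(X, PX)
         for u, pu in zip(Y, PY)
         for v, pv in zip(Z, PZ))

# Check by linearity and the product rule: 3E[X^2] + 2E[X]E[Y] - 3E[X]E[Y]E[Z]
EXv = sum(t * p for t, p in zip(X, PX))
EX2 = sum(t * t * p for t, p in zip(X, PX))
EYv = sum(u * p for u, p in zip(Y, PY))
EZv = sum(v * p for v, p in zip(Z, PZ))
ew = 3*EX2 + 2*EXv*EYv - 3*EXv*EYv*EZv
print(EW, ew)   # both -132.52, as in the MATLAB session
```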


### Example 12: A function with a compound definition: truncated exponential

Suppose $X \sim$ exponential (0.3). Let

$$Z = \begin{cases} X^2 & \text{for } X \le 4 \\ 16 & \text{for } X > 4 \end{cases} \;=\; I_{[0,4]}(X)\, X^2 + I_{(4,\infty)}(X)\, 16$$
(43)

Determine $E[Z]$.

ANALYTIC SOLUTION

$$E[g(X)] = \int g(t) f_X(t)\, dt = \int_0^{\infty} I_{[0,4]}(t)\, t^2\, 0.3 e^{-0.3t}\, dt + 16\, E[I_{(4,\infty)}(X)]$$
(44)
$$= \int_0^4 t^2\, 0.3 e^{-0.3t}\, dt + 16\, P(X > 4) \approx 7.4972 \quad\text{(by Maple)}$$
(45)

APPROXIMATION

To obtain a simple approximation, we must approximate the exponential by a bounded random variable. Since $P(X > 50) = e^{-15} \approx 3 \cdot 10^{-7}$, we may safely truncate X at 50.

tappr
Enter matrix [a b] of x-range endpoints  [0 50]
Enter number of x approximation points  1000
Enter density as a function of t  0.3*exp(-0.3*t)
Use row matrices X and PX as in the simple case
M = X <= 4;
G = M.*X.^2 + 16*(1 - M);  % g(X)
EG = G*PX'                 % E[g(X)]
EG =  7.4972
[Z,PZ] = csort(G,PX);      % Distribution for Z = g(X)
EZ = Z*PZ'                 % E[Z] from distribution
EZ =  7.4972


Because of the large number of approximation points, the results agree quite closely with the theoretical value.
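The analytic value can also be reproduced without Maple or the m-procedures; a stdlib-only Python sketch evaluates the integral by a midpoint sum and adds $16 P(X > 4) = 16 e^{-1.2}$.

```python
import math

# E[Z] for the truncated exponential of Example 12:
# integral_0^4 t^2 (0.3 e^{-0.3 t}) dt + 16 P(X > 4), midpoint rule
lam = 0.3
n = 100000
dx = 4.0 / n
integral = sum(((i + 0.5) * dx)**2 * lam * math.exp(-lam * (i + 0.5) * dx)
               for i in range(n)) * dx
EZ = integral + 16 * math.exp(-lam * 4)
print(EZ)   # close to 7.4972
```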

### Example 13: Stocking for random demand (see Exercise 4 from "Problems on Functions of Random Variables")

The manager of a department store is planning for the holiday season. A certain item costs c dollars per unit and sells for p dollars per unit. If the demand exceeds the amount m ordered, additional units can be special ordered for s dollars per unit ($s > c$). If demand is less than amount ordered, the remaining stock can be returned (or otherwise disposed of) at r dollars per unit ($r < c$). Demand D for the season is assumed to be a random variable with Poisson $(\mu)$ distribution. Suppose $\mu = 50$, $c = 30$, $p = 50$, $s = 40$, $r = 20$. What amount m should the manager order to maximize the expected profit?

PROBLEM FORMULATION

Suppose D is the demand and X is the profit. Then

• For $D \le m$: $X = D(p - c) - (m - D)(c - r) = D(p - r) + m(r - c)$
• For $D > m$: $X = m(p - c) + (D - m)(p - s) = D(p - s) + m(s - c)$

It is convenient to write the expression for X in terms of $I_M$, where $M = (-\infty, m]$. Thus

$$X = I_M(D)[D(p - r) + m(r - c)] + [1 - I_M(D)][D(p - s) + m(s - c)]$$
(46)
$$= D(p - s) + m(s - c) + I_M(D)[D(p - r) + m(r - c) - D(p - s) - m(s - c)]$$
(47)
$$= D(p - s) + m(s - c) + I_M(D)(s - r)(D - m)$$
(48)

Then $E[X] = (p - s)E[D] + m(s - c) + (s - r)E[I_M(D) D] - (s - r) m E[I_M(D)]$.

ANALYTIC SOLUTION

For $D \sim$ Poisson $(\mu)$, $E[D] = \mu$ and $E[I_M(D)] = P(D \le m)$.

$$E[I_M(D) D] = e^{-\mu} \sum_{k=1}^{m} k \frac{\mu^k}{k!} = \mu e^{-\mu} \sum_{k=1}^{m} \frac{\mu^{k-1}}{(k-1)!} = \mu P(D \le m - 1)$$
(49)

Hence,

$$E[X] = (p - s)E[D] + m(s - c) + (s - r)E[I_M(D) D] - (s - r) m E[I_M(D)]$$
(50)
$$= (p - s)\mu + m(s - c) + (s - r)\mu P(D \le m - 1) - (s - r) m P(D \le m)$$
(51)

Because of the discrete nature of the problem, we cannot solve for the optimum m by ordinary calculus. We may calculate the expected profit for various m about $m = \mu$ and determine the optimum. We do so with the aid of MATLAB and the m-function cpoisson.

mu = 50;
c  = 30;
p  = 50;
s  = 40;
r  = 20;
m  = 45:55;
EX = (p - s)*mu + m*(s -c) + (s - r)*mu*(1 - cpoisson(mu,m)) ...
-(s - r)*m.*(1 - cpoisson(mu,m+1));
disp([m;EX]')
45.0000  930.8604
46.0000  935.5231
47.0000  939.1895
48.0000  941.7962
49.0000  943.2988
50.0000  943.6750          % Optimum m = 50
51.0000  942.9247
52.0000  941.0699
53.0000  938.1532
54.0000  934.2347
55.0000  929.3886


A direct solution may be obtained by MATLAB, using finite approximation for the Poisson distribution.

APPROXIMATION

ptest = cpoisson(mu,100)     % Check for suitable value of n
ptest =  3.2001e-10
n = 100;
t = 0:n;
pD = ipoisson(mu,t);
for i = 1:length(m)          % Step by step calculation for various m
M = t > m(i);
G(i,:) = t*(p - r) - M.*(t - m(i))*(s - r)- m(i)*(c - r);
end
EG = G*pD';                  % Values agree with theoretical to four decimals


An advantage of the second solution, based on simple approximation to D, is that the distribution of gain for each m could be studied — e.g., the maximum and minimum gains.
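cpoisson and ipoisson are m-functions from the text's toolbox; a stdlib-only Python sketch of the same search, with Poisson probabilities generated by the recursion $p_k = p_{k-1}\, \mu / k$, locates the same optimum.

```python
import math

mu, c, p, s, r = 50, 30, 50, 40, 20

def poisson_cdf(mu, m):
    # P(D <= m) via the pmf recursion p_k = p_{k-1} * mu / k
    pk, cdf = math.exp(-mu), 0.0
    for k in range(m + 1):
        cdf += pk
        pk *= mu / (k + 1)
    return cdf

def expected_profit(m):
    # (p-s)mu + m(s-c) + (s-r) mu P(D <= m-1) - (s-r) m P(D <= m)
    return ((p - s) * mu + m * (s - c)
            + (s - r) * mu * poisson_cdf(mu, m - 1)
            - (s - r) * m * poisson_cdf(mu, m))

best_m = max(range(45, 56), key=expected_profit)
print(best_m, expected_profit(best_m))   # optimum m = 50, value near 943.675
```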

### Example 14: A jointly distributed pair

Suppose the pair $\{X, Y\}$ has joint density $f_{XY}(t, u) = 3u$ on the triangular region bounded by $u = 0$, $u = 1 + t$, $u = 1 - t$ (see Figure 1). Let $Z = g(X, Y) = X^2 + 2XY$. Determine $E[Z]$.

#### ANALYTIC SOLUTION

$$E[Z] = \iint (t^2 + 2tu) f_{XY}(t, u)\, du\, dt$$

$$= 3\int_{-1}^{0} \int_{0}^{1+t} (t^2 u + 2t u^2)\, du\, dt + 3\int_{0}^{1} \int_{0}^{1-t} (t^2 u + 2t u^2)\, du\, dt = 1/10$$
(52)

APPROXIMATION

tuappr
Enter matrix [a b] of X-range endpoints  [-1 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  400
Enter number of Y approximation points  200
Enter expression for joint density  3*u.*(u<=min(1+t,1-t))
Use array operations on X, Y, PX, PY, t, u, and P
G = t.^2 + 2*t.*u;          % g(X,Y) = X^2 + 2XY
EG = total(G.*P)            % E[g(X,Y)]
EG =   0.1006                  % Theoretical value = 1/10
[Z,PZ] = csort(G,P);        % Distribution for Z
EZ = Z*PZ'              % E[Z] from distribution
EZ =  0.1006
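tuappr is an m-procedure from the text's toolbox; a rough Python analog of the same grid computation (for illustration only) reproduces the approximate value.

```python
# tuappr-style grid approximation for Example 14:
# f_XY(t, u) = 3u on the triangle 0 <= u <= min(1 + t, 1 - t)
nx, ny = 400, 200
dt, du = 2.0 / nx, 1.0 / ny

P, G = [], []
for i in range(nx):
    t = -1.0 + (i + 0.5) * dt
    for j in range(ny):
        u = (j + 0.5) * du
        if u <= min(1 + t, 1 - t):
            P.append(3 * u * dt * du)        # probability mass in the cell
            G.append(t**2 + 2 * t * u)       # g(X, Y) at the midpoint

total = sum(P)
EG = sum(g * p for g, p in zip(G, P)) / total   # renormalize, as tuappr does
print(EG)   # close to the theoretical value 1/10
```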


### Example 15: A function with a compound definition

The pair $\{X, Y\}$ has joint density $f_{XY}(t, u) = 1/2$ on the square region bounded by $u = 1 + t$, $u = 1 - t$, $u = 3 - t$, and $u = t - 1$ (see Figure 2),

$$W = \begin{cases} X & \text{for } \max\{X, Y\} \le 1 \\ 2Y & \text{for } \max\{X, Y\} > 1 \end{cases} \;=\; I_Q(X, Y)\, X + I_{Q^c}(X, Y)\, 2Y$$
(53)

where $Q = \{(t, u) : \max\{t, u\} \le 1\} = \{(t, u) : t \le 1,\, u \le 1\}$. Determine $E[W]$.

ANALYTIC SOLUTION

The intersection of the region Q and the square is the set for which $0 \le t \le 1$ and $1 - t \le u \le 1$. Reference to the figure shows three regions of integration.

$$E[W] = \frac{1}{2}\int_0^1 \int_{1-t}^{1} t\, du\, dt + \frac{1}{2}\int_0^1 \int_{1}^{1+t} 2u\, du\, dt + \frac{1}{2}\int_1^2 \int_{t-1}^{3-t} 2u\, du\, dt = 11/6 \approx 1.8333$$
(54)

APPROXIMATION

tuappr
Enter matrix [a b] of X-range endpoints  [0 2]
Enter matrix [c d] of Y-range endpoints  [0 2]
Enter number of X approximation points  200
Enter number of Y approximation points  200
Enter expression for joint density  ((u<=min(t+1,3-t))& ...
(u>=max(1-t,t-1)))/2
Use array operations on X, Y, PX, PY, t, u, and P
M = max(t,u)<=1;
G = t.*M + 2*u.*(1 - M);   % Z = g(X,Y)
EG = total(G.*P)           % E[g(X,Y)]
EG =  1.8340               % Theoretical 11/6 = 1.8333
[Z,PZ] = csort(G,P);       % Distribution for Z
EZ = dot(Z,PZ)             % E[Z] from distribution
EZ =  1.8340


Special forms for expectation

The various special forms related to property (E20a) are often useful. The general result, which we do not need, is usually derived by an argument which employs a general form of what is known as Fubini's theorem. The special form (E20b)

$$E[X] = \int_{-\infty}^{\infty} [u(t) - F_X(t)]\, dt$$
(55)

may be derived from (E20a) by use of integration by parts for Stieltjes integrals. However, we use the relationship between the graph of the distribution function and the graph of the quantile function to show the equivalence of (E20b) and (E20f). The latter property is readily established by elementary arguments.

### Example 16: The property (E20f)

If Q is the quantile function for the distribution function F_X, then

E[g(X)] = \int_0^1 g[Q(u)] \, du
(56)

VERIFICATION

If Y = Q(U), where U ~ uniform (0, 1), then Y has the same distribution as X. Hence,

E[g(X)] = E[g(Q(U))] = \int g(Q(u)) f_U(u) \, du = \int_0^1 g(Q(u)) \, du
(57)
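As a standalone numeric illustration (not in the original), take the exponential (1) distribution, for which F(t) = 1 - e^{-t} and hence Q(u) = -ln(1 - u); with g(x) = x², the integral in (56) should reproduce the second moment E[X²] = 2:

```python
import math

# Midpoint-rule evaluation of ∫₀¹ g(Q(u)) du for exponential(1) and g(x) = x².
def Q(u):
    return -math.log(1.0 - u)   # quantile function of exponential(1)

n = 200000
h = 1.0 / n
integral = sum(Q(h * (i + 0.5)) ** 2 for i in range(n)) * h   # midpoint rule on (0, 1)
print(round(integral, 3))   # ≈ 2, the second moment of exponential(1)
```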

### Example 17: Reliability and expectation

In reliability, if X is the life duration (time to failure) for a device, the reliability function is the probability that at any time t the device is still operative. Thus

R(t) = P(X > t) = 1 - F_X(t)
(58)

According to property (E20b)

E[X] = \int_0^{\infty} R(t) \, dt
(59)
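A quick numeric sketch of this (not in the original): for an exponential lifetime with rate lam = 0.5, R(t) = e^{-lam·t} and the mean should come out as 1/lam = 2. The improper integral is truncated where the tail is negligible:

```python
import math

# Numeric check of E[X] = ∫₀^∞ R(t) dt for an exponential lifetime, rate 0.5.
lam = 0.5
T, n = 60.0, 60000            # truncate at T = 60; the tail beyond is negligible
h = T / n
EX = sum(math.exp(-lam * h * (i + 0.5)) for i in range(n)) * h   # midpoint rule
print(round(EX, 4))           # ≈ 2 = 1/lam
```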

### Example 18: Use of the quantile function

Suppose F_X(t) = t^a, a > 0, 0 ≤ t ≤ 1. Then Q(u) = u^{1/a}, 0 ≤ u ≤ 1.

E[X] = \int_0^1 u^{1/a} \, du = \frac{1}{1 + 1/a} = \frac{a}{a+1}
(60)

The same result could be obtained by using f_X(t) = F_X'(t) and evaluating \int_0^1 t f_X(t) \, dt.
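Both routes can be checked numerically for a concrete exponent (a plain-Python sketch, not in the original; a = 3 is an arbitrary choice, for which a/(a+1) = 3/4):

```python
# Example 18 with a = 3: the quantile route ∫₀¹ u^{1/a} du and the density route
# ∫₀¹ t · a t^{a-1} dt should both equal a/(a+1) = 0.75.
a = 3.0
n = 100000
h = 1.0 / n
quantile_route = sum((h * (i + 0.5)) ** (1.0 / a) for i in range(n)) * h
density_route = sum((h * (i + 0.5)) * a * (h * (i + 0.5)) ** (a - 1.0)
                    for i in range(n)) * h
print(round(quantile_route, 4), round(density_route, 4))   # both ≈ 0.75
```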

### Example 19: Equivalence of (E20b) and (E20f)

For the special case g(X) = X, Figure 3(a) shows that \int_0^1 Q(u) \, du is the difference of the shaded areas

\int_0^1 Q(u) \, du = \text{Area } A - \text{Area } B
(61)

The corresponding graph of the distribution function F is shown in Figure 3(b). Because of the construction, the areas of the regions marked A and B are the same in the two figures. As may be seen,

\text{Area } A = \int_0^{\infty} [1 - F(t)] \, dt \quad \text{and} \quad \text{Area } B = \int_{-\infty}^0 F(t) \, dt
(62)

Use of the unit step function u(t) = 1 for t > 0 and 0 for t < 0 (defined arbitrarily at t = 0) enables us to combine the two expressions to get

\int_0^1 Q(u) \, du = \text{Area } A - \text{Area } B = \int_{-\infty}^{\infty} [u(t) - F(t)] \, dt
(63)

Property (E20c) is a direct result of linearity and (E20b), with the unit step functions cancelling out.
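The identity in Example 19 can be illustrated numerically (a sketch not in the original) for a distribution taking both negative and positive values. For X uniform on (-1, 2), F(t) = (t + 1)/3 on the support, Q(u) = 3u - 1, and both sides should equal E[X] = 1/2:

```python
# Numeric check that ∫₀¹ Q(u) du = ∫ [u(t) - F(t)] dt for X ~ uniform(-1, 2).
n = 300000
lhs = sum(3.0 * (i + 0.5) / n - 1.0 for i in range(n)) / n   # ∫₀¹ Q(u) du
h = 3.0 / n
rhs = 0.0
for i in range(n):                       # integrate over the support (-1, 2)
    t = -1.0 + (i + 0.5) * h
    step = 1.0 if t > 0 else 0.0         # unit step u(t)
    rhs += (step - (t + 1.0) / 3.0) * h  # u(t) - F(t)
print(round(lhs, 4), round(rhs, 4))      # both ≈ 0.5 = E[X]
```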

### Example 20: Property (E20d) Useful inequalities

Suppose X ≥ 0. Then

\sum_{n=0}^{\infty} P(X \ge n+1) \le E[X] \le \sum_{n=0}^{\infty} P(X \ge n) \le N \sum_{k=0}^{\infty} P(X \ge kN), \quad \text{for all } N \ge 1
(64)

VERIFICATION

For X ≥ 0, by (E20b)

E[X] = \int_0^{\infty} [1 - F(t)] \, dt = \int_0^{\infty} P(X > t) \, dt
(65)

Since F can have only a countable number of jumps on any interval, and P(X > t) and P(X ≥ t) differ only at jump points, we may assert

\int_a^b P(X > t) \, dt = \int_a^b P(X \ge t) \, dt
(66)

For each nonnegative integer n, let E_n = [n, n+1). By countable additivity of the integral

E[X] = \int_0^{\infty} P(X \ge t) \, dt = \sum_{n=0}^{\infty} \int_{E_n} P(X \ge t) \, dt
(67)

Since P(X ≥ t) is decreasing with t and each E_n has unit length, we have by the mean value theorem

P(X \ge n+1) \le \int_{E_n} P(X \ge t) \, dt \le P(X \ge n)
(68)

Summing over n yields the first two inequalities. The third inequality follows from the fact that

\int_{kN}^{(k+1)N} P(X \ge t) \, dt \le N P(X \ge kN)
(69)

since P(X ≥ t) ≤ P(X ≥ kN) on that interval of length N.

Remark. Property (E20d) is used primarily for theoretical purposes. The special case (E20e) is more frequently used.
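The chain of inequalities in (E20d) can be verified numerically for a concrete case (a sketch not in the original; Poisson with mean 2 and N = 4 are arbitrary choices, with sums truncated where tail probabilities are negligible):

```python
import math

# Check of Σ P(X >= n+1) <= E[X] <= Σ P(X >= n) <= N Σ P(X >= kN) for X ~ Poisson(2).
mu, M = 2.0, 60
pmf = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(M)]
tail = [sum(pmf[n:]) for n in range(M)]          # tail[n] ≈ P(X >= n)
EX = sum(k * pmf[k] for k in range(M))           # ≈ mu = 2
lower = sum(tail[n + 1] for n in range(M - 1))   # Σ P(X >= n+1)
upper = sum(tail)                                # Σ P(X >= n)
N = 4
third = N * sum(tail[k * N] for k in range(M // N))
print(round(lower, 4), round(EX, 4), round(upper, 4), round(third, 4))
```

For an integer-valued X the first inequality holds with equality, as (E20e) below makes explicit.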

### Example 21: Property (E20e)

If X is nonnegative, integer valued, then

E[X] = \sum_{k=1}^{\infty} P(X \ge k) = \sum_{k=0}^{\infty} P(X > k)
(70)

VERIFICATION

The result follows as a special case of (E20d). For integer valued random variables,

P(X \ge t) = P(X \ge n) \text{ at } t = n, \text{ and } P(X \ge t) = P(X > n) = P(X \ge n+1) \text{ on the interior of } E_n
(71)

An elementary derivation of (E20e) can be constructed as follows.

### Example 22: (E20e) for integer-valued random variables

By definition

E[X] = \sum_{k=1}^{\infty} k \, P(X = k) = \lim_{n \to \infty} \sum_{k=1}^{n} k \, P(X = k)
(72)

Now for each finite n,

\sum_{k=1}^{n} k \, P(X = k) = \sum_{k=1}^{n} \sum_{j=1}^{k} P(X = k) = \sum_{j=1}^{n} \sum_{k=j}^{n} P(X = k) = \sum_{j=1}^{n} P(j \le X \le n)
(73)

Taking limits as n → ∞ yields the desired result.
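The interchange of the order of summation is easy to check for a finite case (a plain-Python sketch, not in the original, using an arbitrary made-up pmf):

```python
# Finite-n check of the sum interchange in Example 22 for a pmf on {1, ..., 6}.
p = [0.05, 0.10, 0.20, 0.25, 0.15, 0.25]          # P(X = k) for k = 1..6; sums to 1
n = len(p)
lhs = sum(k * p[k - 1] for k in range(1, n + 1))  # Σ k P(X = k)
rhs = sum(sum(p[k - 1] for k in range(j, n + 1))  # Σ_j P(j <= X <= n)
          for j in range(1, n + 1))
print(abs(lhs - rhs) < 1e-12)                     # the two orders of summation agree
```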

### Example 23: The geometric distribution

Suppose X ~ geometric (p). Then P(X ≥ k) = q^k. Use of (E20e) gives

E[X] = \sum_{k=1}^{\infty} q^k = q \sum_{k=0}^{\infty} q^k = \frac{q}{1-q} = q/p
(74)
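A quick numeric confirmation (not in the original; p = 0.3 is an arbitrary choice, with the geometric series truncated where the terms are negligible):

```python
# Check of Example 23: Σ_{k>=1} q^k should match q/p for a geometric(p) variable.
p = 0.3
q = 1.0 - p
series = sum(q ** k for k in range(1, 200))   # terms beyond k = 200 are negligible
print(round(series, 6), round(q / p, 6))      # both ≈ 7/3
```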
