# Conditional Independence, Given a Random Vector

Module by: Paul E. Pfeiffer

Summary: The notion of conditional independence is extended to random variables by considering appropriate indicator functions, then extending to more general random vectors. A variety of equivalent conditions provide a basis for practical analysis and interpretation. An important application is the Bayesian approach to statistics. The essential idea is that an unknown parameter about which there is uncertainty is modeled as the value of a random variable. The name Bayesian comes from the role of Bayesian reversal in the analysis. The Bayesian estimate often seems preferable for small samples, and it has the advantage that prior information may be utilized. The sampling procedure upgrades the prior distribution.

In the unit on Conditional Independence, the concept of conditional independence of events is examined and used to model a variety of common situations. In this unit, we investigate a more general concept of conditional independence, based on the theory of conditional expectation. This concept lies at the foundations of Bayesian statistics, of many topics in decision theory, and of the theory of Markov systems. We examine in this unit, very briefly, the first of these. In the unit on Markov Sequences, we provide an introduction to the third.

## The concept

The definition of conditional independence of events is based on a product rule which may be expressed in terms of conditional expectation, given an event. The pair $\{A, B\}$ is conditionally independent, given $C$, iff

$$E[I_A I_B \mid C] = P(AB \mid C) = P(A \mid C)\, P(B \mid C) = E[I_A \mid C]\, E[I_B \mid C] \tag{1}$$

If we let $A = X^{-1}(M)$ and $B = Y^{-1}(N)$, then $I_A = I_M(X)$ and $I_B = I_N(Y)$. It would be reasonable to consider the pair $\{X, Y\}$ conditionally independent, given event $C$, iff the product rule

$$E[I_M(X) I_N(Y) \mid C] = E[I_M(X) \mid C]\, E[I_N(Y) \mid C] \tag{2}$$

holds for all reasonable $M$ and $N$ (technically, all Borel $M$ and $N$). This suggests a possible extension to conditional expectation, given a random vector. We examine the following concept.

**Definition**. The pair $\{X, Y\}$ is *conditionally independent, given Z*, designated $\{X, Y\}$ ci $|Z$, iff

$$E[I_M(X) I_N(Y) \mid Z] = E[I_M(X) \mid Z]\, E[I_N(Y) \mid Z] \quad \text{for all Borel } M, N \tag{3}$$

**Remark**. Since it is not necessary that $X$, $Y$, or $Z$ be real valued, we understand that the sets $M$ and $N$ are on the codomains for $X$ and $Y$, respectively. For example, if $X$ is a three-dimensional random vector, then $M$ is a subset of $\mathbb{R}^3$.
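
The product rule can be checked numerically. The sketch below (not part of the original module; the model and the sets $M$, $N$ are illustrative choices) builds a pair that is conditionally independent given $Z$ by construction, with $X$ and $Y$ related only through $Z$, and approximates conditioning on $Z$ by conditioning on a small event $\{0.4 < Z < 0.5\}$ determined by $Z$.

```matlab
% Simulation sketch of the product rule (3); {X,Y} ci |Z by construction.
rng(17);                            % seed fixed for reproducibility
N = 1e6;
Z = rand(N,1);                      % Z uniform on (0,1)
X = Z + randn(N,1);                 % X = Z + independent standard normal noise
Y = Z + randn(N,1);                 % Y = Z + independent standard normal noise
C = (Z > 0.4) & (Z < 0.5);          % conditioning event determined by Z
IM = X > 1;                         % indicator of M = (1, Inf)
IN = Y > 1;                         % indicator of N = (1, Inf)
lhs = mean(IM(C) & IN(C));          % E[I_M(X) I_N(Y) | C]
rhs = mean(IM(C))*mean(IN(C));      % E[I_M(X) | C] E[I_N(Y) | C]
fprintf('%8.4f  %8.4f\n', lhs, rhs) % agree up to sampling error
```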

As in the case of other concepts, it is useful to identify some key properties, which we refer to by the numbers used in the table in Appendix G. We note two kinds of equivalences. For example, the following are equivalent.

(CI1) $E[I_M(X) I_N(Y) \mid Z] = E[I_M(X) \mid Z]\, E[I_N(Y) \mid Z]$ a.s. for all Borel sets $M, N$

(CI5) $E[g(X, Z) h(Y, Z) \mid Z] = E[g(X, Z) \mid Z]\, E[h(Y, Z) \mid Z]$ a.s. for all Borel functions $g, h$

Because the indicator functions are special Borel functions, (CI1) is a special case of (CI5). To show that (CI1) implies (CI5), we need to use linearity, monotonicity, and monotone convergence in a manner similar to that used in extending properties (CE1) to (CE6) for conditional expectation. A second kind of equivalence involves various patterns. The properties (CI1), (CI2), (CI3), and (CI4) are equivalent, with (CI1) being the defining condition for $\{X, Y\}$ ci $|Z$.

(CI1) $E[I_M(X) I_N(Y) \mid Z] = E[I_M(X) \mid Z]\, E[I_N(Y) \mid Z]$ a.s. for all Borel sets $M, N$

(CI2) $E[I_M(X) \mid Z, Y] = E[I_M(X) \mid Z]$ a.s. for all Borel sets $M$

(CI3) $E[I_M(X) I_Q(Z) \mid Z, Y] = E[I_M(X) I_Q(Z) \mid Z]$ a.s. for all Borel sets $M, Q$

(CI4) $E[I_M(X) I_Q(Z) \mid Y] = E\{E[I_M(X) I_Q(Z) \mid Z] \mid Y\}$ a.s. for all Borel sets $M, Q$

As an example of the kinds of argument needed to verify these equivalences, we show the equivalence of (CI1) and (CI2).

• (CI1) implies (CI2). Set $e_1(Y, Z) = E[I_M(X) \mid Z, Y]$ and $e_2(Y, Z) = E[I_M(X) \mid Z]$. If we show

  $$E[I_N(Y) I_Q(Z) e_1(Y, Z)] = E[I_N(Y) I_Q(Z) e_2(Y, Z)] \quad \text{for all Borel } N, Q \tag{4}$$

  then by the uniqueness property (E5b) for expectation we may assert $e_1(Y, Z) = e_2(Y, Z)$ a.s. Using the defining property (CE1) for conditional expectation, we have

  $$E\{I_N(Y) I_Q(Z)\, E[I_M(X) \mid Z, Y]\} = E[I_N(Y) I_Q(Z) I_M(X)] \tag{5}$$

  On the other hand, use of (CE1), (CE8), (CI1), and (CE1) yields

  $$E\{I_N(Y) I_Q(Z)\, E[I_M(X) \mid Z]\} = E\{I_Q(Z)\, E[I_N(Y)\, E[I_M(X) \mid Z] \mid Z]\} \tag{6}$$

  $$= E\{I_Q(Z)\, E[I_M(X) \mid Z]\, E[I_N(Y) \mid Z]\} = E\{I_Q(Z)\, E[I_M(X) I_N(Y) \mid Z]\} \tag{7}$$

  $$= E[I_N(Y) I_Q(Z) I_M(X)] \tag{8}$$

  which establishes the desired equality.
• (CI2) implies (CI1). Using (CE9), (CE8), (CI2), and (CE8), we have

  $$E[I_M(X) I_N(Y) \mid Z] = E\{E[I_M(X) I_N(Y) \mid Z, Y] \mid Z\} \tag{9}$$

  $$= E\{I_N(Y)\, E[I_M(X) \mid Z, Y] \mid Z\} = E\{I_N(Y)\, E[I_M(X) \mid Z] \mid Z\} \tag{10}$$

  $$= E[I_M(X) \mid Z]\, E[I_N(Y) \mid Z] \tag{11}$$

Use of property (CE8) shows that (CI2) and (CI3) are equivalent. Now just as (CI1) extends to (CI5), so also (CI3) is equivalent to

(CI6) $E[g(X, Z) \mid Z, Y] = E[g(X, Z) \mid Z]$ a.s. for all Borel functions $g$

Property (CI6) provides an important interpretation of conditional independence: $E[g(X, Z) \mid Z]$ is the best mean-square estimator for $g(X, Z)$, given knowledge of $Z$. The condition $\{X, Y\}$ ci $|Z$ implies that additional knowledge about $Y$ does not modify that best estimate. This interpretation is often the most useful as a modeling assumption.
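
This interpretation can be seen in a small numerical sketch (same illustrative model as above: $X = Z +$ noise and $Y = Z +$ noise with independent noises, so that $E[X \mid Z, Y] = E[X \mid Z] = Z$). A least-squares fit of $X$ on the pair $(Z, Y)$ puts essentially no weight on $Y$:

```matlab
% Sketch of (CI6): the best estimate of X given (Z,Y) makes no use of Y.
rng(3);
N = 1e6;
Z = rand(N,1);
X = Z + randn(N,1);                 % E[X | Z, Y] = E[X | Z] = Z here
Y = Z + randn(N,1);
b = [ones(N,1) Z Y] \ X;            % least-squares fit of X on 1, Z, Y
fprintf('%8.4f %8.4f %8.4f\n', b)   % approx 0, 1, 0: no weight on Y
```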

Similarly, property (CI4) is equivalent to

(CI8) $E[g(X, Z) \mid Y] = E\{E[g(X, Z) \mid Z] \mid Y\}$ a.s. for all Borel functions $g$

Property (CI7) is an alternate way of expressing (CI6). Property (CI9) is just a convenient way of expressing the other conditions.

The additional properties in Appendix G are useful in a variety of contexts, particularly in establishing properties of Markov systems. We refer to them as needed.

## The Bayesian approach to statistics

In the classical approach to statistics, a fundamental problem is to obtain information about the population distribution from the distribution in a simple random sample. There is an inherent difficulty with this approach. Suppose it is desired to determine the population mean $\mu$. Now $\mu$ is an unknown quantity about which there is uncertainty. However, since it is a constant, we cannot assign a probability such as $P(a < \mu \le b)$. This has no meaning.

The Bayesian approach makes a fundamental change of viewpoint. Since the population mean is a quantity about which there is uncertainty, it is modeled as a random variable whose value is to be determined by experiment. In this view, the population distribution is conceived as randomly selected from a class of such distributions. One way of expressing this idea is to refer to a state of nature. The population distribution has been “selected by nature” from a class of distributions. The mean value is thus a random variable whose value is determined by this selection. To implement this point of view, we assume

1. The value of the parameter (say μ in the discussion above) is a “realization” of a parameter random variable H. If two or more parameters are sought (say the mean and variance), they may be considered components of a parameter random vector.
2. The population distribution is a conditional distribution, given the value of H.

**The Bayesian model**

If $X$ is a random variable whose distribution is the population distribution and $H$ is the parameter random variable, then $\{X, H\}$ have a joint distribution.

1. For each $u$ in the range of $H$, we have a conditional distribution for $X$, given $H = u$.
2. We assume a prior distribution for $H$. This is based on previous experience.
3. We have a random sampling process, given $H$: i.e., $\{X_i : 1 \le i \le n\}$ is conditionally iid, given $H$. Let $W = (X_1, X_2, \ldots, X_n)$ and consider the joint conditional distribution function

$$F_{W|H}(t_1, t_2, \ldots, t_n \mid u) = P(X_1 \le t_1, X_2 \le t_2, \ldots, X_n \le t_n \mid H = u) \tag{12}$$

$$= E\Big[\prod_{i=1}^{n} I_{(-\infty, t_i]}(X_i) \,\Big|\, H = u\Big] = \prod_{i=1}^{n} E\big[I_{(-\infty, t_i]}(X_i) \mid H = u\big] = \prod_{i=1}^{n} F_{X|H}(t_i \mid u) \tag{13}$$

If $X$ has conditional density, given $H$, then a similar product rule holds.

**Population proportion**

We illustrate these ideas with one of the simplest, but most important, statistical problems: that of determining the proportion of a population which has a particular characteristic. Examples abound. We mention only a few to indicate the importance.

1. The proportion of a population of voters who plan to vote for a certain candidate.
2. The proportion of a given population which has a certain disease.
3. The fraction of items from a production line which meet specifications.
4. The fraction of women between the ages of eighteen and fifty-five who hold full-time jobs.

The parameter in this case is the proportion $p$ who meet the criterion. If sampling is at random, then the sampling process is equivalent to a sequence of Bernoulli trials. If $H$ is the parameter random variable and $S_n$ is the number of “successes” in a sample of size $n$, then the conditional distribution for $S_n$, given $H = u$, is binomial $(n, u)$. To see this, consider

$$X_i = I_{E_i}, \quad \text{with} \quad P(E_i \mid H = u) = E[X_i \mid H = u] = e(u) = u \tag{14}$$

Analysis is carried out for each fixed $u$ as in the ordinary Bernoulli case. If

$$S_n = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} I_{E_i} \quad \text{is the number of successes in } n \text{ component trials} \tag{15}$$

we have the result

$$E[I_{\{k\}}(S_n) \mid H = u] = P(S_n = k \mid H = u) = C(n, k)\, u^k (1 - u)^{n-k} \quad \text{and} \quad E[S_n \mid H = u] = nu \tag{16}$$
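
Equation (16) is easy to confirm by simulation for one fixed value of $u$; a sketch (the values $n = 20$, $u = 0.7$, $k = 14$ are arbitrary choices, echoing Example 1 below):

```matlab
% Given H = u, S_n counts successes in n iid Bernoulli(u) trials.
rng(1);
n = 20; u = 0.7; k = 14; N = 1e5;
S = sum(rand(n,N) < u);                  % N samples of S_n, given H = u
pEmp = mean(S == k);                     % empirical P(S_n = k | H = u)
pBin = nchoosek(n,k)*u^k*(1-u)^(n-k);    % C(n,k) u^k (1-u)^(n-k)
fprintf('%8.4f %8.4f\n', pEmp, pBin)     % agree up to sampling error
fprintf('%8.4f %8.4f\n', mean(S), n*u)   % E[S_n | H = u] = nu
```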

**The objective**

We seek to determine the best mean-square estimate of $H$, given $S_n = k$. Two steps must be taken:

1. If $H = u$, we know $E[S_n \mid H = u] = nu$. Sampling gives $S_n = k$. We make a Bayesian reversal to get an expression for $E[H \mid S_n = k]$.
2. To complete the task, we must assume a prior distribution for H on the basis of prior knowledge, if any.

**The Bayesian reversal**

Since $\{S_n = k\}$ is an event with positive probability, we use the definition of the conditional expectation, given an event, and the law of total probability (CE1b) to obtain

$$E[H \mid S_n = k] = \frac{E[H\, I_{\{k\}}(S_n)]}{E[I_{\{k\}}(S_n)]} = \frac{E\{H\, E[I_{\{k\}}(S_n) \mid H]\}}{E\{E[I_{\{k\}}(S_n) \mid H]\}} = \frac{\int u\, E[I_{\{k\}}(S_n) \mid H = u]\, f_H(u)\, du}{\int E[I_{\{k\}}(S_n) \mid H = u]\, f_H(u)\, du} \tag{17}$$

$$= \frac{C(n, k) \int u^{k+1} (1 - u)^{n-k} f_H(u)\, du}{C(n, k) \int u^{k} (1 - u)^{n-k} f_H(u)\, du} \tag{18}$$
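
Since $C(n, k)$ cancels in the ratio, the estimate may be computed by numerical quadrature for any prior density. A sketch, with the hypothetical prior $f_H(u) = 2u$ on $(0, 1)$ (that is, beta $(2, 1)$, so the closed form (23) below gives $(k + 2)/(n + 3)$ as a check):

```matlab
% Evaluate E[H | S_n = k] from (18) by quadrature; C(n,k) cancels.
n = 20; k = 14;
fH  = @(u) 2*u;                          % assumed prior density: beta (2,1)
num = integral(@(u) u.^(k+1).*(1-u).^(n-k).*fH(u), 0, 1);
den = integral(@(u) u.^k    .*(1-u).^(n-k).*fH(u), 0, 1);
fprintf('%8.4f %8.4f\n', num/den, (k+2)/(n+3))   % both 0.6957
```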

**A prior distribution for H**

The beta $(r, s)$ distribution (see Appendix G) proves to be a “natural” choice for this purpose. Its range is the unit interval, and by proper choice of the parameters $r, s$, the density function can be given a variety of forms (see Figures 1 and 2).

Its analysis is based on the integrals

$$\int_0^1 u^{r-1} (1 - u)^{s-1}\, du = \frac{\Gamma(r)\, \Gamma(s)}{\Gamma(r + s)} \quad \text{with} \quad \Gamma(a + 1) = a\, \Gamma(a) \tag{19}$$

For $H \sim$ beta $(r, s)$, the density is given by

$$f_H(t) = \frac{\Gamma(r + s)}{\Gamma(r)\, \Gamma(s)}\, t^{r-1} (1 - t)^{s-1} = A(r, s)\, t^{r-1} (1 - t)^{s-1}, \quad 0 < t < 1 \tag{20}$$

For $r \ge 2$, $s \ge 2$, $f_H$ has a maximum at $(r - 1)/(r + s - 2)$. For $r, s$ positive integers, $f_H$ is a polynomial on $[0, 1]$, so that determination of the distribution function is easy. In any case, straightforward integration, using the integral formula above, shows

$$E[H] = \frac{r}{r + s} \quad \text{and} \quad \text{Var}[H] = \frac{rs}{(r + s)^2 (r + s + 1)} \tag{21}$$
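
These formulas may be verified by quadrature against the density (20); a sketch with the arbitrary choice $r = 3$, $s = 5$:

```matlab
% Beta(r,s) mean and variance: quadrature versus the formulas in (21).
r = 3; s = 5;
A  = gamma(r+s)/(gamma(r)*gamma(s));     % normalizing constant A(r,s)
fH = @(t) A*t.^(r-1).*(1-t).^(s-1);      % beta density, Equation (20)
m1 = integral(@(t) t.*fH(t), 0, 1);      % E[H]
m2 = integral(@(t) t.^2.*fH(t), 0, 1);   % E[H^2]
fprintf('%8.4f %8.4f\n', m1, r/(r+s))                        % both 0.3750
fprintf('%8.4f %8.4f\n', m2 - m1^2, r*s/((r+s)^2*(r+s+1)))   % both 0.0260
```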

If the prior distribution for $H$ is beta $(r, s)$, we may complete the determination of $E[H \mid S_n = k]$ as follows.

$$E[H \mid S_n = k] = \frac{A(r, s) \int_0^1 u^{k+1} (1 - u)^{n-k}\, u^{r-1} (1 - u)^{s-1}\, du}{A(r, s) \int_0^1 u^{k} (1 - u)^{n-k}\, u^{r-1} (1 - u)^{s-1}\, du} = \frac{\int_0^1 u^{k+r} (1 - u)^{n+s-k-1}\, du}{\int_0^1 u^{k+r-1} (1 - u)^{n+s-k-1}\, du} \tag{22}$$

$$= \frac{\Gamma(r + k + 1)\, \Gamma(n + s - k)}{\Gamma(r + s + n + 1)} \cdot \frac{\Gamma(r + s + n)}{\Gamma(r + k)\, \Gamma(n + s - k)} = \frac{k + r}{n + r + s} \tag{23}$$

We may adapt the analysis above to show that $H$ is conditionally beta $(r + k, s + n - k)$, given $S_n = k$.

$$F_{H|S}(t \mid k) = \frac{E[I_t(H)\, I_{\{k\}}(S_n)]}{E[I_{\{k\}}(S_n)]} \quad \text{where} \quad I_t(H) = I_{[0, t]}(H) \tag{24}$$

The analysis goes through exactly as for $E[H \mid S_n = k]$, except that $H$ is replaced by $I_t(H)$. In the integral expression for the numerator, one factor $u$ is replaced by $I_t(u)$. For $H \sim$ beta $(r, s)$, we get

$$F_{H|S}(t \mid k) = \frac{\Gamma(r + s + n)}{\Gamma(r + k)\, \Gamma(n + s - k)} \int_0^t u^{k+r-1} (1 - u)^{n+s-k-1}\, du = \int_0^t f_{H|S}(u \mid k)\, du \tag{25}$$

The integrand is the density for beta $(r + k, n + s - k)$.

Any prior information on the distribution for $H$ can be utilized to select suitable $r, s$. If there is no prior information, we simply take $r = 1$, $s = 1$, which corresponds to $H \sim$ uniform on $(0, 1)$: the value is as likely to be in any subinterval of a given length as in any other of the same length. The information in the sample serves to modify the distribution for $H$, conditional upon that information.

### Example 1: Population proportion with a beta prior

It is desired to estimate the proportion of the student body which favors a proposed increase in the student blanket tax to fund the campus radio station. A sample of size $n = 20$ is taken. Fourteen respond in favor of the increase. Assuming prior ignorance (i.e., that $H \sim$ beta $(1, 1)$), what is the conditional distribution, given $S_{20} = 14$? After the first sample is taken, a second sample of size $n = 20$ is taken, with thirteen favorable responses. Analysis is made using the conditional distribution for the first sample as the prior for the second. Make a new estimate of $H$.

SOLUTION

For the first sample the parameters are $r = s = 1$. According to the treatment above, $H$ is conditionally beta $(k + r, n + s - k) = (15, 7)$. The density has a maximum at $(r + k - 1)/(r + k + n + s - k - 2) = k/n = 0.7$. The conditional expectation, however, is $(r + k)/(r + s + n) = 15/22 \approx 0.6818$.

For the second sample, with the conditional distribution as the new prior, we should expect more sharpening of the density about the new mean-square estimate. For the new sample, $n = 20$, $k = 13$, and the prior $H \sim$ beta $(15, 7)$. The new conditional distribution has parameters $r^* = 15 + 13 = 28$ and $s^* = 20 + 7 - 13 = 14$. The density has a maximum at $t = (28 - 1)/(28 + 14 - 2) = 27/40 = 0.6750$. The best estimate of $H$ is $28/(28 + 14) = 2/3$. The conditional densities in the two cases may be plotted with MATLAB (see Figure 1).

```matlab
t = 0:0.01:1;
plot(t,beta(15,7,t),'k-',t,beta(28,14,t),'k--')
```
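
Here `beta(r,s,t)` evaluates the beta $(r, s)$ density at the points of the vector `t`; this is evidently an m-function supplied with the text, since MATLAB's built-in `beta(Z,W)` is the two-argument beta function. With the Statistics Toolbox, `betapdf(t,15,7)` and `betapdf(t,28,14)` produce the same curves.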


As expected, the maximum for the second is somewhat larger and occurs at a slightly smaller t, reflecting the smaller k. And the density in the second case shows less spread, resulting from the fact that prior information from the first sample is incorporated into the analysis of the second sample.

The same result is obtained if the two samples are combined into one sample of size 40.
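
A quick check of that claim, using the update rule derived above (a sketch):

```matlab
% Combined sample: n = 40, k = 14 + 13 = 27, flat prior r = s = 1.
r = 1; s = 1; n = 40; k = 27;
rStar = r + k;                           % 28, as in the two-stage analysis
sStar = s + n - k;                       % 14
fprintf('%g %g %8.4f\n', rStar, sStar, rStar/(rStar+sStar))  % 28 14 0.6667
```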

It may be well to compare the result of Bayesian analysis with that for classical statistics. Since, in the latter case, prior information is not utilized, we make the comparison with the case of no prior knowledge ($r = s = 1$). For the classical case, the estimator for $\mu$ is the sample average; for the Bayesian case with beta prior, the estimate is the conditional expectation of $H$, given $S_n$.

$$\text{If } S_n = k: \quad \text{Classical estimate} = k/n, \qquad \text{Bayesian estimate} = (k + 1)/(n + 2) \tag{26}$$

For large sample size n, these do not differ significantly. For small samples, the difference may be quite important. The Bayesian estimate is often referred to as the small sample estimate, although there is nothing in the Bayesian procedure which calls for small samples. In any event, the Bayesian estimate seems preferable for small samples, and it has the advantage that prior information may be utilized. The sampling procedure upgrades the prior distribution.
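
The contrast shows up directly in the numbers; a sketch with the same observed proportion $k/n = 0.75$ at a small and a large sample size (values chosen for illustration):

```matlab
% Classical k/n versus Bayesian (k+1)/(n+2) at the same observed proportion.
k = [3 300]; n = [4 400];
classical = k./n;                        % 0.7500  0.7500
bayes     = (k+1)./(n+2);                % 0.6667  0.7488
disp([classical; bayes])                 % small n: estimates differ markedly
```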

The essential idea of the Bayesian approach is the view that an unknown parameter about which there is uncertainty is modeled as the value of a random variable. The name Bayesian comes from the role of Bayesian reversal in the analysis.

The application of Bayesian analysis to the population proportion required Bayesian reversal in the case of discrete Sn. We consider, next, this reversal process when all random variables are absolutely continuous.

**The Bayesian reversal for a joint absolutely continuous pair**

In the treatment above, we utilize the fact that the conditioning random variable $S_n$ is discrete. Suppose the pair $\{W, H\}$ is jointly absolutely continuous, and $f_{W|H}(t \mid u)$ and $f_H(u)$ are specified. To determine

$$E[H \mid W = t] = \int u\, f_{H|W}(u \mid t)\, du \tag{27}$$

we need $f_{H|W}(u \mid t)$. This requires a Bayesian reversal of the conditional densities. Now by definition

$$f_{H|W}(u \mid t) = \frac{f_{WH}(t, u)}{f_W(t)} \quad \text{and} \quad f_{WH}(t, u) = f_{W|H}(t \mid u)\, f_H(u) \tag{28}$$

Since by the rule for determining the marginal density

$$f_W(t) = \int f_{WH}(t, u)\, du = \int f_{W|H}(t \mid u)\, f_H(u)\, du \tag{29}$$

we have

$$f_{H|W}(u \mid t) = \frac{f_{W|H}(t \mid u)\, f_H(u)}{\int f_{W|H}(t \mid u)\, f_H(u)\, du} \quad \text{and} \quad E[H \mid W = t] = \frac{\int u\, f_{W|H}(t \mid u)\, f_H(u)\, du}{\int f_{W|H}(t \mid u)\, f_H(u)\, du} \tag{30}$$
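
As a sketch of this reversal in a case with a known answer (a hypothetical normal pair, not from the module): if $W$, given $H = u$, is $N(u, 1)$ and $H \sim N(0, 1)$, then $E[H \mid W = t] = t/2$, and the quadrature in (30) reproduces it.

```matlab
% Bayesian reversal (30) by quadrature: W | H = u ~ N(u,1), H ~ N(0,1).
t    = 1.8;
fWgH = @(u) exp(-(t - u).^2/2)/sqrt(2*pi);   % f_{W|H}(t|u), as a function of u
fH   = @(u) exp(-u.^2/2)/sqrt(2*pi);         % prior density f_H(u)
num  = integral(@(u) u.*fWgH(u).*fH(u), -Inf, Inf);
den  = integral(@(u)    fWgH(u).*fH(u), -Inf, Inf);
fprintf('%8.4f %8.4f\n', num/den, t/2)       % both 0.9000
```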

### Example 2: A Bayesian reversal

Suppose $H \sim$ exponential $(\lambda)$ and the $X_i$ are conditionally iid, exponential $(u)$, given $H = u$. A sample of size $n$ is taken. Put $W = (X_1, X_2, \ldots, X_n)$, $t = (t_1, t_2, \ldots, t_n)$, and $t^* = t_1 + t_2 + \cdots + t_n$. Determine the best mean-square estimate of $H$, given $W = t$.

SOLUTION

$$f_{X_i|H}(t_i \mid u) = u\, e^{-u t_i} \quad \text{so that} \quad f_{W|H}(t \mid u) = \prod_{i=1}^{n} u\, e^{-u t_i} = u^n e^{-u t^*} \tag{31}$$

Hence

$$E[H \mid W = t] = \int u\, f_{H|W}(u \mid t)\, du = \frac{\int_0^\infty u^{n+1} e^{-u t^*}\, \lambda e^{-\lambda u}\, du}{\int_0^\infty u^{n} e^{-u t^*}\, \lambda e^{-\lambda u}\, du} \tag{32}$$

$$= \frac{\int_0^\infty u^{n+1} e^{-(\lambda + t^*) u}\, du}{\int_0^\infty u^{n} e^{-(\lambda + t^*) u}\, du} = \frac{(n + 1)!}{(\lambda + t^*)^{n+2}} \cdot \frac{(\lambda + t^*)^{n+1}}{n!} = \frac{n + 1}{\lambda + t^*} \quad \text{where} \quad t^* = \sum_{i=1}^{n} t_i \tag{33}$$
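
The closed form is easily confirmed by quadrature; a sketch with the assumed values $\lambda = 2$, $n = 10$, $t^* = 5.3$ (the factor $\lambda$ cancels in the ratio):

```matlab
% Check (33): E[H | W = t] = (n+1)/(lambda + tstar).
lam = 2; n = 10; tstar = 5.3;
num = integral(@(u) u.^(n+1).*exp(-(lam + tstar)*u), 0, Inf);
den = integral(@(u) u.^n    .*exp(-(lam + tstar)*u), 0, Inf);
fprintf('%8.4f %8.4f\n', num/den, (n+1)/(lam + tstar))   % both 1.5068
```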
