
# Conditional Expectation, Regression

Module by: Paul E. Pfeiffer

Summary: Conditional expectation, given a random vector, plays a fundamental role in much of modern probability theory. Various types of “conditioning” characterize some of the more important random sequences and processes. The notion of conditional independence is expressed in terms of conditional expectation. Conditional independence plays an essential role in the theory of Markov processes and in much of decision theory.

Conditional expectation, given a random vector, plays a fundamental role in much of modern probability theory. Various types of “conditioning” characterize some of the more important random sequences and processes. The notion of conditional independence is expressed in terms of conditional expectation. Conditional independence plays an essential role in the theory of Markov processes and in much of decision theory.

We first consider an elementary form of conditional expectation with respect to an event. Then we consider two highly intuitive special cases of conditional expectation, given a random variable. In examining these, we identify a fundamental property which provides the basis for a very general extension. We discover that conditional expectation is a random quantity. The basic property for conditional expectation and properties of ordinary expectation are used to obtain four fundamental properties which imply the “expectation-like” character of conditional expectation. An extension of the fundamental property leads directly to the solution of the regression problem which, in turn, gives an alternate interpretation of conditional expectation.

## Conditioning by an event

If a conditioning event C occurs, we modify the original probabilities by introducing the conditional probability measure P(·|C). In making the change from

$$P(A) \quad \text{to} \quad P(A|C) = \frac{P(AC)}{P(C)}$$
(1)

we effectively do two things:

• We limit the possible outcomes to event C
• We “normalize” the probability mass by taking P(C) as the new unit

It seems reasonable to make a corresponding modification of mathematical expectation when the occurrence of event C is known. The expectation E[X] is the probability-weighted average of the values taken on by X. Two possibilities for making the modification are suggested.

• We could replace the prior probability measure P(·) with the conditional probability measure P(·|C) and take the weighted average with respect to these new weights.
• We could continue to use the prior probability measure P(·) and modify the averaging process as follows:
  • Consider the values X(ω) for only those ω ∈ C. This may be done by using the random variable I_C X, which has value X(ω) for ω ∈ C and zero elsewhere. The expectation E[I_C X] is the probability-weighted sum of those values taken on in C.
  • The weighted average is obtained by dividing by P(C).

These two approaches are equivalent. For a simple random variable $X = \sum_{k=1}^{n} t_k I_{A_k}$ in canonical form

$$E[I_C X]/P(C) = \sum_{k=1}^{n} E[t_k I_C I_{A_k}]/P(C) = \sum_{k=1}^{n} t_k P(C A_k)/P(C) = \sum_{k=1}^{n} t_k P(A_k|C)$$
(2)

The final sum is expectation with respect to the conditional probability measure. Arguments using basic theorems on expectation and the approximation of general random variables by simple random variables allow an extension to a general random variable X. The notion of a conditional distribution, given C, and taking weighted averages with respect to the conditional probability is intuitive and natural in this case. However, this point of view is limited. In order to display a natural relationship with the more general concept of conditioning with respect to a random vector, we adopt the following

Definition. The conditional expectation of X, given event C with positive probability, is the quantity

$$E[X|C] = \frac{E[I_C X]}{P(C)} = \frac{E[I_C X]}{E[I_C]}$$
(3)

Remark. The product form E[X|C] P(C) = E[I_C X] is often useful.

### Example 1: A numerical example

Suppose X ~ exponential (λ) and C = {1/λ ≤ X ≤ 2/λ}. Now I_C = I_M(X) where M = [1/λ, 2/λ].

$$P(C) = P(X \ge 1/\lambda) - P(X > 2/\lambda) = e^{-1} - e^{-2}$$
(4)

and

$$E[I_C X] = \int I_M(t)\, t \lambda e^{-\lambda t}\, dt = \int_{1/\lambda}^{2/\lambda} t \lambda e^{-\lambda t}\, dt = \frac{1}{\lambda}\left(2 e^{-1} - 3 e^{-2}\right)$$
(5)

Thus

$$E[X|C] = \frac{2 e^{-1} - 3 e^{-2}}{\lambda\left(e^{-1} - e^{-2}\right)} \approx \frac{1.418}{\lambda}$$
(6)
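As a numerical check of this result, we can evaluate E[X|C] = E[I_C X]/P(C) directly. The sketch below uses Python for illustration (the module's own computations use MATLAB); the value of lam is an arbitrary sample parameter, and E[I_C X] is approximated by a midpoint-rule integral.

```python
import math

lam = 2.0                      # sample value of the rate parameter (arbitrary)

# P(C) = P(1/lam <= X <= 2/lam) for X ~ exponential(lam)
PC = math.exp(-1) - math.exp(-2)

# E[I_C X] = integral of t * lam * exp(-lam*t) over [1/lam, 2/lam]
a, b, n = 1 / lam, 2 / lam, 100000
h = (b - a) / n
EICX = h * sum((a + (k + 0.5) * h) * lam * math.exp(-lam * (a + (k + 0.5) * h))
               for k in range(n))

EXC = EICX / PC                # E[X|C]
print(EXC, 1.418 / lam)       # both approximately 0.709 for lam = 2
```

The numerical value agrees with the closed form 1.418/λ to the displayed precision.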

## Conditioning by a random vector—discrete case

Suppose $X = \sum_{i=1}^{n} t_i I_{A_i}$ and $Y = \sum_{j=1}^{m} u_j I_{B_j}$ in canonical form. We suppose P(A_i) = P(X = t_i) > 0 and P(B_j) = P(Y = u_j) > 0, for each permissible i, j. Now

$$P(Y = u_j | X = t_i) = \frac{P(X = t_i, Y = u_j)}{P(X = t_i)}$$
(7)

We take the expectation relative to the conditional probability P(·|X = t_i) to get

$$E[g(Y)|X = t_i] = \sum_{j=1}^{m} g(u_j) P(Y = u_j | X = t_i) = e(t_i)$$
(8)

Since we have a value for each t_i in the range of X, the function e(·) is defined on the range of X. Now consider any reasonable set M on the real line and determine the expectation

$$E[I_M(X) g(Y)] = \sum_{i=1}^{n} \sum_{j=1}^{m} I_M(t_i) g(u_j) P(X = t_i, Y = u_j)$$
(9)
$$= \sum_{i=1}^{n} I_M(t_i) \sum_{j=1}^{m} g(u_j) P(Y = u_j | X = t_i) P(X = t_i)$$
(10)
$$= \sum_{i=1}^{n} I_M(t_i) e(t_i) P(X = t_i) = E[I_M(X) e(X)]$$
(11)

We have the pattern

$$(A) \qquad E[I_M(X) g(Y)] = E[I_M(X) e(X)] \quad \text{where } e(t_i) = E[g(Y)|X = t_i]$$
(12)

for all t_i in the range of X.

We return to examine this property later. But first, consider an example to display the nature of the concept.

### Example 2: Basic calculations and interpretation

Suppose the pair {X, Y} has the joint distribution

P(X = t_i, Y = u_j)
(13)

|        | X = 0 | X = 1 | X = 4 | X = 9 |
|--------|-------|-------|-------|-------|
| Y = 2  | 0.05  | 0.04  | 0.21  | 0.15  |
| Y = 0  | 0.05  | 0.01  | 0.09  | 0.10  |
| Y = −1 | 0.10  | 0.05  | 0.10  | 0.05  |
| PX     | 0.20  | 0.10  | 0.40  | 0.30  |

Calculate E[Y|X = t_i] for each possible value t_i taken on by X:

• E[Y|X = 0] = (−1 · 0.10 + 0 · 0.05 + 2 · 0.05)/0.20 = 0
• E[Y|X = 1] = (−1 · 0.05 + 0 · 0.01 + 2 · 0.04)/0.10 = 0.30
• E[Y|X = 4] = (−1 · 0.10 + 0 · 0.09 + 2 · 0.21)/0.40 = 0.80
• E[Y|X = 9] = (−1 · 0.05 + 0 · 0.10 + 2 · 0.15)/0.30 ≈ 0.8333

The pattern of operation in each case can be described as follows:

• For the ith column, multiply each value u_j by P(X = t_i, Y = u_j), sum, then divide by P(X = t_i).

The following interpretation helps visualize the conditional expectation and points to an important result in the general case.

• For each t_i we use the mass distributed “above” it. This mass is distributed along a vertical line at values u_j taken on by Y. The result of the computation is to determine the center of mass for the conditional distribution above t = t_i. As in the case of ordinary expectation, this should be the best estimate, in the mean-square sense, of Y when X = t_i. We examine that possibility in the treatment of the regression problem below.

Although the calculations are not difficult for a problem of this size, the basic pattern can be implemented simply with MATLAB, making the handling of much larger problems quite easy. This is particularly useful in dealing with the simple approximation to an absolutely continuous pair.

X = [0 1 4 9];             % Data for the joint distribution
Y = [-1 0 2];
P = 0.01*[ 5  4 21 15; 5  1  9 10; 10  5 10  5];
jcalc                      % Setup for calculations
Enter JOINT PROBABILITIES (as on the plane)  P
Enter row matrix of VALUES of X  X
Enter row matrix of VALUES of Y  Y
Use array operations on matrices X, Y, PX, PY, t, u, and P
EYX = sum(u.*P)./sum(P);   % sum(P) = PX  (operation sum yields column sums)
disp([X;EYX]')             % u.*P = u_j P(X = t_i, Y = u_j) for all i, j
0         0
1.0000    0.3000
4.0000    0.8000
9.0000    0.8333


The calculations extend to E[g(X, Y)|X = t_i]. Instead of values of u_j we use values of g(t_i, u_j) in the calculations. Suppose Z = g(X, Y) = Y² − 2XY.

G = u.^2 - 2*t.*u;         % Z = g(X,Y) = Y^2 - 2XY
EZX = sum(G.*P)./sum(P);   % E[Z|X=x]
disp([X;EZX]')
0    1.5000
1.0000    1.5000
4.0000   -4.0500
9.0000  -12.8333
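The same column-by-column operations can be written without the jcalc setup. The following Python sketch (illustrative only; the module's own tools are MATLAB) reproduces both sets of conditional expectations from the joint table:

```python
X = [0, 1, 4, 9]
Y = [-1, 0, 2]
# P[j][i] = P(X = X[i], Y = Y[j]); rows ordered Y = -1, 0, 2
P = [[0.10, 0.05, 0.10, 0.05],
     [0.05, 0.01, 0.09, 0.10],
     [0.05, 0.04, 0.21, 0.15]]

PX = [sum(P[j][i] for j in range(3)) for i in range(4)]   # column sums
EYX = [sum(Y[j] * P[j][i] for j in range(3)) / PX[i] for i in range(4)]
print([round(v, 4) for v in EYX])    # [0.0, 0.3, 0.8, 0.8333]

# E[Z|X = t_i] for Z = Y^2 - 2XY: use g(t_i, u_j) in place of u_j
EZX = [sum((Y[j]**2 - 2 * X[i] * Y[j]) * P[j][i] for j in range(3)) / PX[i]
       for i in range(4)]
print([round(v, 4) for v in EZX])    # [1.5, 1.5, -4.05, -12.8333]
```

The values agree with the MATLAB output above.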


## Conditioning by a random vector — absolutely continuous case

Suppose the pair {X, Y} has joint density function f_{XY}. We seek to use the concept of a conditional distribution, given X = t. The fact that P(X = t) = 0 for each t requires a modification of the approach adopted in the discrete case. Intuitively, we consider the conditional density

$$f_{Y|X}(u|t) = \begin{cases} f_{XY}(t, u)/f_X(t) & \text{for } f_X(t) > 0 \\ 0 & \text{elsewhere} \end{cases}$$
(14)

The condition f_X(t) > 0 effectively determines the range of X. The function f_{Y|X}(·|t) has the properties of a density for each fixed t for which f_X(t) > 0.

$$f_{Y|X}(u|t) \ge 0, \qquad \int f_{Y|X}(u|t)\, du = \frac{1}{f_X(t)} \int f_{XY}(t, u)\, du = f_X(t)/f_X(t) = 1$$
(15)

We define, in this case,

$$E[g(Y)|X = t] = \int g(u) f_{Y|X}(u|t)\, du = e(t)$$
(16)

The function e(·) is defined for f_X(t) > 0, hence effectively on the range of X. For any reasonable set M on the real line,

$$E[I_M(X) g(Y)] = \int I_M(t) \int g(u) f_{XY}(t, u)\, du\, dt = \int I_M(t) \left[\int g(u) f_{Y|X}(u|t)\, du\right] f_X(t)\, dt$$
(17)
$$= \int I_M(t) e(t) f_X(t)\, dt, \quad \text{where } e(t) = E[g(Y)|X = t]$$
(18)

Thus, as in the discrete case, we have for each t in the range of X

$$(A) \qquad E[I_M(X) g(Y)] = E[I_M(X) e(X)] \quad \text{where } e(t) = E[g(Y)|X = t]$$
(19)

Again, we postpone examination of this pattern until we consider a more general case.

### Example 3: Basic calculation and interpretation

Suppose the pair {X, Y} has joint density f_{XY}(t, u) = (6/5)(t + 2u) on the triangular region bounded by t = 0, u = 1, and u = t (see Figure 1). Then

$$f_X(t) = \frac{6}{5} \int_t^1 (t + 2u)\, du = \frac{6}{5}\left(1 + t - 2t^2\right), \quad 0 \le t \le 1$$
(20)

By definition, then,

$$f_{Y|X}(u|t) = \frac{t + 2u}{1 + t - 2t^2} \quad \text{on the triangle (zero elsewhere)}$$
(21)

We thus have

$$E[Y|X = t] = \int u f_{Y|X}(u|t)\, du = \frac{1}{1 + t - 2t^2} \int_t^1 (tu + 2u^2)\, du = \frac{4 + 3t - 7t^3}{6(1 + t - 2t^2)}, \quad 0 \le t < 1$$
(22)

Theoretically, we must rule out t = 1 since the denominator is zero for that value of t. This causes no problem in practice.

We are able to make an interpretation quite analogous to that for the discrete case. This also points the way to practical MATLAB calculations.

• For any t in the range of X (between 0 and 1 in this case), consider a narrow vertical strip of width Δt with the vertical line through t at its center. If the strip is narrow enough, then f_{XY}(t, u) does not vary appreciably with t for any u.
• The mass in the strip is approximately
$$\text{Mass} \approx \Delta t \int f_{XY}(t, u)\, du = \Delta t\, f_X(t)$$
(23)
• The moment of the mass in the strip about the line u = 0 is approximately
$$\text{Moment} \approx \Delta t \int u f_{XY}(t, u)\, du$$
(24)
• The center of mass in the strip is
$$\text{Center of mass} = \frac{\text{Moment}}{\text{Mass}} \approx \frac{\Delta t \int u f_{XY}(t, u)\, du}{\Delta t\, f_X(t)} = \int u f_{Y|X}(u|t)\, du = e(t)$$
(25)

This interpretation points the way to the use of MATLAB in approximating the conditional expectation. The success of the discrete approach in approximating the theoretical value in turn supports the validity of the interpretation. Also, this points to the general result on regression in the section "The Regression Problem".

In the MATLAB handling of jointly absolutely continuous random variables, we divide the region into narrow vertical strips, then subdivide each strip to form the grid structure. The center of mass of the discrete distribution over one of the chosen values of t must lie close to the actual center of mass of the probability in the strip. Consider the MATLAB treatment of the example under consideration.

f = '(6/5)*(t + 2*u).*(u>=t)';                  % Density as string variable
tuappr
Enter matrix [a b] of X-range endpoints  [0 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  200
Enter number of Y approximation points  200
Enter expression for joint density  eval(f)     % Evaluation of string variable
Use array operations on X, Y, PX, PY, t, u, and P
EYx = sum(u.*P)./sum(P);                        % Approximate values
eYx = (4 + 3*X - 7*X.^3)./(6*(1 + X - 2*X.^2)); % Theoretical expression
plot(X,EYx,X,eYx)
% Plotting details             (see Figure 2)


The agreement of the theoretical and approximate values is quite good enough for practical purposes. It also indicates that the interpretation is reasonable, since the approximation determines the center of mass of the discretized mass which approximates the center of the actual mass in each vertical strip.
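The strip computation can also be sketched without the tuappr utility. The following Python version (an illustrative stand-in for the MATLAB setup; the grid size and the sample value of t are arbitrary) computes the center of mass of one vertical strip and compares it with the theoretical formula (22):

```python
n = 400                            # grid points on the u-axis (arbitrary)
d = 1.0 / n
us = [(j + 0.5) * d for j in range(n)]

def f(t, u):
    # joint density (6/5)(t + 2u) on the triangle u >= t, 0 <= t, u <= 1
    return 1.2 * (t + 2 * u) if u >= t else 0.0

t0 = 0.25                          # sample point in the range of X
strip = [f(t0, u) for u in us]     # mass along the vertical strip at t0
approx = sum(u * p for u, p in zip(us, strip)) / sum(strip)
exact = (4 + 3 * t0 - 7 * t0**3) / (6 * (1 + t0 - 2 * t0**2))
print(approx, exact)               # agreement to about three decimal places
```

As with the MATLAB run, the discretized center of mass tracks the theoretical curve closely.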

## Extension to the general case

Most examples for which we make numerical calculations will be one of the types above. Analysis of these cases is built upon the intuitive notion of conditional distributions. However, these cases and this interpretation are rather limited and do not provide the basis for the range of applications—theoretical and practical—which characterize modern probability theory. We seek a basis for extension (which includes the special cases). In each case examined above, we have the property

$$(A) \qquad E[I_M(X) g(Y)] = E[I_M(X) e(X)] \quad \text{where } e(t) = E[g(Y)|X = t]$$
(26)

for all t in the range of X.

We have a tie to the simple case of conditioning with respect to an event. If C = {X ∈ M} has positive probability, then using I_C = I_M(X) we have

$$(B) \qquad E[I_M(X) g(Y)] = E[g(Y)|X \in M]\, P(X \in M)$$
(27)

Two properties of expectation are crucial here:

1. By the uniqueness property (E5), since (A) holds for all reasonable (Borel) sets M, the random variable e(X) is unique a.s. (i.e., except for a set of ω of probability zero).
2. By the special case of the Radon-Nikodym theorem (E19), the function e(·) always exists and is such that the random variable e(X) is unique a.s.

We make a definition based on these facts.

Definition. The conditional expectation E[g(Y)|X = t] = e(t) is the a.s. unique function defined on the range of X such that

$$(A) \qquad E[I_M(X) g(Y)] = E[I_M(X) e(X)] \quad \text{for all Borel sets } M$$
(28)

Note that e(X) is a random variable and e(·) is a function. Expectation E[g(Y)] is always a constant. The concept is abstract. At this point it has little apparent significance, except that it must include the two special cases studied in the previous sections. Also, it is not clear why the term conditional expectation should be used. The justification rests in certain formal properties which are based on the defining condition (A) and other properties of expectation.

In Appendix F we tabulate a number of key properties of conditional expectation. The condition (A) is called property (CE1). We examine several of these properties. For a detailed treatment and proofs, any of a number of books on measure-theoretic probability may be consulted.

(CE1) Defining condition. e(X) = E[g(Y)|X] a.s. iff

$$E[I_M(X) g(Y)] = E[I_M(X) e(X)] \quad \text{for each Borel set } M \text{ on the codomain of } X$$
(29)

Note that X and Y do not need to be real valued, although g(Y) is real valued. This extension to possibly vector-valued X and Y is extremely important. The next condition is just the property (B) noted above.

(CE1a) If P(X ∈ M) > 0, then E[I_M(X) e(X)] = E[g(Y)|X ∈ M] P(X ∈ M)

The special case obtained by setting M to include the entire range of X, so that I_M(X(ω)) = 1 for all ω, is useful in many theoretical and applied problems.

(CE1b) Law of total probability. E[g(Y)] = E{E[g(Y)|X]}

It may seem strange that we should complicate the problem of determining E[g(Y)] by first getting the conditional expectation e(X) = E[g(Y)|X] and then taking the expectation of that function. Frequently, the data supplied in a problem make this the expedient procedure.

### Example 4: Use of the law of total probability

Suppose the time to failure of a device is a random quantity X ~ exponential (u), where the parameter u is the value of a parameter random variable H. Thus

$$f_{X|H}(t|u) = u e^{-ut} \quad \text{for } t \ge 0$$
(30)

If the parameter random variable H ~ uniform (a, b), determine the expected life E[X] of the device.

SOLUTION

We use the law of total probability:

$$E[X] = E\{E[X|H]\} = \int E[X|H = u]\, f_H(u)\, du$$
(31)

Now by assumption

$$E[X|H = u] = 1/u \quad \text{and} \quad f_H(u) = \frac{1}{b - a}, \quad a < u < b$$
(32)

Thus

$$E[X] = \frac{1}{b - a} \int_a^b \frac{1}{u}\, du = \frac{\ln(b/a)}{b - a}$$
(33)

For a = 1/100, b = 2/100, E[X] = 100 ln(2) ≈ 69.31.
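The law-of-total-probability computation can be checked by simulation. The sketch below (Python, for illustration; the seed and sample size are arbitrary choices) draws H ~ uniform (a, b) and then X ~ exponential (H), and compares the sample mean with 100 ln 2:

```python
import math
import random

a, b = 1 / 100, 2 / 100
exact = math.log(b / a) / (b - a)          # = 100 ln 2, about 69.31

random.seed(1)                              # arbitrary seed, for repeatability
N = 200000
total = 0.0
for _ in range(N):
    h = random.uniform(a, b)                # parameter value H = h
    total += random.expovariate(h)          # X ~ exponential(h), mean 1/h
print(exact, total / N)                     # sample mean is near 69.31
```

The Monte Carlo average agrees with the analytic value to within sampling error.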

The next three properties, linearity, positivity/monotonicity, and monotone convergence, along with the defining condition, provide the “expectation-like” character. For ordinary expectation, these properties yield most of the other essential properties. A similar development holds for conditional expectation, with some reservation for the fact that e(X) is a random variable, unique a.s. This restriction causes little problem for applications at the level of this treatment.

In order to get some sense of how these properties root in basic properties of expectation, we examine one of them.

(CE2) Linearity. For any constants a, b

$$E[a g(Y) + b h(Z)|X] = a E[g(Y)|X] + b E[h(Z)|X] \quad \text{a.s.}$$
(34)

VERIFICATION

Let e_1(X) = E[g(Y)|X], e_2(X) = E[h(Z)|X], and e(X) = E[a g(Y) + b h(Z)|X] a.s.

$$\begin{aligned}
E[I_M(X) e(X)] &= E\{I_M(X)[a g(Y) + b h(Z)]\} && \text{a.s. by (CE1)} \\
&= a E[I_M(X) g(Y)] + b E[I_M(X) h(Z)] && \text{a.s. by linearity of expectation} \\
&= a E[I_M(X) e_1(X)] + b E[I_M(X) e_2(X)] && \text{a.s. by (CE1)} \\
&= E\{I_M(X)[a e_1(X) + b e_2(X)]\} && \text{a.s. by linearity of expectation}
\end{aligned}$$

Since the equalities hold for any Borel set M, the uniqueness property (E5) for expectation implies

$$e(X) = a e_1(X) + b e_2(X) \quad \text{a.s.}$$
(35)

This is property (CE2). An extension to any finite linear combination is easily established by mathematical induction.

Property (CE5) provides another condition for independence.

(CE5) Independence. {X, Y} is an independent pair

• iff E[g(Y)|X] = E[g(Y)] a.s. for all Borel functions g
• iff E[I_N(Y)|X] = E[I_N(Y)] a.s. for all Borel sets N on the codomain of Y

Since knowledge of X does not affect the likelihood that Y will take on any set of values, the conditional expectation should not be affected by the value of X. The resulting constant value of the conditional expectation must be E[g(Y)] in order for the law of total probability to hold. A formal proof utilizes uniqueness (E5) and the product rule (E18) for expectation.

Property (CE6) forms the basis for the solution of the regression problem in the next section.

(CE6) e(X) = E[g(Y)|X] a.s. iff E[h(X) g(Y)] = E[h(X) e(X)] for any Borel function h

Examination shows this to be the result of replacing I_M(X) in (CE1) with an arbitrary Borel function h(X). Again, to get some insight into how the various properties arise, we sketch the ideas of a proof of (CE6).

IDEAS OF A PROOF OF (CE6)

1. For h(X) = I_M(X), this is (CE1).
2. For $h(X) = \sum_{i=1}^{n} a_i I_{M_i}(X)$, the result follows by linearity.
3. For h ≥ 0, g ≥ 0, there is a sequence of nonnegative, simple $h_n \nearrow h$. Now by positivity, e(X) ≥ 0. By monotone convergence (CE4),
$$E[h_n(X) g(Y)] \to E[h(X) g(Y)] \quad \text{and} \quad E[h_n(X) e(X)] \to E[h(X) e(X)]$$
(36)
Since corresponding terms in the sequences are equal, the limits are equal.
4. For h = h⁺ − h⁻, g ≥ 0, the result follows by linearity (CE2).
5. For g = g⁺ − g⁻, the result again follows by linearity.

Properties (CE8) and (CE9) are peculiar to conditional expectation. They play an essential role in many theoretical developments. They are essential in the study of Markov sequences and of a class of random sequences known as submartingales. We list them here (as well as in Appendix F) for reference.

(CE8) E[h(X) g(Y)|X] = h(X) E[g(Y)|X] a.s. for any Borel function h

This property says that any function of the conditioning random vector may be treated as a constant factor. This, combined with (CE10) below, provides useful aids to computation.

(CE9) Repeated conditioning

$$\text{If } X = h(W), \text{ then } E\{E[g(Y)|X]|W\} = E\{E[g(Y)|W]|X\} = E[g(Y)|X] \quad \text{a.s.}$$
(37)

This somewhat formal property is highly useful in many theoretical developments. We provide an interpretation after the development of regression theory in the next section.

The next property is highly intuitive and very useful. It is easy to establish in the two elementary cases developed in previous sections. Its proof in the general case is quite sophisticated.

(CE10) Under conditions on g that are nearly always met in practice

1. E[g(X, Y)|X = t] = E[g(t, Y)|X = t] a.s. [P_X]
2. If {X, Y} is an independent pair, then E[g(X, Y)|X = t] = E[g(t, Y)] a.s. [P_X]

It certainly seems reasonable to suppose that if X = t, then we should be able to replace X by t in E[g(X, Y)|X = t] to get E[g(t, Y)|X = t]. Property (CE10) assures this. If {X, Y} is an independent pair, then the value of X should not affect the value of Y, so that E[g(t, Y)|X = t] = E[g(t, Y)] a.s.

### Example 5: Use of property (CE10)

Consider again the distribution for Example 3. The pair {X, Y} has density

$$f_{XY}(t, u) = \frac{6}{5}(t + 2u) \quad \text{on the triangular region bounded by } t = 0,\ u = 1,\ \text{and } u = t$$
(38)

We show in Example 3 that

$$E[Y|X = t] = \frac{4 + 3t - 7t^3}{6(1 + t - 2t^2)}, \quad 0 \le t < 1$$
(39)

Let Z = 3X² + 2XY. Determine E[Z|X = t].

SOLUTION

By linearity, (CE8), and (CE10)

$$E[Z|X = t] = 3t^2 + 2t\, E[Y|X = t] = 3t^2 + \frac{4t + 3t^2 - 7t^4}{3(1 + t - 2t^2)}$$
(40)
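As a quick consistency check of the algebra (a Python sketch; the sample point t = 0.5 is arbitrary), the two forms of E[Z|X = t] agree:

```python
t = 0.5                                           # arbitrary point in (0, 1)
eY = (4 + 3*t - 7*t**3) / (6 * (1 + t - 2*t**2))  # E[Y|X = t] from Example 3
form1 = 3*t**2 + 2*t*eY                           # 3t^2 + 2t E[Y|X = t]
form2 = 3*t**2 + (4*t + 3*t**2 - 7*t**4) / (3 * (1 + t - 2*t**2))
print(form1, form2)                               # identical values
```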

## Conditional probability

In the treatment of mathematical expectation, we note that probability may be expressed as an expectation

$$P(E) = E[I_E]$$
(41)

For conditional probability, given an event, we have

$$E[I_E|C] = \frac{E[I_E I_C]}{P(C)} = \frac{P(EC)}{P(C)} = P(E|C)$$
(42)

In this manner, we may use conditional expectation to extend the concept of conditional probability.

Definition. The conditional probability of event E, given X, is

$$P(E|X) = E[I_E|X]$$
(43)

Thus, there is no need for a separate theory of conditional probability. We may define the conditional distribution function

$$F_{Y|X}(u|X) = P(Y \le u | X) = E[I_{(-\infty, u]}(Y)|X]$$
(44)

Then, by the law of total probability (CE1b),

$$F_Y(u) = E[F_{Y|X}(u|X)] = \int F_{Y|X}(u|t)\, F_X(dt)$$
(45)

If there is a conditional density f_{Y|X} such that

$$P(Y \in M | X = t) = \int_M f_{Y|X}(r|t)\, dr$$
(46)

then

$$F_{Y|X}(u|t) = \int_{-\infty}^{u} f_{Y|X}(r|t)\, dr \quad \text{so that} \quad f_{Y|X}(u|t) = \frac{\partial}{\partial u} F_{Y|X}(u|t)$$
(47)

A careful, measure-theoretic treatment shows that it may not be true that F_{Y|X}(·|t) is a distribution function for all t in the range of X. However, in applications this is seldom a problem. Modeling assumptions often start with such a family of distribution functions or density functions.

### Example 6: The conditional distribution function

As in Example 4, suppose X ~ exponential (u), where the parameter u is the value of a parameter random variable H. If the parameter random variable H ~ uniform (a, b), determine the distribution function F_X.

SOLUTION

As in Example 4, take the assumption on the conditional distribution to mean

$$f_{X|H}(t|u) = u e^{-ut}, \quad t \ge 0$$
(48)

Then

$$F_{X|H}(t|u) = \int_0^t u e^{-us}\, ds = 1 - e^{-ut}, \quad t \ge 0$$
(49)

By the law of total probability

$$F_X(t) = \int F_{X|H}(t|u) f_H(u)\, du = \frac{1}{b - a} \int_a^b \left(1 - e^{-ut}\right)\, du = 1 - \frac{1}{b - a} \int_a^b e^{-ut}\, du$$
(50)
$$= 1 - \frac{1}{t(b - a)}\left[e^{-at} - e^{-bt}\right]$$
(51)

Differentiation with respect to t yields the expression for fX(t)fX(t)

$$f_X(t) = \frac{1}{b - a}\left[\left(\frac{1}{t^2} + \frac{a}{t}\right) e^{-at} - \left(\frac{1}{t^2} + \frac{b}{t}\right) e^{-bt}\right], \quad t > 0$$
(52)
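The differentiation can be verified numerically. In the sketch below (Python, for illustration; the parameters a = 1/100, b = 2/100 from Example 4 and the test point are arbitrary choices), a central difference of F_X agrees with the closed-form density:

```python
import math

a, b = 1 / 100, 2 / 100

def FX(t):
    # distribution function from (50)-(51)
    return 1 - (math.exp(-a * t) - math.exp(-b * t)) / (t * (b - a))

def fX(t):
    # density from (52), obtained by differentiating F_X
    return ((1 / t**2 + a / t) * math.exp(-a * t)
            - (1 / t**2 + b / t) * math.exp(-b * t)) / (b - a)

t, h = 50.0, 1e-4
numeric = (FX(t + h) - FX(t - h)) / (2 * h)    # central-difference derivative
print(numeric, fX(t))                           # the two values agree
```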

The following example uses a discrete conditional distribution and marginal distribution to obtain the joint distribution for the pair.

### Example 7: A random number N of Bernoulli trials

A number N is chosen by a random selection from the integers from 1 through 20 (say by drawing a card from a box). A pair of dice is thrown N times. Let S be the number of “matches” (i.e., both ones, both twos, etc.). Determine the joint distribution for {N, S}.

SOLUTION

N ~ uniform on the integers 1 through 20, so that P(N = i) = 1/20 for 1 ≤ i ≤ 20. Since there are 36 pairs of numbers for the two dice and six possible matches, the probability of a match on any throw is 1/6. Since the i throws of the dice constitute a Bernoulli sequence with probability 1/6 of a success (a match), S is conditionally binomial (i, 1/6), given N = i. For any pair (i, j), 0 ≤ j ≤ i,

$$P(N = i, S = j) = P(S = j | N = i)\, P(N = i)$$
(53)

Now E[S|N = i] = i/6, so that

$$E[S] = \frac{1}{6} \cdot \frac{1}{20} \sum_{i=1}^{20} i = \frac{20 \cdot 21}{6 \cdot 20 \cdot 2} = \frac{7}{4} = 1.75$$
(54)

The following MATLAB procedure calculates the joint probabilities and arranges them “as on the plane.”

% file randbern.m
p  = input('Enter the probability of success  ');
N  = input('Enter VALUES of N  ');
PN = input('Enter PROBABILITIES for N  ');
n  = length(N);
m  = max(N);
S  = 0:m;
P  = zeros(n,m+1);
for i = 1:n
P(i,1:N(i)+1) = PN(i)*ibinom(N(i),p,0:N(i));
end
PS = sum(P);
P  = rot90(P);
disp('Joint distribution N, S, P, and marginal PS')
randbern                           % Call for the procedure
Enter the probability of success  1/6
Enter VALUES of N  1:20
Enter PROBABILITIES for N  0.05*ones(1,20)
Joint distribution N, S, P, and marginal PS
ES = S*PS'
ES =  1.7500                          % Agrees with the theoretical value
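A self-contained version of the procedure can be sketched in Python (illustrative only; ibinom is replaced by the binomial pmf written out with math.comb, and the variable names are ours):

```python
from math import comb

p = 1 / 6                       # probability of a match on one throw
Nvals = range(1, 21)            # values of N, each with probability 1/20
PN = 1 / 20

# joint[i][j] = P(N = i, S = j) = P(N = i) * C(i, j) p^j (1 - p)^(i - j)
joint = {i: [PN * comb(i, j) * p**j * (1 - p)**(i - j) for j in range(i + 1)]
         for i in Nvals}

# marginal distribution of S and its expectation
m = max(Nvals)
PS = [sum(joint[i][j] for i in Nvals if j <= i) for j in range(m + 1)]
ES = sum(j * PS[j] for j in range(m + 1))
print(round(ES, 4))             # 1.75, agreeing with the theoretical value
```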


## The regression problem

We introduce the regression problem in the treatment of linear regression. Here we are concerned with more general regression. A pair {X, Y} of real random variables has a joint distribution. A value X(ω) is observed. We desire a rule for obtaining the “best” estimate of the corresponding value Y(ω). If Y(ω) is the actual value and r(X(ω)) is the estimate, then Y(ω) − r(X(ω)) is the error of estimate. The best estimation rule (function) r(·) is taken to be that for which the average square of the error is a minimum. That is, we seek a function r such that

$$E[(Y - r(X))^2] \quad \text{is a minimum}$$
(55)

In the treatment of linear regression, we determine the best affine function, u = at + b. The optimum function of this form defines the regression line of Y on X. We now turn to the problem of finding the best function r, which may in some cases be an affine function, but more often is not.

We have some hints of possibilities. In the treatment of expectation, we find that the best constant to approximate a random variable in the mean square sense is the mean value, which is the center of mass for the distribution. In the interpretive Example 14.2.1 for the discrete case, we find the conditional expectation E[Y | X = ti] is the center of mass for the conditional distribution at X = ti. A similar result, considering thin vertical strips, is found in Example 2 for the absolutely continuous case. This suggests the possibility that e(t) = E[Y | X = t] might be the best estimate for Y when the value X(ω) = t is observed. We investigate this possibility. The property (CE6) proves to be key to obtaining the result.

Let e(X) = E[Y | X]. We may write (CE6) in the form E[h(X)(Y - e(X))] = 0 for any reasonable function h. Consider

E[(Y - r(X))²] = E[(Y - e(X) + e(X) - r(X))²]
(56)
= E[(Y - e(X))²] + E[(e(X) - r(X))²] + 2E[(Y - e(X))(r(X) - e(X))]
(57)

Now e(X) is fixed (a.s.) and for any choice of r we may take h(X) = r(X) - e(X) to assert that

E[(Y - e(X))(r(X) - e(X))] = E[(Y - e(X))h(X)] = 0
(58)

Thus

E[(Y - r(X))²] = E[(Y - e(X))²] + E[(e(X) - r(X))²]
(59)

The first term on the right hand side is fixed; the second term is nonnegative, with a minimum of zero iff r(X) = e(X) a.s. Thus, r = e is the best rule. For a given value X(ω) = t the best mean square estimate of Y is

u = e(t) = E[Y | X = t]
(60)

The graph of u = e(t) vs t is known as the regression curve of Y on X. This is defined for argument t in the range of X, and is unique except possibly on a set N such that P(X ∈ N) = 0. Determination of the regression curve is thus determination of the conditional expectation.
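The minimizing property of e can be checked numerically on a small discrete pair. The joint probabilities below are made up for illustration; any perturbation of the rule r = e increases the mean square error, as the decomposition above predicts:

```python
# joint distribution of an illustrative discrete pair (X, Y) -- made-up values
P = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3, (2, 1): 0.3}

def e(t):
    """conditional expectation E[Y | X = t] (center of mass at X = t)"""
    pt = sum(p for (x, y), p in P.items() if x == t)
    return sum(y * p for (x, y), p in P.items() if x == t) / pt

def mse(r):
    """E[(Y - r(X))^2] for an estimation rule r"""
    return sum((y - r(x)) ** 2 * p for (x, y), p in P.items())

# r = e beats perturbed rules
assert mse(e) < mse(lambda t: e(t) + 0.1)
assert mse(e) < mse(lambda t: t)
```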

### Example 8: Regression curve for an independent pair

If the pair {X,Y} is independent, then u = E[Y | X = t] = E[Y], so that the regression curve of Y on X is the horizontal line through u = E[Y]. This, of course, agrees with the regression line, since Cov[X,Y] = 0 and the regression line is u = 0 + E[Y].

The result extends to functions of X and Y. Suppose Z = g(X,Y). Then the pair {X,Z} has a joint distribution, and the best mean square estimate of Z given X = t is E[Z | X = t].

### Example 9: Estimate of a function of {X,Y}

Suppose the pair {X,Y} has joint density fXY(t,u) = 60t²u for 0 ≤ t ≤ 1, 0 ≤ u ≤ 1 - t. This is the triangular region bounded by t = 0, u = 0, and u = 1 - t (see Figure 3). Integration shows that

fX(t) = 30t²(1 - t)², 0 ≤ t ≤ 1, and fY|X(u|t) = 2u/(1 - t)² on the triangle
(61)

Consider

Z = X² for X ≤ 1/2, 2Y for X > 1/2, i.e. Z = I_M(X) X² + I_N(X) 2Y
(62)

where M = [0, 1/2] and N = (1/2, 1]. Determine E[Z | X = t].

SOLUTION By linearity and (CE8),

E[Z | X = t] = E[I_M(X) X² | X = t] + E[I_N(X) 2Y | X = t] = I_M(t) t² + I_N(t) 2E[Y | X = t]
(63)

Now

E[Y | X = t] = ∫ u fY|X(u|t) du = (1/(1 - t)²) ∫_0^(1-t) 2u² du = (2/3)·(1 - t)³/(1 - t)² = (2/3)(1 - t), 0 ≤ t < 1
(64)

so that

E[Z | X = t] = I_M(t) t² + I_N(t) (4/3)(1 - t)
(65)

Note that the indicator functions separate the two expressions. The first holds on the interval M = [0, 1/2] and the second holds on the interval N = (1/2, 1]. The two expressions t² and (4/3)(1 - t) must not be added, for this would give an expression incorrect for all t in the range of X.

APPROXIMATION

tuappr
Enter matrix [a b] of X-range endpoints  [0 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  100
Enter number of Y approximation points  100
Enter expression for joint density  60*t.^2.*u.*(u<=1-t)
Use array operations on X, Y, PX, PY, t, u, and P
G = (t<=0.5).*t.^2 + 2*(t>0.5).*u;
EZx = sum(G.*P)./sum(P);                       % Approximation
eZx = (X<=0.5).*X.^2 + (4/3)*(X>0.5).*(1-X);   % Theoretical
plot(X,EZx,'k-',X,eZx,'k-.')
% Plotting details                             % See Figure 4


The fit is quite sufficient for practical purposes, in spite of the moderate number of approximation points. The difference in expressions for the two intervals of X values is quite clear.
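A rough Python analogue of this grid approximation (a sketch, not the tuappr utility: the grid is built by hand and only two sample values of t are checked against the theoretical formula):

```python
n = 200                                   # number of u-grid points (arbitrary)
us = [(j + 0.5) / n for j in range(n)]    # midpoint grid on [0, 1]

def f(t, u):
    """joint density 60 t^2 u on the triangle u <= 1 - t"""
    return 60 * t**2 * u if u <= 1 - t else 0.0

for t in (0.25, 0.75):
    w = [f(t, u) for u in us]                       # weights ~ f_{Y|X}(u|t)
    g = [t**2 if t <= 0.5 else 2 * u for u in us]   # Z as a function of u
    approx = sum(gi * wi for gi, wi in zip(g, w)) / sum(w)
    exact = t**2 if t <= 0.5 else (4 / 3) * (1 - t)
    assert abs(approx - exact) < 0.01
```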

### Example 10: Estimate of a function of {X,Y}

Suppose the pair {X,Y} has joint density fXY(t,u) = (6/5)(t² + u) on the unit square 0 ≤ t ≤ 1, 0 ≤ u ≤ 1 (see Figure 5). The usual integration shows

fX(t) = (3/5)(2t² + 1), 0 ≤ t ≤ 1, and fY|X(u|t) = 2(t² + u)/(2t² + 1) on the square
(66)

Consider

Z = 2X² for X ≤ Y, 3XY for X > Y, i.e. Z = I_Q(X,Y) 2X² + I_Qc(X,Y) 3XY, where Q = {(t,u): u ≥ t}
(67)

Determine E[Z | X = t].

SOLUTION

E[Z | X = t] = 2t² ∫ I_Q(t,u) fY|X(u|t) du + 3t ∫ I_Qc(t,u) u fY|X(u|t) du
(68)
= (4t²/(2t² + 1)) ∫_t^1 (t² + u) du + (6t/(2t² + 1)) ∫_0^t (t²u + u²) du = (-t⁵ + 4t⁴ + 2t²)/(2t² + 1), 0 ≤ t ≤ 1
(69)

Note the different role of the indicator functions here than in Example 9. There they provide a separation of two parts of the result. Here they serve to set the effective limits of integration, but the sum of the two parts is needed for each t.

APPROXIMATION

tuappr
Enter matrix [a b] of X-range endpoints  [0 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  200
Enter number of Y approximation points  200
Enter expression for joint density  (6/5)*(t.^2 + u)
Use array operations on X, Y, PX, PY, t, u, and P
G = 2*t.^2.*(u>=t) + 3*t.*u.*(u<t);
EZx = sum(G.*P)./sum(P);                        % Approximate
eZx = (-X.^5 + 4*X.^4 + 2*X.^2)./(2*X.^2 + 1);  % Theoretical
plot(X,EZx,'k-',X,eZx,'k-.')
% Plotting details                              % See Figure 6


The theoretical and approximate graphs are barely distinguishable on the plot. Although the same number of approximation points is used as in Figure 4 (Example 9), the fact that the entire region is included in the grid means a larger number of effective points in this example.
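The closed form can likewise be checked against a hand-built discretization in Python (an illustrative sketch; the grid size and test points are arbitrary choices):

```python
n = 200
us = [(j + 0.5) / n for j in range(n)]    # midpoint grid on [0, 1]

def f(t, u):
    """joint density (6/5)(t^2 + u) on the unit square"""
    return 1.2 * (t**2 + u)

def exact(t):
    """theoretical E[Z | X = t] from the integration above"""
    return (-t**5 + 4 * t**4 + 2 * t**2) / (2 * t**2 + 1)

for t in (0.3, 0.7):
    w = [f(t, u) for u in us]
    g = [2 * t**2 if u >= t else 3 * t * u for u in us]   # the two cases of Z
    approx = sum(gi * wi for gi, wi in zip(g, w)) / sum(w)
    assert abs(approx - exact(t)) < 0.01
```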

Given our approach to conditional expectation, the fact that it solves the regression problem is a matter that requires proof using properties of conditional expectation. An alternate approach is simply to define the conditional expectation to be the solution to the regression problem, then determine its properties. This yields, in particular, our defining condition (CE1). Once that is established, properties of expectation (including the uniqueness property (E5)) show the essential equivalence of the two concepts. There are some technical differences which do not affect most applications. The alternate approach assumes the second moment E[X²] is finite. Not all random variables have this property. However, those ordinarily used in applications at the level of this treatment will have a variance, hence a finite second moment.

We use the interpretation of e(X) = E[g(Y)|X] as the best mean square estimator of g(Y), given X, to interpret the formal property (CE9). We examine the special form

(CE9a) E{E[g(Y)|X] | X,Z} = E{E[g(Y)|X,Z] | X} = E[g(Y)|X]

Put e1(X,Z) = E[g(Y)|X,Z], the best mean square estimator of g(Y), given (X,Z). Then (CE9a) can be expressed

E[e(X) | X,Z] = e(X) a.s. and E[e1(X,Z) | X] = e(X) a.s.
(70)

In words, if we take the best mean square estimate of g(Y), given X, and then take the best mean square estimate of that, given (X,Z), we do not change the estimate of g(Y). On the other hand, if we first get the best mean square estimate of g(Y), given (X,Z), and then take the best mean square estimate of that, given X, we get the best mean square estimate of g(Y), given X.
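The second equality can be illustrated empirically. The following Python sketch uses a made-up discrete pair (X,Z) with Y depending on both; it checks that averaging e1(X,Z) over the sample, given X = x, reproduces e(x):

```python
import random

random.seed(0)

# illustrative sample (not from the text): X, Z in {0, 1}, Y = X + 2Z + noise
samples = [(x, z, x + 2 * z + random.random())
           for x, z in [(random.randint(0, 1), random.randint(0, 1))
                        for _ in range(20000)]]

def cond_mean(match):
    """empirical E[Y | match(x, z)] over the sample"""
    sel = [y for x, z, y in samples if match(x, z)]
    return sum(sel) / len(sel)

# e1(x, z) = E[Y | X = x, Z = z]   and   e(x) = E[Y | X = x]
e1 = {(x, z): cond_mean(lambda a, b, x=x, z=z: (a, b) == (x, z))
      for x in (0, 1) for z in (0, 1)}
e = {x: cond_mean(lambda a, b, x=x: a == x) for x in (0, 1)}

# E[e1(X, Z) | X = x] agrees with e(x), as (CE9a) asserts
for x in (0, 1):
    cnt = sum(1 for a, z, y in samples if a == x)
    avg = sum(e1[(a, z)] for a, z, y in samples if a == x) / cnt
    assert abs(avg - e[x]) < 1e-6
```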
