Inside the collection *Applied Probability*, by Paul E. Pfeiffer.
# Covariance and the Correlation Coefficient

Module by: Paul E. Pfeiffer

Summary: The mean value and the variance give important information about the distribution for a real random variable X. We consider the expectation of an appropriate function of a pair (X, Y) which gives useful information about their joint distribution. This is the covariance function.

## Covariance and the Correlation Coefficient

The mean value $\mu_X = E[X]$ and the variance $\sigma_X^2 = E[(X - \mu_X)^2]$ give important information about the distribution for real random variable $X$. Can the expectation of an appropriate function of $(X, Y)$ give useful information about the joint distribution? A clue to one possibility is given in the expression

$$\operatorname{Var}[X \pm Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] \pm 2\left(E[XY] - E[X]E[Y]\right) \tag{1}$$

The expression $E[XY] - E[X]E[Y]$ vanishes if the pair is independent (and in some other cases). We note also that for $\mu_X = E[X]$ and $\mu_Y = E[Y]$

$$E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y \tag{2}$$

To see this, expand the expression $(X - \mu_X)(Y - \mu_Y)$ and use linearity to get

$$E[(X - \mu_X)(Y - \mu_Y)] = E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] = E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y \tag{3}$$

which reduces directly to the desired expression. Now for given $\omega$, $X(\omega) - \mu_X$ is the variation of $X$ from its mean and $Y(\omega) - \mu_Y$ is the variation of $Y$ from its mean. For this reason, the following terminology is used.

**Definition**. The quantity $\operatorname{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]$ is called the *covariance* of $X$ and $Y$.

If we let $X' = X - \mu_X$ and $Y' = Y - \mu_Y$ be the centered random variables, then

$$\operatorname{Cov}[X, Y] = E[X'Y'] \tag{4}$$

Note that the variance of X is the covariance of X with itself.
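For a discrete pair, the covariance can be computed directly from the joint probability matrix. The sketch below (Python with NumPy, as an illustrative stand-in for the text's MATLAB m-functions; the small joint distribution is hypothetical) also confirms that the covariance of $X$ with itself is the variance of $X$.

```python
import numpy as np

# Hypothetical discrete joint distribution: P[i, j] = P(Y = y[i], X = x[j])
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0])
P = np.array([[0.2, 0.1, 0.1],
              [0.1, 0.2, 0.3]])   # rows index y, columns index x

PX = P.sum(axis=0)                # marginal distribution for X
PY = P.sum(axis=1)                # marginal distribution for Y
EX = np.dot(x, PX)
EY = np.dot(y, PY)

# Cov[X, Y] = E[XY] - E[X]E[Y]
EXY = np.sum(np.outer(y, x) * P)
cov_XY = EXY - EX * EY

# The covariance of X with itself is Var[X] = E[X^2] - E[X]^2
var_X = np.dot(x**2, PX) - EX**2
```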

If we standardize, with $X^* = (X - \mu_X)/\sigma_X$ and $Y^* = (Y - \mu_Y)/\sigma_Y$, we have

**Definition**. The *correlation coefficient* $\rho = \rho[X, Y]$ is the quantity

$$\rho[X, Y] = E[X^* Y^*] = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \tag{5}$$

Thus $\rho = \operatorname{Cov}[X, Y]/\sigma_X \sigma_Y$. We examine these concepts for information on the joint distribution. By Schwarz' inequality (E15), we have

$$\rho^2 = E^2[X^* Y^*] \le E[(X^*)^2]\, E[(Y^*)^2] = 1 \quad \text{with equality iff} \quad Y^* = cX^* \tag{6}$$

Now equality holds iff

$$1 = c^2 E^2[(X^*)^2] = c^2 \quad \text{which implies} \quad c = \pm 1 \text{ and } \rho = \pm 1 \tag{7}$$

We conclude $-1 \le \rho \le 1$, with $\rho = \pm 1$ iff $Y^* = \pm X^*$.

**Relationship between ρ and the joint distribution**

• We consider first the distribution for the standardized pair $(X^*, Y^*)$.
• Since
$$P(X^* \le r,\ Y^* \le s) = P\left(\frac{X - \mu_X}{\sigma_X} \le r,\ \frac{Y - \mu_Y}{\sigma_Y} \le s\right) = P(X \le t = \sigma_X r + \mu_X,\ Y \le u = \sigma_Y s + \mu_Y) \tag{8}$$
we obtain the results for the distribution for $(X, Y)$ by the mapping
$$t = \sigma_X r + \mu_X \qquad u = \sigma_Y s + \mu_Y \tag{9}$$

Joint distribution for the standardized variables $(X^*, Y^*)$, $(r, s) = (X^*, Y^*)(\omega)$

• $\rho = 1$ iff $X^* = Y^*$ iff all probability mass is on the line $s = r$.
• $\rho = -1$ iff $X^* = -Y^*$ iff all probability mass is on the line $s = -r$.

If $-1 < \rho < 1$, then at least some of the mass must fail to be on these lines.

The $\rho = \pm 1$ lines for the $(X, Y)$ distribution are:

$$\frac{u - \mu_Y}{\sigma_Y} = \pm \frac{t - \mu_X}{\sigma_X} \quad \text{or} \quad u = \pm \frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y \tag{10}$$

Consider $Z = Y^* - X^*$. Then $E\left[\frac{1}{2}Z^2\right] = \frac{1}{2}E[(Y^* - X^*)^2]$. Reference to Figure 1 shows this is the average of the square of the distances of the points $(r, s) = (X^*, Y^*)(\omega)$ from the line $s = r$ (i.e., the variance about the line $s = r$). Similarly for $W = Y^* + X^*$, $E[W^2/2]$ is the variance about $s = -r$. Now

$$\frac{1}{2}E[(Y^* \pm X^*)^2] = \frac{1}{2}\left\{E[(Y^*)^2] + E[(X^*)^2] \pm 2E[X^* Y^*]\right\} = 1 \pm \rho \tag{11}$$

Thus

• $1 - \rho$ is the variance about $s = r$ (the $\rho = 1$ line)
• $1 + \rho$ is the variance about $s = -r$ (the $\rho = -1$ line)

Now since

$$E[(Y^* - X^*)^2] = E[(Y^* + X^*)^2] \quad \text{iff} \quad \rho = E[X^* Y^*] = 0 \tag{12}$$

the condition $\rho = 0$ is the condition for equality of the two variances.

**Transformation to the $(X, Y)$ plane**

$$t = \sigma_X r + \mu_X \qquad u = \sigma_Y s + \mu_Y \qquad r = \frac{t - \mu_X}{\sigma_X} \qquad s = \frac{u - \mu_Y}{\sigma_Y} \tag{13}$$

The $\rho = 1$ line is:

$$\frac{u - \mu_Y}{\sigma_Y} = \frac{t - \mu_X}{\sigma_X} \quad \text{or} \quad u = \frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y \tag{14}$$

The $\rho = -1$ line is:

$$\frac{u - \mu_Y}{\sigma_Y} = -\frac{t - \mu_X}{\sigma_X} \quad \text{or} \quad u = -\frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y \tag{15}$$

$1 - \rho$ is proportional to the variance about the $\rho = 1$ line and $1 + \rho$ is proportional to the variance about the $\rho = -1$ line. $\rho = 0$ iff the variances about both are the same.

### Example 1: Uncorrelated but not independent

Suppose the joint density for $\{X, Y\}$ is constant on the unit circle about the origin. By the rectangle test, the pair cannot be independent. By symmetry, the $\rho = 1$ line is $u = t$ and the $\rho = -1$ line is $u = -t$. By symmetry, also, the variance about each of these lines is the same. Thus $\rho = 0$, which is true iff $\operatorname{Cov}[X, Y] = 0$. This fact can be verified by calculation, if desired.
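The claim can also be checked numerically. This Monte Carlo sketch (Python with NumPy, not part of the text's m-functions) samples the uniform distribution on the unit disk by rejection from the enclosing square and estimates $\rho$, which should come out near zero:

```python
import numpy as np

rng = np.random.default_rng(12345)

# Sample uniformly on the unit disk by rejection from the square [-1,1]^2
pts = rng.uniform(-1, 1, size=(200000, 2))
pts = pts[np.sum(pts**2, axis=1) <= 1]   # keep points inside the circle
X, Y = pts[:, 0], pts[:, 1]

# Sample correlation coefficient; by symmetry the true value is 0
rho_hat = np.corrcoef(X, Y)[0, 1]
```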

### Example 2: Uniform marginal distributions

Consider the three distributions in Figure 2. In case (a), the distribution is uniform over the square centered at the origin with vertices at (1,1), (-1,1), (-1,-1), (1,-1). In case (b), the distribution is uniform over two squares, in the first and third quadrants with vertices (0,0), (1,0), (1,1), (0,1) and (0,0), (-1,0), (-1,-1), (0,-1). In case (c) the two squares are in the second and fourth quadrants. The marginals are uniform on (-1,1) in each case, so that in each case

$$E[X] = E[Y] = 0 \quad \text{and} \quad \operatorname{Var}[X] = \operatorname{Var}[Y] = 1/3 \tag{16}$$

This means the $\rho = 1$ line is $u = t$ and the $\rho = -1$ line is $u = -t$.

1. By symmetry, $E[XY] = 0$ (in fact the pair is independent) and $\rho = 0$.
2. For every pair of possible values, the two signs must be the same, so $E[XY] > 0$, which implies $\rho > 0$. The actual value may be calculated to give $\rho = 3/4$. Since $1 - \rho < 1 + \rho$, the variance about the $\rho = 1$ line is less than that about the $\rho = -1$ line. This is evident from the figure.
3. $E[XY] < 0$ and $\rho < 0$. Since $1 + \rho < 1 - \rho$, the variance about the $\rho = -1$ line is less than that about the $\rho = 1$ line. Again, examination of the figure confirms this.
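The value $\rho = 3/4$ in case (b) can be verified numerically. In this sketch (Python with NumPy; the sampling scheme is an assumption of mine, not the text's), a point is drawn uniformly from the first-quadrant unit square and then reflected through the origin with probability 1/2, which reproduces the two-square distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400000

# Uniform on the two unit squares in the first and third quadrants:
# draw a point in the first-quadrant square, then flip both coordinates
# together with probability 1/2
X = rng.uniform(0, 1, n)
Y = rng.uniform(0, 1, n)
sign = rng.choice([-1.0, 1.0], n)
X, Y = sign * X, sign * Y

# Sample correlation coefficient; the theoretical value is 3/4
rho_hat = np.corrcoef(X, Y)[0, 1]
```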

### Example 3: A pair of simple random variables

With the aid of m-functions and MATLAB we can easily calculate the covariance and the correlation coefficient. We use the joint distribution for Example 9 in "Variance." In that example, calculations show

$$E[XY] - E[X]E[Y] = -0.1633 = \operatorname{Cov}[X, Y], \quad \sigma_X = 1.8170 \quad \text{and} \quad \sigma_Y = 1.9122 \tag{17}$$

so that $\rho = -0.04699$.
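Given the quantities above, computing $\rho$ is a one-liner. This quick check (Python rather than the text's MATLAB) reproduces the stated value from the covariance and standard deviations:

```python
# Values taken from Example 9 in "Variance", as quoted in the text
cov_XY = -0.1633            # E[XY] - E[X]E[Y]
sigma_X = 1.8170
sigma_Y = 1.9122

# rho = Cov[X, Y] / (sigma_X * sigma_Y), agreeing with the stated -0.04699
rho = cov_XY / (sigma_X * sigma_Y)
```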

### Example 4: An absolutely continuous pair

The pair $\{X, Y\}$ has joint density function $f_{XY}(t, u) = \frac{6}{5}(t + 2u)$ on the triangular region bounded by $t = 0$, $u = t$, and $u = 1$. By the usual integration techniques, we have

$$f_X(t) = \frac{6}{5}(1 + t - 2t^2),\ 0 \le t \le 1 \quad \text{and} \quad f_Y(u) = 3u^2,\ 0 \le u \le 1 \tag{18}$$

From this we obtain $E[X] = 2/5$, $\operatorname{Var}[X] = 3/50$, $E[Y] = 3/4$, and $\operatorname{Var}[Y] = 3/80$. To complete the picture we need

$$E[XY] = \frac{6}{5}\int_0^1 \int_t^1 (t^2 u + 2tu^2)\, du\, dt = 8/25 \tag{19}$$

Then

$$\operatorname{Cov}[X, Y] = E[XY] - E[X]E[Y] = 2/100 \quad \text{and} \quad \rho = \frac{\operatorname{Cov}[X, Y]}{\sigma_X \sigma_Y} = \frac{4}{30}\sqrt{10} \approx 0.4216 \tag{20}$$

APPROXIMATION

```
tuappr
Enter matrix [a b] of X-range endpoints  [0 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  200
Enter number of Y approximation points  200
Enter expression for joint density  (6/5)*(t + 2*u).*(u>=t)
Use array operations on X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX =   0.4012                    % Theoretical = 0.4
EY = total(u.*P)
EY =   0.7496                    % Theoretical = 0.75
VX = total(t.^2.*P) - EX^2
VX =   0.0603                    % Theoretical = 0.06
VY = total(u.^2.*P) - EY^2
VY =   0.0376                    % Theoretical = 0.0375
CV = total(t.*u.*P) - EX*EY
CV =   0.0201                    % Theoretical = 0.02
rho = CV/sqrt(VX*VY)
rho =  0.4212                    % Theoretical = 0.4216
```
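The same grid approximation can be sketched in Python with NumPy. This is an illustrative stand-in for the tuappr m-function, not the original: it discretizes the density on a 200 × 200 grid of cell midpoints, normalizes the cell probabilities, and forms the same moments.

```python
import numpy as np

# Grid of cell midpoints on [0, 1] x [0, 1], mirroring the tuappr setup
n = 200
t = (np.arange(n) + 0.5) / n
u = (np.arange(n) + 0.5) / n
T, U = np.meshgrid(t, u)

# Joint density (6/5)(t + 2u) on u >= t, times cell area, then normalized
P = (6/5) * (T + 2*U) * (U >= T) / n**2
P /= P.sum()

EX = np.sum(T * P)                 # theoretical 0.4
EY = np.sum(U * P)                 # theoretical 0.75
VX = np.sum(T**2 * P) - EX**2      # theoretical 0.06
VY = np.sum(U**2 * P) - EY**2      # theoretical 0.0375
CV = np.sum(T * U * P) - EX * EY   # theoretical 0.02
rho = CV / np.sqrt(VX * VY)        # theoretical 0.4216
```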


**Coefficient of linear correlation**

The parameter ρ is usually called the correlation coefficient. A more descriptive name would be *coefficient of linear correlation*. The following example shows that all probability mass may be on a curve, so that $Y = g(X)$ (i.e., the value of $Y$ is completely determined by the value of $X$), yet $\rho = 0$.

### Example 5: $Y = g(X)$ but $\rho = 0$

Suppose $X \sim$ uniform $(-1, 1)$, so that $f_X(t) = 1/2$, $-1 < t < 1$, and $E[X] = 0$. Let $Y = g(X) = \cos X$. Then

$$\operatorname{Cov}[X, Y] = E[XY] = \frac{1}{2}\int_{-1}^{1} t \cos t\, dt = 0 \tag{21}$$

Thus $\rho = 0$. Note that $g$ could be any even function defined on $(-1, 1)$. In this case the integrand $t\,g(t)$ is odd, so that the value of the integral is zero.
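A numerical check of this example (Python with NumPy; not from the text's m-functions): even though $Y$ is fully determined by $X$, the sample correlation of $X$ and $\cos X$ is near zero.

```python
import numpy as np

rng = np.random.default_rng(7)

# X uniform on (-1, 1); Y = cos X is completely determined by X
X = rng.uniform(-1, 1, 200000)
Y = np.cos(X)

# Sample correlation coefficient; the true value is 0 by odd symmetry
rho_hat = np.corrcoef(X, Y)[0, 1]
```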

**Variance and covariance for linear combinations**

We generalize the property (V4) on linear combinations. Consider the linear combinations

$$X = \sum_{i=1}^{n} a_i X_i \quad \text{and} \quad Y = \sum_{j=1}^{m} b_j Y_j \tag{22}$$

We wish to determine $\operatorname{Cov}[X, Y]$ and $\operatorname{Var}[X]$. It is convenient to work with the centered random variables $X' = X - \mu_X$ and $Y' = Y - \mu_Y$. Since, by linearity of expectation,

$$\mu_X = \sum_{i=1}^{n} a_i \mu_{X_i} \quad \text{and} \quad \mu_Y = \sum_{j=1}^{m} b_j \mu_{Y_j} \tag{23}$$

we have

$$X' = \sum_{i=1}^{n} a_i X_i - \sum_{i=1}^{n} a_i \mu_{X_i} = \sum_{i=1}^{n} a_i (X_i - \mu_{X_i}) = \sum_{i=1}^{n} a_i X_i' \tag{24}$$

and similarly for Y'. By definition

$$\operatorname{Cov}(X, Y) = E[X'Y'] = E\Big[\sum_{i,j} a_i b_j X_i' Y_j'\Big] = \sum_{i,j} a_i b_j E[X_i' Y_j'] = \sum_{i,j} a_i b_j \operatorname{Cov}(X_i, Y_j) \tag{25}$$

In particular

$$\operatorname{Var}(X) = \operatorname{Cov}(X, X) = \sum_{i,j} a_i a_j \operatorname{Cov}(X_i, X_j) = \sum_{i=1}^{n} a_i^2 \operatorname{Cov}(X_i, X_i) + \sum_{i \ne j} a_i a_j \operatorname{Cov}(X_i, X_j) \tag{26}$$

Using the fact that $a_i a_j \operatorname{Cov}(X_i, X_j) = a_j a_i \operatorname{Cov}(X_j, X_i)$, we have

$$\operatorname{Var}[X] = \sum_{i=1}^{n} a_i^2 \operatorname{Var}[X_i] + 2\sum_{i < j} a_i a_j \operatorname{Cov}(X_i, X_j) \tag{27}$$

Note that $a_i^2$ does not depend upon the sign of $a_i$. If the $X_i$ form an independent class, or are otherwise uncorrelated, the expression for the variance reduces to

$$\operatorname{Var}[X] = \sum_{i=1}^{n} a_i^2 \operatorname{Var}[X_i] \tag{28}$$
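Equation (27) can be checked against the equivalent matrix form $\operatorname{Var}[X] = a^{\mathsf{T}} K a$, where $K$ is the covariance matrix of the $X_i$. A small sketch (Python with NumPy; the matrix and coefficients are hypothetical) compares the quadratic form to the diagonal-plus-cross-term decomposition:

```python
import numpy as np

# Hypothetical covariance matrix K for (X1, X2, X3) and coefficients a.
# K is symmetric and diagonally dominant, hence a valid covariance matrix.
K = np.array([[ 2.0, 0.3, -0.5],
              [ 0.3, 1.0,  0.2],
              [-0.5, 0.2,  3.0]])
a = np.array([1.0, -2.0, 0.5])

# Quadratic form: Var[sum_i a_i X_i] = a^T K a
var_full = a @ K @ a

# Equation (27): diagonal terms plus twice the i < j cross terms
var_27 = np.sum(a**2 * np.diag(K))
for i in range(len(a)):
    for j in range(i + 1, len(a)):
        var_27 += 2 * a[i] * a[j] * K[i, j]
```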
