Covariance and the Correlation Coefficient

Module by: Paul E Pfeiffer

Summary: The mean value and the variance give important information about the distribution for a real random variable X. We consider the expectation of an appropriate function of a pair (X, Y) which gives useful information about their joint distribution. This is the covariance function.

The mean value $\mu_X = E[X]$ and the variance $\sigma_X^2 = E[(X - \mu_X)^2]$ give important information about the distribution for a real random variable $X$. Can the expectation of an appropriate function of $(X, Y)$ give useful information about the joint distribution? A clue to one possibility is given in the expression

$$\operatorname{Var}[X \pm Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] \pm 2\left(E[XY] - E[X]E[Y]\right)$$
(1)

The expression $E[XY] - E[X]E[Y]$ vanishes if the pair is independent (and in some other cases). We note also that for $\mu_X = E[X]$ and $\mu_Y = E[Y]$

$$E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$$
(2)

To see this, expand the expression $(X - \mu_X)(Y - \mu_Y)$ and use linearity to get

$$E[(X - \mu_X)(Y - \mu_Y)] = E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] = E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y$$
(3)

which reduces directly to the desired expression. Now for given $\omega$, $X(\omega) - \mu_X$ is the variation of $X$ from its mean and $Y(\omega) - \mu_Y$ is the variation of $Y$ from its mean. For this reason, the following terminology is used.

Definition. The quantity $\operatorname{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]$ is called the covariance of $X$ and $Y$.

If we let $X' = X - \mu_X$ and $Y' = Y - \mu_Y$ be the centered random variables, then

$$\operatorname{Cov}[X, Y] = E[X'Y']$$
(4)

Note that the variance of X is the covariance of X with itself.
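As a small numerical check of the definition and of equation (2), the sketch below computes the covariance both ways for a made-up discrete joint distribution. The table `pmf` and the helper `expect` are illustrative assumptions, not part of the original module, and Python is used here in place of the MATLAB m-functions that appear later in the text.

```python
# Hypothetical joint distribution: pmf[(x, y)] = P(X = x, Y = y).
pmf = {(0, 1): 0.2, (1, 0): 0.3, (1, 2): 0.1, (2, 2): 0.4}


def expect(g):
    """Return E[g(X, Y)] for the discrete joint distribution above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Definition: expectation of the product of the centered variables.
cov_centered = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Shortcut from equation (2): E[XY] - mu_X * mu_Y.
cov_shortcut = expect(lambda x, y: x * y) - mu_x * mu_y

# The variance of X is the covariance of X with itself.
var_x = expect(lambda x, y: (x - mu_x) * (x - mu_x))
var_x_shortcut = expect(lambda x, y: x * x) - mu_x ** 2

print(cov_centered, cov_shortcut, var_x)
```

Both covariance computations agree up to floating-point rounding, as the expansion in equation (3) predicts.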

If we standardize, with $X^* = (X - \mu_X)/\sigma_X$ and $Y^* = (Y - \mu_Y)/\sigma_Y$, we have

Definition. The correlation coefficient $\rho = \rho[X, Y]$ is the quantity

$$\rho[X, Y] = E[X^* Y^*] = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
(5)

Thus $\rho = \operatorname{Cov}[X, Y]/(\sigma_X \sigma_Y)$. We examine these concepts for information on the joint distribution. By Schwarz' inequality (E15), we have

$$\rho^2 = E^2[X^* Y^*] \le E[(X^*)^2]\, E[(Y^*)^2] = 1 \quad \text{with equality iff } Y^* = c X^*$$
(6)

Now equality holds iff

$$1 = c^2 E^2[(X^*)^2] = c^2 \quad \text{which implies } c = \pm 1 \text{ and } \rho = \pm 1$$
(7)

We conclude $-1 \le \rho \le 1$, with $\rho = \pm 1$ iff $Y^* = \pm X^*$.
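The bound can be illustrated numerically. The following is a minimal sketch, assuming a made-up discrete distribution for $X$ (the values `xs` and probabilities `ps` are hypothetical) and three choices of $Y = g(X)$: two linear, one not.

```python
import math


def corr(xs, ys, ps):
    """Correlation coefficient when P(X = xs[k], Y = ys[k]) = ps[k]."""
    mx = sum(p * x for x, p in zip(xs, ps))
    my = sum(p * y for y, p in zip(ys, ps))
    cov = sum(p * (x - mx) * (y - my) for x, y, p in zip(xs, ys, ps))
    sx = math.sqrt(sum(p * (x - mx) ** 2 for x, p in zip(xs, ps)))
    sy = math.sqrt(sum(p * (y - my) ** 2 for y, p in zip(ys, ps)))
    return cov / (sx * sy)


# Hypothetical distribution for X.
xs = [-1.0, 0.0, 2.0, 5.0]
ps = [0.1, 0.4, 0.3, 0.2]

rho_up = corr(xs, [3 * x + 1 for x in xs], ps)     # Y = 3X + 1, so Y* = X*
rho_down = corr(xs, [-2 * x + 4 for x in xs], ps)  # Y = -2X + 4, so Y* = -X*
rho_curve = corr(xs, [x ** 2 for x in xs], ps)     # Y = X^2, not linear in X

print(rho_up, rho_down, rho_curve)
```

The two linear cases give $\rho = 1$ and $\rho = -1$ up to rounding, while the quadratic case lies strictly between the two.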

Relationship between ρ and the joint distribution

  • We consider first the distribution for the standardized pair $(X^*, Y^*)$.
  • Since
    $$P(X^* \le r,\ Y^* \le s) = P\left(\frac{X - \mu_X}{\sigma_X} \le r,\ \frac{Y - \mu_Y}{\sigma_Y} \le s\right) = P(X \le t = \sigma_X r + \mu_X,\ Y \le u = \sigma_Y s + \mu_Y)$$
    (8)
    we obtain the results for the distribution for $(X, Y)$ by the mapping
    $$t = \sigma_X r + \mu_X \qquad u = \sigma_Y s + \mu_Y$$
    (9)

Joint distribution for the standardized variables $(X^*, Y^*)$, $(r, s) = (X^*, Y^*)(\omega)$

  • $\rho = 1$ iff $X^* = Y^*$ iff all probability mass is on the line $s = r$.
  • $\rho = -1$ iff $X^* = -Y^*$ iff all probability mass is on the line $s = -r$.

If $-1 < \rho < 1$, then at least some of the mass must fail to be on these lines.

Figure 1: Distance from point (r,s) to the line s = r.
Figure one is comprised of a diagonal line with a right triangle. A portion of the line is the base of the triangle. The line is labeled, s = r. One point of the triangle located on the diagonal line is labeled (r, r). The point of the triangle that is not located on the line is labeled, (r, s). The side of the triangle in between these two labeled points is labeled as the absolute value of s - r. The side of the triangle on the line is not labeled. The third side is labeled as the absolute value of s - r divided by the square root of two.

The $\rho = \pm 1$ lines for the $(X, Y)$ distribution are:

$$\frac{u - \mu_Y}{\sigma_Y} = \pm \frac{t - \mu_X}{\sigma_X} \quad \text{or} \quad u = \pm \frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y$$
(10)

Consider $Z = Y^* - X^*$. Then $E[\tfrac{1}{2} Z^2] = \tfrac{1}{2} E[(Y^* - X^*)^2]$. Reference to Figure 1 shows this is the average of the square of the distances of the points $(r, s) = (X^*, Y^*)(\omega)$ from the line $s = r$ (i.e., the variance about the line $s = r$). Similarly for $W = Y^* + X^*$, $E[W^2/2]$ is the variance about $s = -r$. Now

$$\frac{1}{2} E[(Y^* \pm X^*)^2] = \frac{1}{2}\left\{E[(Y^*)^2] + E[(X^*)^2] \pm 2 E[X^* Y^*]\right\} = 1 \pm \rho$$
(11)

Thus

  • $1 - \rho$ is the variance about $s = r$ (the $\rho = 1$ line)
  • $1 + \rho$ is the variance about $s = -r$ (the $\rho = -1$ line)

Now since

$$E[(Y^* - X^*)^2] = E[(Y^* + X^*)^2] \quad \text{iff} \quad \rho = E[X^* Y^*] = 0$$
(12)

the condition $\rho = 0$ is the condition for equality of the two variances.
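The identity in equation (11) can be checked on a small example. This is a sketch under the assumption of a made-up discrete joint distribution (`pmf` is hypothetical); it standardizes the pair and compares the two variances about the lines with $1 - \rho$ and $1 + \rho$.

```python
import math

# Hypothetical joint distribution: pmf[(x, y)] = P(X = x, Y = y).
pmf = {(0, 1): 0.2, (1, 0): 0.3, (1, 2): 0.1, (2, 2): 0.4}


def expect(g):
    """Return E[g(X, Y)] for the discrete joint distribution above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
sd_x = math.sqrt(expect(lambda x, y: (x - mu_x) ** 2))
sd_y = math.sqrt(expect(lambda x, y: (y - mu_y) ** 2))


def std(x, y):
    """Map a point (x, y) to its standardized coordinates (r, s)."""
    return (x - mu_x) / sd_x, (y - mu_y) / sd_y


rho = expect(lambda x, y: std(x, y)[0] * std(x, y)[1])

# Variance about the line s = r: half the mean square of Y* - X*.
var_about_plus = 0.5 * expect(lambda x, y: (std(x, y)[1] - std(x, y)[0]) ** 2)
# Variance about the line s = -r: half the mean square of Y* + X*.
var_about_minus = 0.5 * expect(lambda x, y: (std(x, y)[1] + std(x, y)[0]) ** 2)

print(rho, var_about_plus, var_about_minus)
```

The two variances sum to 2, so a positive $\rho$ always means less spread about the $\rho = 1$ line than about the $\rho = -1$ line.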

Transformation to the $(X, Y)$ plane

$$t = \sigma_X r + \mu_X \qquad u = \sigma_Y s + \mu_Y \qquad r = \frac{t - \mu_X}{\sigma_X} \qquad s = \frac{u - \mu_Y}{\sigma_Y}$$
(13)

The $\rho = 1$ line is:

$$\frac{u - \mu_Y}{\sigma_Y} = \frac{t - \mu_X}{\sigma_X} \quad \text{or} \quad u = \frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y$$
(14)

The $\rho = -1$ line is:

$$\frac{u - \mu_Y}{\sigma_Y} = -\frac{t - \mu_X}{\sigma_X} \quad \text{or} \quad u = -\frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y$$
(15)

$1 - \rho$ is proportional to the variance about the $\rho = 1$ line and $1 + \rho$ is proportional to the variance about the $\rho = -1$ line. $\rho = 0$ iff the variances about both are the same.

Example 1: Uncorrelated but not independent

Suppose the joint density for $\{X, Y\}$ is constant on the unit circle about the origin. By the rectangle test, the pair cannot be independent. By symmetry, the $\rho = 1$ line is $u = t$ and the $\rho = -1$ line is $u = -t$. By symmetry, also, the variance about each of these lines is the same. Thus $\rho = 0$, which is true iff $\operatorname{Cov}[X, Y] = 0$. This fact can be verified by calculation, if desired.
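The calculation can be approximated on a grid. The sketch below assumes that "constant on the unit circle" means uniform over the disk bounded by that circle, and it replaces the density by equal point masses at the midpoints of a 200 by 200 grid; both the reading and the grid size are assumptions made for illustration.

```python
n = 200
# Midpoints of n equal cells covering the interval (-1, 1).
pts = [-1 + (2 * k + 1) / n for k in range(n)]

# Keep only the grid points inside the unit disk; each gets equal mass.
inside = [(t, u) for t in pts for u in pts if t * t + u * u <= 1]
w = 1 / len(inside)

ex = sum(t for t, u in inside) * w
ey = sum(u for t, u in inside) * w
cov = sum(t * u for t, u in inside) * w - ex * ey

# Rectangle test: the corner region t > 0.8, u > 0.8 carries no mass,
# although each marginal event has positive probability.
p_x = sum(w for t, u in inside if t > 0.8)
p_y = sum(w for t, u in inside if u > 0.8)
p_joint = sum(w for t, u in inside if t > 0.8 and u > 0.8)

print(cov, p_joint, p_x * p_y)
```

The covariance is zero up to rounding, yet the joint probability of the corner region differs from the product of the marginal probabilities, so the pair is uncorrelated but not independent.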

Example 2: Uniform marginal distributions

Figure 2: Uniform marginals but different correlation coefficients.
Figure two is comprised of three graphs of multiple shaded squares. All three are standard cartesian graphs, with all four quadrants equal in size, t as the horizontal axis, and u as the vertical axis. The first graph shows one large square centered at the origin with a length of two units on a side. As the square is centered about the origin, the square is divided equally into four smaller squares by the vertical and horizontal axes. A caption below the first graph reads, rho = 0. The second graph contains two smaller squares, one unit to a side, one sitting with two sides along the axes of the graph in the first quadrant, and one sitting with two sides along the axes of the graph in the third quadrant. The caption reads rho = 3/4. The third graph contains two squares of the same size as the second graph, this time with one sitting with two sides along the axes in the second quadrant, and one sitting with two sides along the axes in the fourth quadrant. The caption reads rho = -3/4.

Consider the three distributions in Figure 2. In case (a), the distribution is uniform over the square centered at the origin with vertices at (1,1), (-1,1), (-1,-1), (1,-1). In case (b), the distribution is uniform over two squares, in the first and third quadrants with vertices (0,0), (1,0), (1,1), (0,1) and (0,0), (-1,0), (-1,-1), (0,-1). In case (c) the two squares are in the second and fourth quadrants. The marginals are uniform on (-1,1) in each case, so that in each case

$$E[X] = E[Y] = 0 \quad \text{and} \quad \operatorname{Var}[X] = \operatorname{Var}[Y] = 1/3$$
(16)

This means the $\rho = 1$ line is $u = t$ and the $\rho = -1$ line is $u = -t$.

  1. By symmetry, $E[XY] = 0$ (in fact the pair is independent) and $\rho = 0$.
  2. For every pair of possible values, the two signs must be the same, so $E[XY] > 0$ which implies $\rho > 0$. The actual value may be calculated to give $\rho = 3/4$. Since $1 - \rho < 1 + \rho$, the variance about the $\rho = 1$ line is less than that about the $\rho = -1$ line. This is evident from the figure.
  3. $E[XY] < 0$ and $\rho < 0$. Since $1 + \rho < 1 - \rho$, the variance about the $\rho = -1$ line is less than that about the $\rho = 1$ line. Again, examination of the figure confirms this.
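The values $\rho = 0$, $3/4$, and $-3/4$ for the three cases can be reproduced with exact rational arithmetic. The helper `rho_uniform` below is a hypothetical illustration, not taken from the module; it assumes a distribution uniform over a union of disjoint axis-aligned rectangles with integer corners and integrates the monomials $t^i u^j$ in closed form.

```python
import math
from fractions import Fraction


def rho_uniform(rects):
    """Correlation for a distribution uniform over disjoint rectangles.

    Each rectangle is (a, b, c, d), meaning a <= t <= b and c <= u <= d,
    with integer endpoints.
    """
    area = sum((b - a) * (d - c) for a, b, c, d in rects)

    def moment(i, j):
        # E[X^i Y^j]: integrate t^i u^j over each rectangle, divide by area.
        total = Fraction(0)
        for a, b, c, d in rects:
            int_t = Fraction(b ** (i + 1) - a ** (i + 1), i + 1)
            int_u = Fraction(d ** (j + 1) - c ** (j + 1), j + 1)
            total += int_t * int_u
        return total / area

    ex, ey = moment(1, 0), moment(0, 1)
    cov = moment(1, 1) - ex * ey
    vx = moment(2, 0) - ex ** 2
    vy = moment(0, 2) - ey ** 2
    return float(cov) / math.sqrt(float(vx * vy))


rho_a = rho_uniform([(-1, 1, -1, 1)])                # one large square
rho_b = rho_uniform([(0, 1, 0, 1), (-1, 0, -1, 0)])  # quadrants 1 and 3
rho_c = rho_uniform([(-1, 0, 0, 1), (0, 1, -1, 0)])  # quadrants 2 and 4

print(rho_a, rho_b, rho_c)
```

In case (b) the exact moments are $E[XY] = 1/4$ and $\operatorname{Var}[X] = \operatorname{Var}[Y] = 1/3$, which gives the quoted $\rho = 3/4$.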

Example 3: A pair of simple random variables

With the aid of m-functions and MATLAB we can easily calculate the covariance and the correlation coefficient. We use the joint distribution for Example 9 in "Variance." In that example calculations show

$$E[XY] - E[X]E[Y] = -0.1633 = \operatorname{Cov}[X, Y], \quad \sigma_X = 1.8170 \quad \text{and} \quad \sigma_Y = 1.9122$$
(17)

so that $\rho = -0.04699$.
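The last step is a single division, sketched here with the rounded figures quoted in equation (17). Because those inputs are rounded to four places, the result matches the quoted $\rho$ only to about four decimal places.

```python
# Rounded values quoted in equation (17).
cov_xy = -0.1633
sigma_x = 1.8170
sigma_y = 1.9122

# Correlation coefficient: covariance divided by the product of the
# standard deviations.
rho = cov_xy / (sigma_x * sigma_y)

print(round(rho, 4))
```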

Example 4: An absolutely continuous pair

The pair $\{X, Y\}$ has joint density function $f_{XY}(t, u) = \frac{6}{5}(t + 2u)$ on the triangular region bounded by $t = 0$, $u = t$, and $u = 1$. By the usual integration techniques, we have

$$f_X(t) = \frac{6}{5}(1 + t - 2t^2), \quad 0 \le t \le 1 \quad \text{and} \quad f_Y(u) = 3u^2, \quad 0 \le u \le 1$$
(18)

From this we obtain $E[X] = 2/5$, $\operatorname{Var}[X] = 3/50$, $E[Y] = 3/4$, and $\operatorname{Var}[Y] = 3/80$. To complete the picture we need

$$E[XY] = \frac{6}{5} \int_0^1 \int_t^1 (t^2 u + 2 t u^2)\, du\, dt = 8/25$$
(19)

Then

$$\operatorname{Cov}[X, Y] = E[XY] - E[X]E[Y] = 2/100 \quad \text{and} \quad \rho = \frac{\operatorname{Cov}[X, Y]}{\sigma_X \sigma_Y} = \frac{4}{30}\sqrt{10} \approx 0.4216$$
(20)

APPROXIMATION

tuappr
Enter matrix [a b] of X-range endpoints  [0 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  200
Enter number of Y approximation points  200
Enter expression for joint density  (6/5)*(t + 2*u).*(u>=t)
Use array operations on X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX =   0.4012                    % Theoretical = 0.4
EY = total(u.*P)
EY =   0.7496                    % Theoretical = 0.75
VX = total(t.^2.*P) - EX^2
VX =   0.0603                    % Theoretical = 0.06
VY = total(u.^2.*P) - EY^2
VY =   0.0376                    % Theoretical = 0.0375
CV = total(t.*u.*P) - EX*EY
CV =   0.0201                    % Theoretical = 0.02
rho = CV/sqrt(VX*VY)
rho =  0.4212                    % Theoretical = 0.4216
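For readers without the tuappr m-procedure, the same kind of approximation can be sketched in plain Python. This is not the m-procedure's implementation; it assumes a 200 by 200 grid of cell midpoints on the unit square, evaluates the joint density there, and normalizes the resulting weights so they sum to one.

```python
import math

n = 200
# Midpoints of n equal cells on (0, 1), used for both the t and u axes.
mid = [(k + 0.5) / n for k in range(n)]


def density(t, u):
    """Joint density (6/5)(t + 2u) on the triangle u >= t, zero elsewhere."""
    return 1.2 * (t + 2 * u) if u >= t else 0.0


# Unnormalized weight for every grid point, then the normalizing constant.
grid = [(t, u, density(t, u)) for t in mid for u in mid]
total_weight = sum(f for _, _, f in grid)


def expect(g):
    """Approximate E[g(X, Y)] using the normalized grid weights."""
    return sum(g(t, u) * f for t, u, f in grid) / total_weight


ex = expect(lambda t, u: t)
ey = expect(lambda t, u: u)
vx = expect(lambda t, u: t * t) - ex ** 2
vy = expect(lambda t, u: u * u) - ey ** 2
cv = expect(lambda t, u: t * u) - ex * ey
rho = cv / math.sqrt(vx * vy)

print(ex, ey, vx, vy, cv, rho)
```

The results should land close to the theoretical values 0.4, 0.75, 0.06, 0.0375, 0.02, and 0.4216, with a small bias from the cells that straddle the boundary $u = t$.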

Coefficient of linear correlation

The parameter $\rho$ is usually called the correlation coefficient. A more descriptive name would be coefficient of linear correlation. The following example shows that all probability mass may be on a curve, so that $Y = g(X)$ (i.e., the value of $Y$ is completely determined by the value of $X$), yet $\rho = 0$.

Example 5: $Y = g(X)$ but $\rho = 0$

Suppose $X \sim$ uniform $(-1, 1)$, so that $f_X(t) = 1/2$, $-1 < t < 1$, and $E[X] = 0$. Let $Y = g(X) = \cos X$. Then

$$\operatorname{Cov}[X, Y] = E[XY] = \frac{1}{2} \int_{-1}^{1} t \cos t \, dt = 0$$
(21)

Thus $\rho = 0$. Note that $g$ could be any even function defined on $(-1, 1)$. In this case the integrand $t\,g(t)$ is odd, so that the value of the integral is zero.
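A numerical check of this example, sketched with a midpoint rule on 1000 cells (the helper `cov_with` and the grid size are assumptions for illustration). Two even functions are tried, along with one odd function for contrast.

```python
import math

n = 1000
h = 2 / n
# Midpoints of n equal cells on (-1, 1); the set is symmetric about 0.
mids = [-1 + (k + 0.5) * h for k in range(n)]


def cov_with(g):
    """Midpoint-rule approximation of Cov[X, g(X)], X uniform on (-1, 1)."""
    # The density is 1/2, so each midpoint carries weight 0.5 * h.
    ex = sum(t * 0.5 * h for t in mids)
    ey = sum(g(t) * 0.5 * h for t in mids)
    exy = sum(t * g(t) * 0.5 * h for t in mids)
    return exy - ex * ey


cov_cos = cov_with(math.cos)           # even g: covariance vanishes
cov_sq = cov_with(lambda t: t * t)     # another even g
cov_cube = cov_with(lambda t: t ** 3)  # odd g: covariance is E[X^4] = 1/5

print(cov_cos, cov_sq, cov_cube)
```

Both even functions give a covariance of zero up to rounding, even though $Y$ is a deterministic function of $X$ in each case.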

Variance and covariance for linear combinations

We generalize the property (V4) on linear combinations. Consider the linear combinations

$$X = \sum_{i=1}^{n} a_i X_i \quad \text{and} \quad Y = \sum_{j=1}^{m} b_j Y_j$$
(22)

We wish to determine $\operatorname{Cov}[X, Y]$ and $\operatorname{Var}[X]$. It is convenient to work with the centered random variables $X' = X - \mu_X$ and $Y' = Y - \mu_Y$. Since by linearity of expectation,

$$\mu_X = \sum_{i=1}^{n} a_i \mu_{X_i} \quad \text{and} \quad \mu_Y = \sum_{j=1}^{m} b_j \mu_{Y_j}$$
(23)

we have

$$X' = \sum_{i=1}^{n} a_i X_i - \sum_{i=1}^{n} a_i \mu_{X_i} = \sum_{i=1}^{n} a_i (X_i - \mu_{X_i}) = \sum_{i=1}^{n} a_i X_i'$$
(24)

and similarly for $Y'$. By definition

$$\operatorname{Cov}(X, Y) = E[X'Y'] = E\Big[\sum_{i,j} a_i b_j X_i' Y_j'\Big] = \sum_{i,j} a_i b_j E[X_i' Y_j'] = \sum_{i,j} a_i b_j \operatorname{Cov}(X_i, Y_j)$$
(25)
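Equation (25) can be checked on a toy sample space. In this sketch the rows of `outcomes` are equally likely outcomes $(X_1, X_2, Y_1, Y_2)$; the values and the coefficient lists `a` and `b` are made up for illustration.

```python
# Hypothetical sample space: each row is one equally likely outcome
# (X1, X2, Y1, Y2).
outcomes = [(1, 0, 2, 1), (2, 1, 0, 3), (0, 3, 1, 1), (4, 2, 2, 0), (1, 1, 5, 2)]
a = [2.0, -1.0]  # X = 2*X1 - X2
b = [0.5, 3.0]   # Y = 0.5*Y1 + 3*Y2


def mean(vals):
    return sum(vals) / len(vals)


def cov(us, vs):
    """Covariance of two variables listed outcome by outcome."""
    mu, mv = mean(us), mean(vs)
    return mean([(u - mu) * (v - mv) for u, v in zip(us, vs)])


# Columns of the table: xs[i] lists X_{i+1}, ys[j] lists Y_{j+1}.
xs = [[w[i] for w in outcomes] for i in range(2)]
ys = [[w[2 + j] for w in outcomes] for j in range(2)]

# The linear combinations, evaluated outcome by outcome.
X = [sum(a[i] * xs[i][k] for i in range(2)) for k in range(len(outcomes))]
Y = [sum(b[j] * ys[j][k] for j in range(2)) for k in range(len(outcomes))]

direct = cov(X, Y)
bilinear = sum(a[i] * b[j] * cov(xs[i], ys[j])
               for i in range(2) for j in range(2))

print(direct, bilinear)
```

The direct covariance of the combinations and the double sum over the pairwise covariances agree up to rounding.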

In particular

$$\operatorname{Var}(X) = \operatorname{Cov}(X, X) = \sum_{i,j} a_i a_j \operatorname{Cov}(X_i, X_j) = \sum_{i=1}^{n} a_i^2 \operatorname{Cov}(X_i, X_i) + \sum_{i \ne j} a_i a_j \operatorname{Cov}(X_i, X_j)$$
(26)

Using the fact that $a_i a_j \operatorname{Cov}(X_i, X_j) = a_j a_i \operatorname{Cov}(X_j, X_i)$, we have

$$\operatorname{Var}[X] = \sum_{i=1}^{n} a_i^2 \operatorname{Var}[X_i] + 2 \sum_{i<j} a_i a_j \operatorname{Cov}(X_i, X_j)$$
(27)

Note that $a_i^2$ does not depend upon the sign of $a_i$. If the $X_i$ form an independent class, or are otherwise uncorrelated, the expression for variance reduces to

$$\operatorname{Var}[X] = \sum_{i=1}^{n} a_i^2 \operatorname{Var}[X_i]$$
(28)
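Both variance formulas can be checked numerically. The sketch below uses a made-up table of equally likely outcomes for a correlated triple, and a product sample space (all value combinations equally likely) as an assumed way to build an independent pair; all numbers are hypothetical.

```python
from itertools import product


def mean(vals):
    return sum(vals) / len(vals)


def cov(us, vs):
    """Covariance of two variables listed outcome by outcome."""
    mu, mv = mean(us), mean(vs)
    return mean([(u - mu) * (v - mv) for u, v in zip(us, vs)])


# Correlated case: rows are equally likely outcomes (X1, X2, X3).
outcomes = [(1, 0, 2), (2, 1, 0), (0, 3, 1), (4, 2, 2), (1, 1, 5)]
a = [2.0, -1.0, 0.5]
cols = [[w[i] for w in outcomes] for i in range(3)]
X = [sum(ai * wi for ai, wi in zip(a, w)) for w in outcomes]

var_direct = cov(X, X)
# Equation (27): squared-coefficient terms plus twice the i < j cross terms.
var_formula = (
    sum(a[i] ** 2 * cov(cols[i], cols[i]) for i in range(3))
    + 2 * sum(a[i] * a[j] * cov(cols[i], cols[j])
              for i in range(3) for j in range(i + 1, 3))
)

# Independent case: every combination of values is equally likely.
pairs = list(product([0, 1, 4], [-1, 2]))
u = [p[0] for p in pairs]
v = [p[1] for p in pairs]
S = [3 * p[0] - 2 * p[1] for p in pairs]  # S = 3U - 2V

var_indep_direct = cov(S, S)
# Equation (28): the cross terms vanish, and the sign of -2 does not matter.
var_indep_formula = 9 * cov(u, u) + 4 * cov(v, v)

print(var_direct, var_formula, var_indep_direct, var_indep_formula)
```

In the independent case the covariance of the two components is zero, so only the squared-coefficient terms remain.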
