Linear Regression

Module by: Paul E Pfeiffer

Summary: Consider a pair {X, Y} with a joint distribution. A value X(ω) is observed. It is desired to estimate the corresponding value Y(ω). The best that can be hoped for is some estimate based on an average of the errors, or on the average of some function of the errors. The most common measure of error is the mean (expectation) of the square of the error. This has two important properties: it treats positive and negative errors alike, and it weights large errors more heavily than smaller ones. In general, we seek a rule (function) r such that the estimate is r(X(ω)). That is, we seek a function r such that the expectation of the square of Y − r(X) is a minimum. The problem of determining such a function is known as the regression problem. LINEAR REGRESSION: we seek the best straight line function (the regression line of Y on X) of the form u = r(t) = at + b, such that the mean square of Y − r(X) is a minimum. MATLAB approximation procedures are compared with analytic results. More general linear regression is also considered.

Linear Regression

Suppose that a pair {X, Y} of random variables has a joint distribution. A value X(ω) is observed. It is desired to estimate the corresponding value Y(ω). Obviously there is no rule for determining Y(ω) unless Y is a function of X. The best that can be hoped for is some estimate based on an average of the errors, or on the average of some function of the errors.

Suppose X(ω) is observed, and by some rule an estimate Ŷ(ω) is returned. The error of the estimate is Y(ω) − Ŷ(ω). The most common measure of error is the mean of the square of the error

$$E\bigl[(Y - \hat{Y})^2\bigr] \tag{1}$$

The choice of the mean square has two important properties: it treats positive and negative errors alike, and it weights large errors more heavily than smaller ones. In general, we seek a rule (function) r such that the estimate Ŷ(ω) is r(X(ω)). That is, we seek a function r such that

$$E\bigl[(Y - r(X))^2\bigr] \text{ is a minimum.} \tag{2}$$

The problem of determining such a function is known as the regression problem. In the unit on Regression, we show that this problem is solved by the conditional expectation of Y, given X. At this point, we seek an important partial solution.

The regression line of Y on X

We seek the best straight line function for minimizing the mean squared error. That is, we seek a function r of the form u = r(t) = at + b. The problem is to determine the coefficients a, b such that

$$E\bigl[(Y - aX - b)^2\bigr] \text{ is a minimum} \tag{3}$$

We write the error in a special form, then square and take the expectation.

$$\text{Error} = Y - aX - b = (Y - \mu_Y) - a(X - \mu_X) + \mu_Y - a\mu_X - b = (Y - \mu_Y) - a(X - \mu_X) - \beta, \quad \text{where } \beta = b + a\mu_X - \mu_Y \tag{4}$$

$$\text{Error squared} = (Y - \mu_Y)^2 + a^2(X - \mu_X)^2 + \beta^2 - 2\beta(Y - \mu_Y) + 2a\beta(X - \mu_X) - 2a(Y - \mu_Y)(X - \mu_X) \tag{5}$$

$$E\bigl[(Y - aX - b)^2\bigr] = \sigma_Y^2 + a^2\sigma_X^2 + \beta^2 - 2a\,\mathrm{Cov}[X, Y] \tag{6}$$

Standard procedures for determining a minimum (with respect to a) show that this occurs for

$$a = \frac{\mathrm{Cov}[X, Y]}{\mathrm{Var}[X]} \qquad b = \mu_Y - a\mu_X \tag{7}$$
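For completeness (this calculus step is not spelled out above), note that β enters (6) only through β², so the error is minimized in b by taking β = 0, that is, b = μ_Y − aμ_X; differentiating the remaining expression with respect to a and setting the derivative to zero then gives

$$\frac{d}{da}\bigl(\sigma_Y^2 + a^2\sigma_X^2 - 2a\,\mathrm{Cov}[X, Y]\bigr) = 2a\sigma_X^2 - 2\,\mathrm{Cov}[X, Y] = 0 \quad\Longrightarrow\quad a = \frac{\mathrm{Cov}[X, Y]}{\mathrm{Var}[X]}$$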

Thus the optimum line, called the regression line of Y on X, is

$$u = \frac{\mathrm{Cov}[X, Y]}{\mathrm{Var}[X]}(t - \mu_X) + \mu_Y = \rho\,\frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y = \alpha(t) \tag{8}$$

The second form is commonly used to define the regression line. For certain theoretical purposes, this is the preferred form. But for calculation, the first form is usually the more convenient. Only the covariance (which requires both means) and the variance of X are needed. There is no need to determine Var[Y] or ρ.

Example 1: The simple pair of Example 3 from "Variance"

jdemo1
jcalc
Enter JOINT PROBABILITIES (as on the plane)  P
Enter row matrix of VALUES of X  X
Enter row matrix of VALUES of Y  Y
 Use array operations on matrices X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX =   0.6420
EY = total(u.*P)
EY =   0.0783
VX = total(t.^2.*P) - EX^2
VX =   3.3016
CV = total(t.*u.*P) - EX*EY
CV =  -0.1633
a = CV/VX
a  =  -0.0495
b = EY - a*EX
b  =   0.1100           % The regression line is u = -0.0495t + 0.11

Example 2: The pair in Example 6 from "Variance"

Suppose the pair {X, Y} has joint density f_XY(t, u) = 3u on the triangular region bounded by u = 0, u = 1 + t, and u = 1 − t. Determine the regression line of Y on X.

ANALYTIC SOLUTION

By symmetry, E[X] = E[XY] = 0, so Cov[X, Y] = 0 and the slope is a = 0. The regression line is the horizontal line

$$u = E[Y] = 3\int_0^1 u^2 \int_{u-1}^{1-u} dt\, du = 6\int_0^1 u^2(1 - u)\, du = 1/2 \tag{9}$$

Note that the pair is uncorrelated, but by the rectangle test it is not independent. With zero values of E[X] and E[XY], the approximation procedure is not very satisfactory unless a very large number of approximation points is employed.
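For illustration, a minimal sketch of such an approximation, in the style of the tuappr sessions used elsewhere in this module (the number of approximation points is an arbitrary choice and the outputs are not reproduced here; the computed values should merely come out near the theoretical E[X] = 0, E[Y] = 1/2, and Cov[X, Y] = 0):

tuappr
Enter matrix [a b] of X-range endpoints  [-1 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  400
Enter number of Y approximation points  200
Enter expression for joint density  3*u.*(u<=min(1+t,1-t))
Use array operations on X, Y, PX, PY, t, u, and P
EX = total(t.*P)               % should be near 0
EY = total(u.*P)               % should be near 0.5
VX = total(t.^2.*P) - EX^2
CV = total(t.*u.*P) - EX*EY    % should be near 0
a = CV/VX                      % slope near 0
b = EY - a*EX                  % intercept near 0.5; regression line u = a*t + b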

Example 3: Distribution of Example 5 from "Random Vectors and MATLAB" and Example 12 from "Function of Random Vectors"

The pair {X, Y} has joint density f_XY(t, u) = (6/37)(t + 2u) on the region 0 ≤ t ≤ 2, 0 ≤ u ≤ max{1, t} (see Figure 1). Determine the regression line of Y on X. If the value X(ω) = 1.7 is observed, what is the best mean-square linear estimate of Y(ω)?

Figure 1: Regression line for Example 3. The figure shows, in the (t, u) plane, the boundary of the region for f_XY(t, u) = (6/37)(t + 2u) (u = 1 for 0 ≤ t ≤ 1, u = t for 1 ≤ t ≤ 2, and t = 2) together with the regression line u = 0.3382t + 0.4011.

ANALYTIC SOLUTION

$$E[X] = \frac{6}{37}\int_0^1\!\int_0^1 (t^2 + 2tu)\, du\, dt + \frac{6}{37}\int_1^2\!\int_0^t (t^2 + 2tu)\, du\, dt = 50/37 \tag{10}$$

The other quantities involve integrals over the same regions with appropriate integrands, as follows:

Table 1
Quantity    Integrand        Value
E[X²]       t³ + 2t²u        779/370
E[Y]        tu + 2u²         127/148
E[XY]       t²u + 2tu²       232/185

Then

$$\mathrm{Var}[X] = \frac{779}{370} - \left(\frac{50}{37}\right)^2 = \frac{3823}{13690} \qquad \mathrm{Cov}[X, Y] = \frac{232}{185} - \frac{50}{37}\cdot\frac{127}{148} = \frac{1293}{13690} \tag{11}$$

and

$$a = \mathrm{Cov}[X, Y]/\mathrm{Var}[X] = \frac{1293}{3823} \approx 0.3382, \qquad b = E[Y] - aE[X] = \frac{6133}{15292} \approx 0.4011 \tag{12}$$

The regression line is u = at + b. If X(ω) = 1.7, the best linear estimate (in the mean square sense) is Ŷ(ω) = 1.7a + b = 0.9760 (see Figure 1 for an approximate plot).
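Numerically, with the rounded values from (12),

$$\hat{Y}(\omega) = 1.7(0.3382) + 0.4011 \approx 0.5749 + 0.4011 = 0.9760$$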

APPROXIMATION

tuappr
Enter matrix [a b] of X-range endpoints  [0 2]
Enter matrix [c d] of Y-range endpoints  [0 2]
Enter number of X approximation points  400
Enter number of Y approximation points  400
Enter expression for joint density  (6/37)*(t+2*u).*(u<=max(t,1))
Use array operations on X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX =  1.3517                   % Theoretical = 1.3514
EY = total(u.*P)
EY =  0.8594                   % Theoretical = 0.8581
VX = total(t.^2.*P) - EX^2
VX =  0.2790                   % Theoretical = 0.2793
CV = total(t.*u.*P) - EX*EY
CV =  0.0947                   % Theoretical = 0.0944
a = CV/VX
a  =  0.3394                   % Theoretical = 0.3382
b = EY - a*EX
b  =  0.4006                   % Theoretical = 0.4011
y = 1.7*a + b
y  =  0.9776                   % Theoretical = 0.9760

An interpretation of ρ²

The analysis above shows the minimum mean squared error is given by

$$E\bigl[(Y - \hat{Y})^2\bigr] = E\left[\Bigl(Y - \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X) - \mu_Y\Bigr)^2\right] = \sigma_Y^2\, E\bigl[(Y^* - \rho X^*)^2\bigr] \tag{13}$$

$$= \sigma_Y^2\, E\bigl[(Y^*)^2 - 2\rho X^* Y^* + \rho^2 (X^*)^2\bigr] = \sigma_Y^2(1 - 2\rho^2 + \rho^2) = \sigma_Y^2(1 - \rho^2) \tag{14}$$

where X* = (X − μ_X)/σ_X and Y* = (Y − μ_Y)/σ_Y are the standardized variables.

If ρ = 0, then E[(Y − Ŷ)²] = σ_Y², the mean squared error in the case of zero linear correlation. Then ρ² is interpreted as the fraction of uncertainty removed by the linear rule and X. This interpretation should not be pushed too far, but it is a common interpretation, often found in the discussion of observations or experimental results.
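As an illustration (a sketch, not part of the original session), the tuappr approximation for Example 3 can be continued to estimate ρ and the minimum mean squared error, using the quantities already in the workspace:

VY = total(u.^2.*P) - EY^2     % Var[Y] from the approximation
rho = CV/sqrt(VX*VY)           % estimate of the correlation coefficient
mse = VY*(1 - rho^2)           % estimated minimum mean squared error, sigma_Y^2 (1 - rho^2)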

More general linear regression

Consider a jointly distributed class {Y, X1, X2, ⋯, Xn}. We wish to determine a function U of the form

$$U = \sum_{i=0}^{n} a_i X_i, \text{ with } X_0 = 1, \text{ such that } E\bigl[(Y - U)^2\bigr] \text{ is a minimum} \tag{15}$$

If U satisfies this minimum condition, then E[(Y − U)V] = 0, or, equivalently,

$$E[YV] = E[UV] \text{ for all } V \text{ of the form } V = \sum_{i=0}^{n} c_i X_i \tag{16}$$

To see this, set W = Y − U and let d² = E[W²]. Now, for any α,

$$d^2 \le E\bigl[(W + \alpha V)^2\bigr] = d^2 + 2\alpha E[WV] + \alpha^2 E[V^2] \tag{17}$$

If we select the special

$$\alpha = -\frac{E[WV]}{E[V^2]} \quad \text{then} \quad 0 \le -\frac{2E[WV]^2}{E[V^2]} + \frac{E[WV]^2}{E[V^2]^2}\, E[V^2] \tag{18}$$

This implies E[WV]² ≤ 0, which can only be satisfied by E[WV] = 0, so that

$$E[YV] = E[UV] \tag{19}$$

On the other hand, if E[(Y − U)V] = 0 for all V of the form above, then E[(Y − U)²] is a minimum. Consider

$$E\bigl[(Y - V)^2\bigr] = E\bigl[(Y - U + U - V)^2\bigr] = E\bigl[(Y - U)^2\bigr] + E\bigl[(U - V)^2\bigr] + 2E\bigl[(Y - U)(U - V)\bigr] \tag{20}$$

Since U − V is of the same form as V, the last term is zero. The first term is fixed. The second term is nonnegative, with zero value iff U − V = 0 a.s. Hence, E[(Y − V)²] is a minimum when V = U.

If we take V to be 1, X1, X2, ⋯, Xn, successively, we obtain n + 1 linear equations in the n + 1 unknowns a0, a1, ⋯, an, as follows.

  1. E[Y] = a0 + a1E[X1] + ⋯ + anE[Xn]
  2. E[YXi] = a0E[Xi] + a1E[X1Xi] + ⋯ + anE[XnXi], for 1 ≤ i ≤ n

For each i = 1, 2, ⋯, n, we take (2) − E[Xi]·(1) and use the calculating expressions for variance and covariance to get

$$\mathrm{Cov}[Y, X_i] = a_1\,\mathrm{Cov}[X_1, X_i] + a_2\,\mathrm{Cov}[X_2, X_i] + \cdots + a_n\,\mathrm{Cov}[X_n, X_i] \tag{21}$$

These n equations plus equation (1) may be solved algebraically for the ai.
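As a computational note, here is a minimal MATLAB sketch of that solution (not part of the original module), assuming the needed moments are already stored in MATLAB variables as described in the comments:

% Assumed (hypothetical) inputs:
%   CX  : n-by-n matrix with CX(i,j) = Cov[Xi,Xj]
%   cYX : n-by-1 column with cYX(i) = Cov[Y,Xi]
%   EX  : n-by-1 column of means E[Xi];  EY : the scalar E[Y]
a  = CX\cYX;       % a1,...,an from the n covariance equations (21)
a0 = EY - EX'*a    % intercept a0 from equation (1)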

In the important special case that the Xi are uncorrelated (i.e., Cov[Xi, Xj] = 0 for i ≠ j), we have

$$a_i = \frac{\mathrm{Cov}[Y, X_i]}{\mathrm{Var}[X_i]} \qquad 1 \le i \le n \tag{22}$$

and

$$a_0 = E[Y] - a_1 E[X_1] - a_2 E[X_2] - \cdots - a_n E[X_n] \tag{23}$$

In particular, this condition holds if the class {Xi : 1 ≤ i ≤ n} is iid, as in the case of a simple random sample (see the section on "Simple Random Samples and Statistics").

Examination shows that for n = 1, with X1 = X, a0 = b, and a1 = a, the result agrees with that obtained in the treatment of the regression line, above.

Example 4: Linear regression with two variables.

Suppose E[Y] = 3, E[X1] = 2, E[X2] = 3, Var[X1] = 3, Var[X2] = 8, Cov[Y, X1] = 5, Cov[Y, X2] = 7, and Cov[X1, X2] = 1. Then the three equations are

$$a_0 + 2a_1 + 3a_2 = 3 \qquad 0 + 3a_1 + 1a_2 = 5 \qquad 0 + 1a_1 + 8a_2 = 7 \tag{24}$$

Solution of these simultaneous linear equations with MATLAB gives the results

a0 = −1.9565, a1 = 1.4348, and a2 = 0.6957.
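A minimal sketch of that MATLAB computation (the commands themselves are not shown in the original module; the coefficient matrix and right-hand side are read directly from (24)):

M = [1 2 3; 0 3 1; 0 1 8];   % coefficients of a0, a1, a2 in the three equations
c = [3; 5; 7];               % right-hand sides
a = M\c                      % yields a0 = -1.9565, a1 = 1.4348, a2 = 0.6957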
