Suppose that a pair $\{X, Y\}$ of random variables has a joint distribution.
A value $X(\omega)$ is observed. It is desired to estimate the corresponding value
$Y(\omega)$. Obviously there is no rule for determining $Y(\omega)$ unless $Y$ is
a function of $X$. The best that can be hoped for is some estimate based on an average
of the errors, or on the average of some function of the errors.
Suppose $X(\omega)$ is observed, and by some rule an estimate $\hat{Y}(\omega)$
is returned. The error of the estimate is $Y(\omega) - \hat{Y}(\omega)$. The
most common measure of error is the mean of the square of the error
$$E[(Y - \hat{Y})^2] \tag{1}$$
The choice of the mean square has two important properties: it treats positive and
negative errors alike, and it weights large errors more heavily than smaller ones.
In general, we seek a rule (function) $r$ such that the estimate $\hat{Y}(\omega)$
is $r(X(\omega))$. That is, we seek a function $r$ such that
$$E[(Y - r(X))^2] \text{ is a minimum.} \tag{2}$$
The problem of determining such a function is known as the regression problem. In the unit on Regression, we show that this problem is solved by the conditional expectation of Y,
given X. At this point, we seek an important partial solution.
The regression line of Y on X
We seek the best straight line function for minimizing the mean squared error. That is, we seek a
function $r$ of the form $u = r(t) = at + b$. The problem is to determine the
coefficients $a$, $b$ such that
$$E[(Y - aX - b)^2] \text{ is a minimum} \tag{3}$$
We write the error in a special form, then square and take the expectation.
$$\text{Error} = Y - aX - b = (Y - \mu_Y) - a(X - \mu_X) + \mu_Y - a\mu_X - b = (Y - \mu_Y) - a(X - \mu_X) - \beta \tag{4}$$
$$\text{Error squared} = (Y - \mu_Y)^2 + a^2 (X - \mu_X)^2 + \beta^2 - 2\beta(Y - \mu_Y) + 2a\beta(X - \mu_X) - 2a(Y - \mu_Y)(X - \mu_X) \tag{5}$$
$$E[(Y - aX - b)^2] = \sigma_Y^2 + a^2 \sigma_X^2 + \beta^2 - 2a \operatorname{Cov}[X, Y] \tag{6}$$
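Since $\beta$ appears only in the nonnegative term $\beta^2$, the expression is smallest when $\beta = 0$, that is, when $b = \mu_Y - a\mu_X$. Standard procedures for determining a minimum (with respect to $a$) then show that this occurs for
$$a = \frac{\operatorname{Cov}[X, Y]}{\operatorname{Var}[X]} \qquad b = \mu_Y - a\mu_X \tag{7}$$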
Thus the optimum line, called the regression line of Y on X, is
$$u = \frac{\operatorname{Cov}[X, Y]}{\operatorname{Var}[X]}(t - \mu_X) + \mu_Y = \rho\frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y = \alpha(t) \tag{8}$$
The second form is commonly used to define the regression line. For certain
theoretical purposes, this is the preferred form. But for calculation, the first
form is usually the more convenient. Only the covariance (which requires both means) and
the variance of $X$ are needed. There is no need to determine $\operatorname{Var}[Y]$ or $\rho$.
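The first form of the regression line can be computed directly from a discrete joint distribution. The following is a minimal Python sketch, not the MATLAB procedure used in the text; the function name `regression_line` and the small joint distribution are hypothetical and chosen only for illustration. Exact fractions are used so the result can be checked by hand.

```python
from fractions import Fraction

def regression_line(values_x, values_y, joint):
    """Return (a, b) for the regression line u = a*t + b of Y on X.

    joint[i][j] is P(X = values_x[i], Y = values_y[j]).
    """
    ex = ey = ex2 = exy = 0
    for i, t in enumerate(values_x):
        for j, u in enumerate(values_y):
            p = joint[i][j]
            ex += t * p
            ey += u * p
            ex2 += t * t * p
            exy += t * u * p
    var_x = ex2 - ex ** 2      # Var[X] = E[X^2] - E[X]^2
    cov_xy = exy - ex * ey     # Cov[X,Y] = E[XY] - E[X]E[Y]
    a = cov_xy / var_x
    b = ey - a * ex
    return a, b

# Hypothetical joint distribution (not from the text), for illustration only.
X = [0, 1, 2]
Y = [0, 1]
P = [[Fraction(1, 4), Fraction(1, 12)],
     [Fraction(1, 6), Fraction(1, 6)],
     [Fraction(1, 12), Fraction(1, 4)]]

a, b = regression_line(X, Y, P)
print(a, b)  # prints: 1/4 1/4
```

As the text notes, neither $\operatorname{Var}[Y]$ nor $\rho$ is needed for this calculation.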
jdemo1
jcalc
Enter JOINT PROBABILITIES (as on the plane) P
Enter row matrix of VALUES of X X
Enter row matrix of VALUES of Y Y
Use array operations on matrices X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX = 0.6420
EY = total(u.*P)
EY = 0.0783
VX = total(t.^2.*P) - EX^2
VX = 3.3016
CV = total(t.*u.*P) - EX*EY
CV = -0.1633
a = CV/VX
a = -0.0495
b = EY - a*EX
b = 0.1100 % The regression line is u = -0.0495t + 0.11
Suppose the pair $\{X, Y\}$ has joint density $f_{XY}(t, u) = 3u$ on the
triangular region bounded by $u = 0$, $u = 1 + t$, $u = 1 - t$. Determine the regression line of $Y$ on $X$.
ANALYTIC SOLUTION
By symmetry, $E[X] = E[XY] = 0$, so $\operatorname{Cov}[X, Y] = 0$ and hence $a = 0$. The regression line is
$$u = E[Y] = 3\int_0^1 u^2 \int_{u-1}^{1-u} dt\, du = 6\int_0^1 u^2 (1 - u)\, du = 1/2 \tag{9}$$
Note that the pair is uncorrelated, but by the rectangle test is not independent.
With zero values of $E[X]$ and $E[XY]$, the approximation procedure is not very
satisfactory unless a very large number of approximation points are employed.
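The analytic values can be checked numerically. The sketch below is a plain midpoint-rule approximation in Python, written for this illustration; it is not the text's approximation procedure, and the function name `approx_moments` and the grid size are assumptions. The grid is deliberately symmetric about $t = 0$, so the odd moments cancel almost exactly; that is a property of this particular grid rather than of grid approximations in general.

```python
def approx_moments(n):
    """Midpoint-rule approximation on an n-by-n grid over [-1, 1] x [0, 1]."""
    dt, du = 2.0 / n, 1.0 / n
    mass = ex = ey = exy = 0.0
    for i in range(n):
        t = -1.0 + (i + 0.5) * dt
        for j in range(n):
            u = (j + 0.5) * du
            if u <= 1.0 - abs(t):        # cell midpoint lies inside the triangle
                p = 3.0 * u * dt * du    # density 3u times cell area
                mass += p
                ex += t * p
                ey += u * p
                exy += t * u * p
    # Normalize by the total mass so the grid weights form a distribution.
    return ex / mass, ey / mass, exy / mass

ex, ey, exy = approx_moments(200)
cov = exy - ex * ey
print(ex, ey, cov)
```

With this grid, `ey` should be close to the analytic value $1/2$, and `ex` and `cov` should be zero up to rounding error.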
The pair $\{X, Y\}$ has joint density $f_{XY}(t, u) = \frac{6}{37}(t + 2u)$ on the region $0 \le t \le 2$, $0 \le u \le \max\{1, t\}$
(see Figure 1). Determine the regression line of $Y$ on $X$. If the value
$X(\omega) = 1.7$ is observed, what is the best mean-square linear estimate
of $Y(\omega)$?
ANALYTIC SOLUTION
$$E[X] = \frac{6}{37}\int_0^1 \int_0^1 (t^2 + 2tu)\, du\, dt + \frac{6}{37}\int_1^2 \int_0^t (t^2 + 2tu)\, du\, dt = 50/37 \tag{10}$$
The other quantities involve integrals over the same regions with appropriate integrands, as
follows:
Table 1

| Quantity | Integrand | Value |
|---|---|---|
| $E[X^2]$ | $t^3 + 2t^2 u$ | 779/370 |
| $E[Y]$ | $tu + 2u^2$ | 127/148 |
| $E[XY]$ | $t^2 u + 2tu^2$ | 232/185 |

Then
$$\operatorname{Var}[X] = \frac{779}{370} - \left(\frac{50}{37}\right)^2 = \frac{3823}{13690} \qquad \operatorname{Cov}[X, Y] = \frac{232}{185} - \frac{50}{37} \cdot \frac{127}{148} = \frac{1293}{13690} \tag{11}$$
and
$$a = \operatorname{Cov}[X, Y] / \operatorname{Var}[X] = \frac{1293}{3823} \approx 0.3382, \qquad b = E[Y] - aE[X] = \frac{6133}{15292} \approx 0.4011 \tag{12}$$
The regression line is $u = at + b$. If $X(\omega) = 1.7$, the best linear estimate (in
the mean square sense) is $\hat{Y}(\omega) = 1.7a + b = 0.9760$ (see Figure 1 for
an approximate plot).
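The fractions in this analytic solution can be reproduced exactly. The sketch below is a Python check written for this illustration (the helper names `monomial_integral` and `expect` are hypothetical). It relies on the fact that each integrand is a sum of monomials $t^i u^j$, whose integrals over the two parts of the region have closed forms: $1/((i+1)(j+1))$ over the unit square, and $(2^{i+j+2} - 1)/((j+1)(i+j+2))$ over $1 \le t \le 2$, $0 \le u \le t$.

```python
from fractions import Fraction

def monomial_integral(i, j):
    """Integral of t**i * u**j over the region 0 <= t <= 2, 0 <= u <= max(1, t)."""
    # Part 1: unit square 0 <= t <= 1, 0 <= u <= 1.
    unit_square = Fraction(1, (i + 1) * (j + 1))
    # Part 2: 1 <= t <= 2, 0 <= u <= t.
    upper_part = Fraction(2 ** (i + j + 2) - 1, (j + 1) * (i + j + 2))
    return unit_square + upper_part

def expect(p, q):
    """E[X**p * Y**q] for the joint density (6/37)(t + 2u)."""
    return Fraction(6, 37) * (monomial_integral(p + 1, q)
                              + 2 * monomial_integral(p, q + 1))

ex, ey = expect(1, 0), expect(0, 1)
var_x = expect(2, 0) - ex ** 2
cov_xy = expect(1, 1) - ex * ey
a = cov_xy / var_x
b = ey - a * ex
estimate = Fraction(17, 10) * a + b   # linear estimate at X = 1.7
print(a, b, float(estimate))
```

The computed `a` and `b` agree with the fractions 1293/3823 and 6133/15292 given above.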
APPROXIMATION
tuappr
Enter matrix [a b] of X-range endpoints [0 2]
Enter matrix [c d] of Y-range endpoints [0 2]
Enter number of X approximation points 400
Enter number of Y approximation points 400
Enter expression for joint density (6/37)*(t+2*u).*(u<=max(t,1))
Use array operations on X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX = 1.3517 % Theoretical = 1.3514
EY = total(u.*P)
EY = 0.8594 % Theoretical = 0.8581
VX = total(t.^2.*P) - EX^2
VX = 0.2790 % Theoretical = 0.2793
CV = total(t.*u.*P) - EX*EY
CV = 0.0947 % Theoretical = 0.0944
a = CV/VX
a = 0.3394 % Theoretical = 0.3382
b = EY - a*EX
b = 0.4006 % Theoretical = 0.4011
y = 1.7*a + b
y = 0.9776 % Theoretical = 0.9760
An interpretation of $\rho^2$
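The analysis above shows that, in terms of the standardized variables $X^* = (X - \mu_X)/\sigma_X$ and $Y^* = (Y - \mu_Y)/\sigma_Y$, the minimum mean squared error is given by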
$$E[(Y - \hat{Y})^2] = E\left[\left(Y - \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X) - \mu_Y\right)^2\right] = \sigma_Y^2 E[(Y^* - \rho X^*)^2] \tag{13}$$
$$= \sigma_Y^2 E[(Y^*)^2 - 2\rho X^* Y^* + \rho^2 (X^*)^2] = \sigma_Y^2 (1 - 2\rho^2 + \rho^2) = \sigma_Y^2 (1 - \rho^2) \tag{14}$$
If $\rho = 0$, then $E[(Y - \hat{Y})^2] = \sigma_Y^2$, the mean squared error in the case
of zero linear correlation. Then, $\rho^2$ is interpreted as the fraction of
uncertainty removed by the linear rule and $X$. This interpretation should not be pushed
too far, but is a common interpretation, often found in the discussion of observations or
experimental results.
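The identity $E[(Y - \hat{Y})^2] = \sigma_Y^2 (1 - \rho^2)$ can be checked on a small example. The following Python sketch uses a hypothetical joint distribution (not from the text) and exact fractions; it computes the mean squared error of the regression line directly and compares it with $\sigma_Y^2 (1 - \rho^2)$.

```python
from fractions import Fraction

# Hypothetical joint distribution (not from the text): P[(t, u)] = P(X = t, Y = u).
P = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 12),
     (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 6),
     (2, 0): Fraction(1, 12), (2, 1): Fraction(1, 4)}

def expect(g):
    """E[g(X, Y)] for the discrete joint distribution P."""
    return sum(g(t, u) * p for (t, u), p in P.items())

ex, ey = expect(lambda t, u: t), expect(lambda t, u: u)
var_x = expect(lambda t, u: t * t) - ex ** 2
var_y = expect(lambda t, u: u * u) - ey ** 2
cov = expect(lambda t, u: t * u) - ex * ey

a = cov / var_x                       # slope of the regression line
b = ey - a * ex                       # intercept
mse = expect(lambda t, u: (u - (a * t + b)) ** 2)
rho_sq = cov ** 2 / (var_x * var_y)   # squared correlation coefficient
print(mse, var_y * (1 - rho_sq), rho_sq)  # prints: 5/24 5/24 1/6
```

In this example $\rho^2 = 1/6$, so the linear rule removes one sixth of the variance of $Y$.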
More general linear regression
Consider a jointly distributed class $\{Y, X_1, X_2, \cdots, X_n\}$. We wish to determine
a function $U$ of the form
$$U = \sum_{i=0}^{n} a_i X_i, \text{ with } X_0 = 1, \text{ such that } E[(Y - U)^2] \text{ is a minimum} \tag{15}$$
If $U$ satisfies this minimum condition, then
$E[(Y - U)V] = 0$, or, equivalently
$$E[YV] = E[UV] \text{ for all } V \text{ of the form } V = \sum_{i=0}^{n} c_i X_i \tag{16}$$
To see this, set $W = Y - U$ and let $d^2 = E[W^2]$. Now, for any $\alpha$
$$d^2 \le E[(W + \alpha V)^2] = d^2 + 2\alpha E[WV] + \alpha^2 E[V^2] \tag{17}$$
If we select the special
$$\alpha = -\frac{E[WV]}{E[V^2]} \quad \text{then} \quad 0 \le -2\frac{E[WV]^2}{E[V^2]} + \frac{E[WV]^2}{E[V^2]^2} E[V^2] \tag{18}$$
This implies $E[WV]^2 \le 0$, which can only be satisfied by $E[WV] = 0$, so that
$$E[YV] = E[UV] \tag{19}$$
On the other hand, if $E[(Y - U)V] = 0$ for all $V$ of the form above, then
$E[(Y - U)^2]$ is a minimum. Consider
$$E[(Y - V)^2] = E[(Y - U + U - V)^2] = E[(Y - U)^2] + E[(U - V)^2] + 2E[(Y - U)(U - V)] \tag{20}$$
Since $U - V$ is of the same form as $V$, the last term is zero. The first term is fixed.
The second term is nonnegative, with zero value iff $U - V = 0$ a.s. Hence, $E[(Y - V)^2]$
is a minimum when $V = U$.
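For the case $n = 1$, the orthogonality condition $E[(Y - U)V] = 0$ can be checked numerically with $V = 1$ and $V = X$. The Python sketch below uses a hypothetical joint distribution (not from the text) and exact fractions; it also confirms that small changes to the coefficients do not reduce the mean squared error. The step size of $1/10$ is an arbitrary choice.

```python
from fractions import Fraction

# Hypothetical joint distribution (not from the text): P[(t, u)] = P(X = t, Y = u).
P = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 12),
     (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 6),
     (2, 0): Fraction(1, 12), (2, 1): Fraction(1, 4)}

def expect(g):
    """E[g(X, Y)] for the discrete joint distribution P."""
    return sum(g(t, u) * p for (t, u), p in P.items())

ex, ey = expect(lambda t, u: t), expect(lambda t, u: u)
a = (expect(lambda t, u: t * u) - ex * ey) / (expect(lambda t, u: t * t) - ex ** 2)
b = ey - a * ex

# Orthogonality of the error W = Y - U to V = 1 and V = X, where U = aX + b.
resid_times_one = expect(lambda t, u: u - a * t - b)
resid_times_x = expect(lambda t, u: (u - a * t - b) * t)

def mse(c1, c0):
    """Mean squared error of the linear rule c1*X + c0."""
    return expect(lambda t, u: (u - c1 * t - c0) ** 2)

step = Fraction(1, 10)
nearby = [mse(a + da, b + db) for da in (-step, 0, step) for db in (-step, 0, step)]
print(resid_times_one, resid_times_x, min(nearby) == mse(a, b))  # prints: 0 0 True
```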
If we take $V$ to be $1, X_1, X_2, \cdots, X_n$, successively, we obtain $n + 1$ linear
equations in the $n + 1$ unknowns $a_0, a_1, \cdots, a_n$, as follows.
1. $E[Y] = a_0 + a_1 E[X_1] + \cdots + a_n E[X_n]$
2. $E[YX_i] = a_0 E[X_i] + a_1 E[X_1 X_i] + \cdots + a_n E[X_n X_i]$ for $1 \le i \le n$
For each $i = 1, 2, \cdots, n$, we take $(2) - E[X_i] \cdot (1)$, where (1) and (2) denote the two equations just listed, and use the calculating
expressions for variance and covariance to get
$$\operatorname{Cov}[Y, X_i] = a_1 \operatorname{Cov}[X_1, X_i] + a_2 \operatorname{Cov}[X_2, X_i] + \cdots + a_n \operatorname{Cov}[X_n, X_i] \tag{21}$$
These $n$ equations plus the first equation of the list above (the one for $E[Y]$) may be solved algebraically for the $a_i$.
In the important special case that the $X_i$ are uncorrelated (i.e., $\operatorname{Cov}[X_i, X_j] = 0$ for
$i \ne j$), we have
$$a_i = \frac{\operatorname{Cov}[Y, X_i]}{\operatorname{Var}[X_i]} \qquad 1 \le i \le n \tag{22}$$
and
$$a_0 = E[Y] - a_1 E[X_1] - a_2 E[X_2] - \cdots - a_n E[X_n] \tag{23}$$
In particular, this condition holds if the class $\{X_i : 1 \le i \le n\}$ is iid as in the
case of a simple random sample (see the section on "Simple Random Samples and Statistics").
Examination shows that for $n = 1$, with $X_1 = X$,
$a_0 = b$, and $a_1 = a$, the result agrees with that obtained in the treatment
of the regression line, above.
Suppose $E[Y] = 3$, $E[X_1] = 2$, $E[X_2] = 3$, $\operatorname{Var}[X_1] = 3$, $\operatorname{Var}[X_2] = 8$,
$\operatorname{Cov}[Y, X_1] = 5$, $\operatorname{Cov}[Y, X_2] = 7$, and $\operatorname{Cov}[X_1, X_2] = 1$. Then the three equations
are
$$\begin{aligned} a_0 + 2a_1 + 3a_2 &= 3 \\ 0 + 3a_1 + 1a_2 &= 5 \\ 0 + 1a_1 + 8a_2 &= 7 \end{aligned} \tag{24}$$
Solution of these simultaneous linear equations with MATLAB gives the results
$a_0 = -1.9565$, $a_1 = 1.4348$, and $a_2 = 0.6957$.
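These values can be confirmed with exact arithmetic. The sketch below is a Python check, not the MATLAB computation referred to in the text; it solves the two covariance equations by Cramer's rule and then recovers $a_0$ from the equation for $E[Y]$. The variable names are chosen for this illustration.

```python
from fractions import Fraction

# Given quantities from the example.
ey, ex1, ex2 = 3, 2, 3
var1, var2 = 3, 8
cov_y1, cov_y2, cov_12 = 5, 7, 1

# Covariance equations:
#   var1   * a1 + cov_12 * a2 = cov_y1
#   cov_12 * a1 + var2   * a2 = cov_y2
# solved by Cramer's rule.
det = Fraction(var1 * var2 - cov_12 ** 2)
a1 = (cov_y1 * var2 - cov_12 * cov_y2) / det
a2 = (var1 * cov_y2 - cov_12 * cov_y1) / det

# The equation for E[Y] then gives the constant term.
a0 = ey - a1 * ex1 - a2 * ex2
print(a0, a1, a2)  # prints: -45/23 33/23 16/23
```

Rounded to four decimal places, these fractions match the values quoted above.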