The second problem posed in the introduction is basically the solution of
simultaneous linear equations which is fundamental to linear algebra and very
important in diverse areas of applications in mathematics, numerical analysis,
physical and social sciences, engineering, and business. Since a system of
linear equations may be over or under determined in a variety of ways, or may
be consistent but ill conditioned, a comprehensive theory turns out to be more
complicated than it first appears. Indeed, there is a considerable literature
on the subject of generalized inverses or
pseudo-inverses. The careful statement and formulation
of the general problem seems to have started with Moore
[1] and Penrose
[2][3]
and has been developed by many others, especially in statistics. Because the
generalized solution of simultaneous equations is defined in terms of
minimization of an equation error, the techniques are useful in a wide variety
of approximation problems.
The ideas are presented here in terms of finite dimensions and using matrices.
Many of the ideas extend to infinite dimensions using Banach and Hilbert
spaces [4].
Given an
m
m
by
n
n
real matrix
A
A
and an
m
m
by
1
1
vector
b
b
,
find the
n
n
by
1
1
vector
x
x
when
A
x
=
b
A
x
=
b
(1)
If
b
b
does not lie in the range space of
A
A
(the space spanned by the columns of
A
A
),
there is no exact solution to
Equation 1, therefore, an
error is defined by
ɛ
=
A
x
−
b
.
ɛ
=
A
x
−
b
.
(2)
A generalized solution to
Equation 1 is considered
to be an
x
x
that minimizes some norm of
ɛ
ɛ
,
usually
ɛ
T
*
ɛ
ɛ
T
*
ɛ
.
If there is a non-zero solution of the homogeneous equation
A
x
=
0
,
A
x
=
0
,
(3)
then
Equation 1 has many
generalized solutions in the sense that any particular solution of
Equation 1 plus an
arbitrary scalar times any non-zero solution of
Equation 3 will have same
error in
Equation 2 and,
therefore, is also a generalized solution.
Examination of the basic problem shows there are ten cases
[5] to be
considered. These depend on the shape of AA , the
rank
r
r
of AA , and whether
bb is in the span of the columns of
AA .
- 1a:
m
=
n
=
r
m
=
n
=
r
:
One solution with no error.
- 1b:
m
=
n
>
r
m
=
n
>
r
,
b
∈
s
p
a
n
{
A
}
b
∈
s
p
a
n
{
A
}
:
Many solutions with no error.
- 1c:
m
=
n
>
r
m
=
n
>
r
;,
b
n
o
t
∈
s
p
a
n
{
A
}
b
n
o
t
∈
s
p
a
n
{
A
}
:
Many solutions with error.
- 2a:
m
>
n
=
r
m
>
n
=
r
,
b
∈
s
p
a
n
{
A
}
b
∈
s
p
a
n
{
A
}
:
One solution with no error.
- 2b:
m
>
n
=
r
m
>
n
=
r
,
b
n
o
t
∈
s
p
a
n
{
A
}
b
n
o
t
∈
s
p
a
n
{
A
}
:
One solution with error.
- 2c:
m
>
n
>
r
m
>
n
>
r
,
b
∈
s
p
a
n
{
A
}
b
∈
s
p
a
n
{
A
}
:
Many solutions with no error.
- 2d:
m
>
n
>
r
m
>
n
>
r
,
b
n
o
t
∈
s
p
a
n
{
A
}
b
n
o
t
∈
s
p
a
n
{
A
}
:
Many solutions with error.
- 3a:
n
>
m
=
r
n
>
m
=
r
:
Many solutions with no error.
- 3b:
n
>
m
>
r
n
>
m
>
r
,
b
∈
s
p
a
n
{
A
}
b
∈
s
p
a
n
{
A
}
:
Many solutions with no error.
- 3c:
n
>
m
>
r
n
>
m
>
r
,
b
n
o
t
∈
s
p
a
n
{
A
}
b
n
o
t
∈
s
p
a
n
{
A
}
:
Many solutions with error.
In addition to these classifications, the possible orthogonality of the
columns or rows of the matrices gives special characteristics.
There are several assumptions or side conditions that could be used in order
to define a useful unique solution of
Equation 1. The side
conditions used to define the Moore-Penrose pseudo-inverse are that the norm
of
ɛ
ɛ
be minimized and, if there is ambiguity (several solutions with the same
minimum error), the norm of
x
x
also be minimized. A useful alternative to minimizing the norm of
x
x
is to require certain entries in
x
x
to be zero or fixed to some non zero value (equality constraints).
In addition to using side conditions to achieve a unique solution, side
conditions are sometimes part of the original problem. One important case
requires that certain of the equations be satisfied with no error and the
approximation be achieved with the remaining equations.
A unique generalized solution to
Equation 1 always exists
such that the equation error
ɛ
T
*
ɛ
ɛ
T
*
ɛ
and the norm of the solution
x
x
are minimized. This solution is denoted by
x
=
A
+
b
x
=
A
+
b
(4)
where
A
+
A
+
is called the Moore-Penrose inverse of
A
A
.
Roger Penrose
[2][3]
showed that for all
A
A
,
there exists a unique
A
+
A
+
satisfying the four conditions:
-
A
A
+
A
=
A
A
A
+
A
=
A
-
A
+
A
A
+
=
A
+
A
+
A
A
+
=
A
+
-
[
A
A
+
]
*
=
A
A
+
[
A
A
+
]
*
=
A
A
+
-
[
A
+
A
]
*
=
A
+
A
[
A
+
A
]
*
=
A
+
A
There is a large literature on this problem. Five useful books are
[5][6][7][8][9].
The Moore-Penrose pseudo-inverse can be calculated in Matlab
[10] by the
pinv(A,tol) function which uses a singular value
decomposition to calculate the inverse. There are a variety of other numerical
methods given in the above references where each has some advantages and some
disadvantages.
For cases 2a and 2b, the following
n
n
by
n
n
system of equations called the normal equations
[6][5]
have a unique minimum equation error solution.
A
T
*
A
x
=
A
T
*
b
A
T
*
A
x
=
A
T
*
b
(5)
Solving these equations is often used in least squares approximation problems.
For these two cases the pseudo-inverse is simply,
A
+
=
[
A
T
*
A
]
−
1
A
T
*
.
A
+
=
[
A
T
*
A
]
−
1
A
T
*
.
(6)
An equivalent definition
[6] of the
pseudo-inverse can be given in terms of a limit by
A
+
=
lim
δ
→
0
[
A
T
*
A
+
δ
2
I
]
−
1
A
T
*
=
lim
δ
→
0
A
T
*
[
A
A
T
*
+
δ
2
I
]
−
1
.
A
+
=
lim
δ
→
0
[
A
T
*
A
+
δ
2
I
]
−
1
A
T
*
=
lim
δ
→
0
A
T
*
[
A
A
T
*
+
δ
2
I
]
−
1
.
(7)
Some properties
[6][8]
are:
-
[
A
+
]
+
=
A
[
A
+
]
+
=
A
-
[
A
+
]
*
=
[
A
*
]
+
[
A
+
]
*
=
[
A
*
]
+
-
[
A
*
A
]
+
=
A
+
A
*
+
[
A
*
A
]
+
=
A
+
A
*
+
-
λ
+
=
1
/
λ
λ
+
=
1
/
λ
for
λ
≠
0
λ
≠
0
else
λ
+
=
0
λ
+
=
0
-
A
+
=
[
A
*
A
]
+
A
*
=
A
*
[
A
A
*
]
+
A
+
=
[
A
*
A
]
+
A
*
=
A
*
[
A
A
*
]
+
-
A
*
=
A
*
A
A
+
=
A
+
A
A
*
A
*
=
A
*
A
A
+
=
A
+
A
A
*
It is informative to consider the range and null spaces
[8] of
A
A
and
A
+
A
+
-
R
(
A
)
=
R
(
A
A
+
)
=
R
(
A
A
*
)
R
(
A
)
=
R
(
A
A
+
)
=
R
(
A
A
*
)
-
R
(
A
+
)
=
R
(
A
*
)
=
R
(
A
+
A
)
=
R
(
A
*
A
)
R
(
A
+
)
=
R
(
A
*
)
=
R
(
A
+
A
)
=
R
(
A
*
A
)
-
R
(
I
−
A
A
+
)
=
N
(
A
A
+
)
=
N
(
A
*
)
=
N
(
A
+
)
=
R
(
A
)
⊥
R
(
I
−
A
A
+
)
=
N
(
A
A
+
)
=
N
(
A
*
)
=
N
(
A
+
)
=
R
(
A
)
⊥
-
R
(
I
−
A
+
A
)
=
N
(
A
+
A
)
=
N
(
A
)
=
R
(
A
*
)
⊥
R
(
I
−
A
+
A
)
=
N
(
A
+
A
)
=
N
(
A
)
=
R
(
A
*
)
⊥
A particularly useful application of the pseudo-inverse of a matrix is to
various least squared error approximations. A geometric view of the derivation
of the normal equations can be helpful. If
b
b
does not lie in the range space of
A
A
,
an error vector is defined as the difference between
A
x
A
x
and
b
b
.
A geometric picture of this vector makes it clear that for the length of
ɛ
ɛ
to be minimum, it must be orthogonal to the space spanned by the columns of
A
A
.
This means that
A
*
ɛ
=
0
A
*
ɛ
=
0
.
If both sides of
Equation 1 are multiplied
by
A
*
A
*
,
it is easy to see that the normal equations of
Equation 5 result in the
error being orthogonal to the columns of
A
A
and, therefore its being minimal length. If
b
b
does lie in the range space of
A
A
,
the solution of the normal equations gives the exact solution of
Equation 1 with no error.
For cases 1b, 1c, 2c, 2d, 3a, 3b, and 3c the homogeneous equation
Equation 3 has non-zero
solutions. Any vector in the space spanned by these solutions (the null space
of
A
A
)
does not contribute to the error
ɛ
ɛ
defined in Equation 2 and,
therefore, can be added to any particular generalized solution of
Equation 1 to give a
family of solutions with the same approximation error. If the dimension of the
null space of
A
A
is
d
d
,
it is possible to find a unique generalized solution of
Equation 1 with
d
d
zero elements. The non-unique solution for these four cases can be written in
the form [7]
x
=
A
+
b
+
[
I
−
A
+
A
]
y
x
=
A
+
b
+
[
I
−
A
+
A
]
y
(8)
where
y
y
is an arbitrary vector. The first term is the minimum norm solution given by
the Moore-Penrose pseudo-inverse
A
+
A
+
and the second is a contribution in the null space of
A
A
.
The solution of the overdetermined simultaneous equations is generally a least
squared error approximation problem. A particularly interesting and useful
variation on this problem adds inequality and/or equality constraints. This
formulation has proven very powerful in solving the constrained least squares
approximation part of FIR filter design
[11]. The equality
constraints can be taken into account by using Lagrange multipliers and the
inequality constraints can use the Kuhn-Tucker conditions
[12][13][14].
-
E. H. Moore. (1920). On the Reciprocal of the General Algebraic Matrix. Bulletin of the AMS, 26, 394-395.
-
R. Penrose. (1955). A Generalized Inverse for Matrices. Proc. Cambridge Phil. Soc., 51, 406-413.
-
R. Penrose. (1955). On best Approximate Solutions of Linear Matrix Equations. Proc. Cambridge Phil. Soc., 52, 17-19.
-
R. M. Young. (1980). An Introduction to Nonharmonic Fourier Series. New York: Academic Press.
-
C. L. Lawson and R. J. Hanson. (1974). Solving Least Squares Problems. Inglewood Cliffs, NJ: Prentice-Hall.
-
Arthur Albert. (1972). Regression and the Moore-Penrose Pseudoinverse. New York: Academic Press.
-
A. Ben-Israel and T. N. E. Greville. (1974). Generalized Inverses: Theory and Applications. New York: Wiley & Sons.
-
S. L. Campbell and C. D. Meyer, Jr. (1979). Generalized Inverses of Linear Transformations. London: Pitman.
-
C. R. Rao and S. K. Mitra. (1971). Generalized Inverse of Matrices and its Applications. New York: John Wiley & Sons.
-
Cleve Moler, John Little and Steve Bangert. (1989). Matlab User's Guide. South Natick, MA: The MathWorks, Inc.
-
Ivan W. Selesnick, Markus Lang and C. Sidney Burrus. (1996, August). Constrained Least Square Design of FIR Filters without Explicitly Specified Transition Bands. IEEE Transactions on Signal Processing, 44(8), 1879-1892.
-
R. Fletcher. (1987). Practical Methods of Optimization. (Second). New York: John Wiley & Sons.
-
Gilbert Strang. (1986). Introduction to Applied Mathematics. Wellesley, MA: Wellesley-Cambridge Press.
-
D. G. Luenberger. (1984). Introduction to Linear and Nonlinear Programming. (Second). Reading, MA: Addison-Wesley.