Skip to content Skip to navigation

Connexions

You are here: Home » Content » General Solutions of Simultaneous Equations

Navigation

Recently Viewed

This feature requires Javascript to be enabled.

General Solutions of Simultaneous Equations

Module by: C. Sidney Burrus. E-mail the author

User rating (How does the rating system work?)
Ratings

Ratings allow you to judge the quality of modules. If other users have ranked the module then its average rating is displayed below. Ratings are calculated on a scale from one star (Poor) to five stars (Excellent).

How to rate a module

Hover over the star that corresponds to the rating you wish to assign. Click on the star to add your rating. Your rating should be based on the quality of the content. You must have an account and be logged in to rate content.

:
(0 ratings)

Note: Your browser may not currently support MathML. See our browser support page for additional details. You can always view the correct math in the PDF version.

The second problem posed in the introduction is basically the solution of simultaneous linear equations which is fundamental to linear algebra and very important in diverse areas of applications in mathematics, numerical analysis, physical and social sciences, engineering, and business. Since a system of linear equations may be over or under determined in a variety of ways, or may be consistent but ill conditioned, a comprehensive theory turns out to be more complicated than it first appears. Indeed, there is a considerable literature on the subject of generalized inverses or pseudo-inverses. The careful statement and formulation of the general problem seems to have started with Moore [1] and Penrose [2][3] and has been developed by many others, especially in statistics. Because the generalized solution of simultaneous equations is defined in terms of minimization of an equation error, the techniques are useful in a wide variety of approximation problems.

The ideas are presented here in terms of finite dimensions and using matrices. Many of the ideas extend to infinite dimensions using Banach and Hilbert spaces [4].

The Problem

Given an m m by n n real matrix A A and an m m by 1 1 vector b b , find the n n by 1 1 vector x x when

A x = b A x = b (1)
If b b does not lie in the range space of A A (the space spanned by the columns of A A ), there is no exact solution to Equation 1, therefore, an error is defined by
ɛ = A x b . ɛ = A x b . (2)
A generalized solution to Equation 1 is considered to be an x x that minimizes some norm of ɛ ɛ , usually ɛ T * ɛ ɛ T * ɛ .

If there is a non-zero solution of the homogeneous equation

A x = 0 , A x = 0 , (3)
then Equation 1 has many generalized solutions in the sense that any particular solution of Equation 1 plus an arbitrary scalar times any non-zero solution of Equation 3 will have same error in Equation 2 and, therefore, is also a generalized solution.

Examination of the basic problem shows there are ten cases [5] to be considered. These depend on the shape of AA , the rank r r of AA , and whether bb is in the span of the columns of AA .

  • 1a: m = n = r m = n = r : One solution with no error.
  • 1b: m = n > r m = n > r , b s p a n { A } b s p a n { A } : Many solutions with no error.
  • 1c: m = n > r m = n > r ;, b n o t s p a n { A } b n o t s p a n { A } : Many solutions with error.
  • 2a: m > n = r m > n = r , b s p a n { A } b s p a n { A } : One solution with no error.
  • 2b: m > n = r m > n = r , b n o t s p a n { A } b n o t s p a n { A } : One solution with error.
  • 2c: m > n > r m > n > r , b s p a n { A } b s p a n { A } : Many solutions with no error.
  • 2d: m > n > r m > n > r , b n o t s p a n { A } b n o t s p a n { A } : Many solutions with error.
  • 3a: n > m = r n > m = r : Many solutions with no error.
  • 3b: n > m > r n > m > r , b s p a n { A } b s p a n { A } : Many solutions with no error.
  • 3c: n > m > r n > m > r , b n o t s p a n { A } b n o t s p a n { A } : Many solutions with error.

In addition to these classifications, the possible orthogonality of the columns or rows of the matrices gives special characteristics.

There are several assumptions or side conditions that could be used in order to define a useful unique solution of Equation 1. The side conditions used to define the Moore-Penrose pseudo-inverse are that the norm of ɛ ɛ be minimized and, if there is ambiguity (several solutions with the same minimum error), the norm of x x also be minimized. A useful alternative to minimizing the norm of x x is to require certain entries in x x to be zero or fixed to some non zero value (equality constraints).

In addition to using side conditions to achieve a unique solution, side conditions are sometimes part of the original problem. One important case requires that certain of the equations be satisfied with no error and the approximation be achieved with the remaining equations.

Moore-Penrose Pseudo-Inverse

A unique generalized solution to Equation 1 always exists such that the equation error ɛ T * ɛ ɛ T * ɛ and the norm of the solution x x are minimized. This solution is denoted by

x = A + b x = A + b (4)
where A + A + is called the Moore-Penrose inverse of A A .

Roger Penrose [2][3] showed that for all A A , there exists a unique A + A + satisfying the four conditions:

  • A A + A = A A A + A = A
  • A + A A + = A + A + A A + = A +
  • [ A A + ] * = A A + [ A A + ] * = A A +
  • [ A + A ] * = A + A [ A + A ] * = A + A

There is a large literature on this problem. Five useful books are [5][6][7][8][9]. The Moore-Penrose pseudo-inverse can be calculated in Matlab [10] by the pinv(A,tol) function which uses a singular value decomposition to calculate the inverse. There are a variety of other numerical methods given in the above references where each has some advantages and some disadvantages.

Properties

For cases 2a and 2b, the following n n by n n system of equations called the normal equations [6][5] have a unique minimum equation error solution.

A T * A x = A T * b A T * A x = A T * b (5)
Solving these equations is often used in least squares approximation problems. For these two cases the pseudo-inverse is simply,
A + = [ A T * A ] 1 A T * . A + = [ A T * A ] 1 A T * . (6)
An equivalent definition [6] of the pseudo-inverse can be given in terms of a limit by
A + = lim δ 0 [ A T * A + δ 2 I ] 1 A T * = lim δ 0 A T * [ A A T * + δ 2 I ] 1 . A + = lim δ 0 [ A T * A + δ 2 I ] 1 A T * = lim δ 0 A T * [ A A T * + δ 2 I ] 1 . (7)
Some properties [6][8] are:

  • [ A + ] + = A [ A + ] + = A
  • [ A + ] * = [ A * ] + [ A + ] * = [ A * ] +
  • [ A * A ] + = A + A * + [ A * A ] + = A + A * +
  • λ + = 1 / λ λ + = 1 / λ for λ 0 λ 0 else λ + = 0 λ + = 0
  • A + = [ A * A ] + A * = A * [ A A * ] + A + = [ A * A ] + A * = A * [ A A * ] +
  • A * = A * A A + = A + A A * A * = A * A A + = A + A A *

It is informative to consider the range and null spaces [8] of A A and A + A +

  • R ( A ) = R ( A A + ) = R ( A A * ) R ( A ) = R ( A A + ) = R ( A A * )
  • R ( A + ) = R ( A * ) = R ( A + A ) = R ( A * A ) R ( A + ) = R ( A * ) = R ( A + A ) = R ( A * A )
  • R ( I A A + ) = N ( A A + ) = N ( A * ) = N ( A + ) = R ( A ) R ( I A A + ) = N ( A A + ) = N ( A * ) = N ( A + ) = R ( A )
  • R ( I A + A ) = N ( A + A ) = N ( A ) = R ( A * ) R ( I A + A ) = N ( A + A ) = N ( A ) = R ( A * )

Geometric interpretation and Least Squares Approximation

A particularly useful application of the pseudo-inverse of a matrix is to various least squared error approximations. A geometric view of the derivation of the normal equations can be helpful. If b b does not lie in the range space of A A , an error vector is defined as the difference between A x A x and b b . A geometric picture of this vector makes it clear that for the length of ɛ ɛ to be minimum, it must be orthogonal to the space spanned by the columns of A A . This means that A * ɛ = 0 A * ɛ = 0 . If both sides of Equation 1 are multiplied by A * A * , it is easy to see that the normal equations of Equation 5 result in the error being orthogonal to the columns of A A and, therefore its being minimal length. If b b does lie in the range space of A A , the solution of the normal equations gives the exact solution of Equation 1 with no error.

For cases 1b, 1c, 2c, 2d, 3a, 3b, and 3c the homogeneous equation Equation 3 has non-zero solutions. Any vector in the space spanned by these solutions (the null space of A A ) does not contribute to the error ɛ ɛ defined in Equation 2 and, therefore, can be added to any particular generalized solution of Equation 1 to give a family of solutions with the same approximation error. If the dimension of the null space of A A is d d , it is possible to find a unique generalized solution of Equation 1 with d d zero elements. The non-unique solution for these four cases can be written in the form [7]

x = A + b + [ I A + A ] y x = A + b + [ I A + A ] y (8)
where y y is an arbitrary vector. The first term is the minimum norm solution given by the Moore-Penrose pseudo-inverse A + A + and the second is a contribution in the null space of A A .

Least Squares with Constraints

The solution of the overdetermined simultaneous equations is generally a least squared error approximation problem. A particularly interesting and useful variation on this problem adds inequality and/or equality constraints. This formulation has proven very powerful in solving the constrained least squares approximation part of FIR filter design [11]. The equality constraints can be taken into account by using Lagrange multipliers and the inequality constraints can use the Kuhn-Tucker conditions [12][13][14].

References

  1. E. H. Moore. (1920). On the Reciprocal of the General Algebraic Matrix. Bulletin of the AMS, 26, 394-395.
  2. R. Penrose. (1955). A Generalized Inverse for Matrices. Proc. Cambridge Phil. Soc., 51, 406-413.
  3. R. Penrose. (1955). On best Approximate Solutions of Linear Matrix Equations. Proc. Cambridge Phil. Soc., 52, 17-19.
  4. R. M. Young. (1980). An Introduction to Nonharmonic Fourier Series. New York: Academic Press.
  5. C. L. Lawson and R. J. Hanson. (1974). Solving Least Squares Problems. Inglewood Cliffs, NJ: Prentice-Hall.
  6. Arthur Albert. (1972). Regression and the Moore-Penrose Pseudoinverse. New York: Academic Press.
  7. A. Ben-Israel and T. N. E. Greville. (1974). Generalized Inverses: Theory and Applications. New York: Wiley & Sons.
  8. S. L. Campbell and C. D. Meyer, Jr. (1979). Generalized Inverses of Linear Transformations. London: Pitman.
  9. C. R. Rao and S. K. Mitra. (1971). Generalized Inverse of Matrices and its Applications. New York: John Wiley & Sons.
  10. Cleve Moler, John Little and Steve Bangert. (1989). Matlab User's Guide. South Natick, MA: The MathWorks, Inc.
  11. Ivan W. Selesnick, Markus Lang and C. Sidney Burrus. (1996, August). Constrained Least Square Design of FIR Filters without Explicitly Specified Transition Bands. IEEE Transactions on Signal Processing, 44(8), 1879-1892.
  12. R. Fletcher. (1987). Practical Methods of Optimization. (Second). New York: John Wiley & Sons.
  13. Gilbert Strang. (1986). Introduction to Applied Mathematics. Wellesley, MA: Wellesley-Cambridge Press.
  14. D. G. Luenberger. (1984). Introduction to Linear and Nonlinear Programming. (Second). Reading, MA: Addison-Wesley.

Content actions

Give Feedback:

E-mail the module author | Rate module ( How does the rating system work?)

Rating system

Ratings

Ratings allow you to judge the quality of modules. If other users have ranked the module then its average rating is displayed below. Ratings are calculated on a scale from one star (Poor) to five stars (Excellent).

How to rate a module

Hover over the star that corresponds to the rating you wish to assign. Click on the star to add your rating. Your rating should be based on the quality of the content. You must have an account and be logged in to rate content.

(0 ratings)

Download:

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections directly in Connexions. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need a Connexions account to use 'My Favorites'.

| A lens (?)

Definition of a lens

Lenses

A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual Connexions member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks