Connexions

You are here: Home » Content » Least Squares
Content Actions
Lenses

What is a lens?

Lenses

A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual Connexions member, a community, or a respected organization.

This content is ...
Affiliated with (?)
This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • This module is included inLens: Rice University OpenCourseWare
    By: OpenCourseWare ConsortiumAs a part of collection:"Matrix Analysis"

    Click the "Rice University OCW" link to see all content affiliated with them.

    Rice University OCW
Tags

(?)

These tags come from the endorsement, affiliation, and other lenses that include this content.

Least Squares

Module by: CJ Ganier

Summary: Describes the use of the least squares method for matrix analysis.

Introduction

We learned in the previous chapter that Ax=b Ax b need not possess a solution when the number of rows of A A exceeds its rank, i.e., r<m rm. As this situation arises quite often in practice, typically in the guise of 'more equations than unknowns,' we establish a rationale for the absurdity Ax=b Ax b .

The Normal Equations

The goal is to choose xx such that AxAx is as close as possible to bb. Measuring closeness in terms of the sum of the squares of the components we arrive at the 'least squares' problem of minimizing
res Ax-b2=Ax-bTAx-b Ax b 2 Ax b Ax b (1)
over all x x. The path to the solution is illuminated by the Fundamental Theorem. More precisely, we write bR,bN,bRAbNAT:b=bR+bN bR bN bR A bN A b bR bN . On noting that (i) bR,xn:Ax-bRA bR x n Ax bR A and (ii) AAT A A we arrive at the Pythagorean Theorem.
Pythagorean Theorem norm2Ax-b=Ax-bR+bN2=Ax-bR2+bN2 norm Ax b 2 Ax bR bN 2 Ax bR 2 bN 2 (2)
It is now clear from the Pythagorean Theorem that the best xx is the one that satisfies
Ax=bR Ax bR (3)
As bRA bR A this equation indeed possesses a solution. We have yet however to specify how one computes bR bR given bb. Although an explicit expression for bR bR, the so called orthogonal projection of bb onto A A, in terms of AA and bb is within our grasp we shall, strictly speaking, not require it. To see this, let us note that if xx satisfies the above equation then
Ax-b=Ax-bR+bN=-bN Ax b Ax bR bN bN (4)
As bN bN is no more easily computed than bR bR you may claim that we are just going in circles. The 'practical' information in the above equation however is that Ax-bAT Ax b A , i.e., ATAx-b=0 A Ax b 0 , i.e.,
ATAx=ATb A Ax A b (5)
As ATbAT A b A regardless of bb this system, often referred to as the normal equations, indeed has a solution. This solution is unique so long as the columns of ATA A A are linearly independent, i.e., so long as ATA=0 A A 0 . Recalling Chapter 2, Exercise 2, we note that this is equivalent to A=0 A 0 . We summarize our findings in
theorem 1 
The set of xbR x bR for which the misfit Ax-b2 Ax b 2 is smallest is composed of those xx for which ATAx=ATb A A x A b . There is always at least one such xx. There is exactly one such xx if A=0 A 0 .
As a concrete example, suppose with reference to Figure 1 that A=110100 A 11 01 00 and b=111 b 1 1 1 .
lsqfig.jpg
Figure 1: The decomposition of bb.
As bA b A there is no xx such that Ax=b Ax b. Indeed, Ax-b2=x1+x2+-12+x2-12+11 Ax b 2 x1 x2 -1 2 x2 1 2 1 1 , with the minimum uniquely attained at x=01 x 0 1 , in agreement with the unique solution of the above equation, for ATA=1112 A A 1 1 1 2 and ATb=12 A b 1 2 . We now recognize, a posteriori, that bR=Ax=110 bR Ax 1 1 0 is the orthogonal projection of bb onto the column space of AA.

Applying Least Squares to the Biaxial Test Problem

We shall formulate the identification of the 20 fiber stiffnesses in this previous figure, as a least squares problem. We envision loading, ff, the 9 nodes and measuring the associated 18 displacements, xx. From knowledge of xx and ff we wish to infer the components of K=diagk K diagk where kk is the vector of unknown fiber stiffnesses. The first step is to recognize that
ATKAx=f A K A x f (6)
may be written as
B,B=ATdiagAx:Bk=f B B A diag Ax Bk f (7)
Though conceptually simple this is not of great use in practice, for BB is 18-by-20 and hence the above equation possesses many solutions. The way out is to compute kk as the result of more than one experiment. We shall see that, for our small sample, 2 experiments will suffice. To be precise, we suppose that x1 x1 is the displacement produced by loading f1 f1 while x2 x2 is the displacement produced by loading f2 f2. We then piggyback the associated pieces in B=ATdiagAx1ATdiagAx2 B A diag A x1 A diag A x2 and f=f1f2 f f1 f2 This BB is 36-by-20 and so the system Bk=f Bk f is overdetermined and hence ripe for least squares.
We proceed then to assemble BB and ff. We suppose f1 f1 and f2 f2 to correspond to horizontal and vertical stretching
f1=-100010-100010-100010T f1 -1 0 0 0 1 0 -1 0 0 0 1 0 -1 0 0 0 1 0 (8)
f2=0101010000000-10-10-1T f2 0 1 0 1 0 1 0 0 0 0 0 0 0 -1 0 -1 0 -1 (9)
respectively. For the purpose of our example we suppose that each kj=1 kj 1 except k8=5 k8 5 . We assemble ATKA A K A as in Chapter 2 and solve
ATKAxj=fj A K A xj fj (10)
with the help of the pseudoinverse. In order to impart some `reality' to this problem we taint each xj xj with 10 percent noise prior to constructing BB. Regarding
BTBk=BTf B B k B f (11)
we note that Matlab solves this system when presented with k=B\f when BB is rectangular. We have plotted the results of this procedure in the Figure 2. The stiff fiber is readily identified.
lsqtst.jpg
Figure 2: Results of a successful biaxial test.

Projections

From an algebraic point of view Equation 5)is an elegant reformulation of the least squares problem. Though easy to remember it unfortunately obscures the geometric content, suggested by the word 'projection,' of Equation 4. As projections arise frequently in many applications we pause here to develop them more carefully. With respect to the normal equations we note that if A=0 A 0 then
x=ATA-1ATb x A A -1 A b (12)
and so the orthogonal projection of bb onto A A is:
bR=Ax=AATA-1ATb bR Ax A A A -1 A b (13)
Defining
P=AATA-1AT P A A A -1 A (14)
Equation 13 takes the form bR=Pb bR P b . Commensurate with our notion of what a 'projection' should be we expect that PP map vectors not in A A onto A A while leaving vectors already in A A unscathed. More succinctly, we expect that PbR=bR P bR bR , i.e., PbR=PbR P bR P bR . As the latter should hold for all bRm b Rm we expect that
P2=P P2 P (15)
With respect to Equation 14 we find that indeed
P2=AATA-1ATAATA-1AT=AATA-1AT=P P2 A A A -1 A A A A -1 A A A A -1 A P (16)
We also note that the PP in Equation 14 is symmetric. We dignify these properties through
Definition 1: orthogonal projection
A matrix PP that satisfies P2=P P2 P is called a projection. A symmetric projection is called an orthogonal projection.
We have taken some pains to motivate the use of the word 'projection.' You may be wondering however what symmetry has to do with orthogonality. We explain this in terms of the tautology
b=Pb+I-Pb b Pb IP b (17)
Now, if PP is a projection then so too is I-P IP. Moreover, if PP is symmetric then the dot product of bb's two constituents is
PbTI-Pb=bTPTI-Pb=bTP-P2b=bT0b=0 Pb IP b b P IP b b P P2 b b 0 b 0 (18)
i.e., Pb P b is orthogonal to I-Pb IP b . As examples of a nonorthogonal projections we offer P=100-1200-14-121 P 1 0 0 -12 0 0 -14 -12 1 and I-P IP. Finally, let us note that the central formula, P=AATA-1=AT P A A A -1 A , is even a bit more general than advertised. It has been billed as the orthogonal projection onto the column space of AA. The need often arises however for the orthogonal projection onto some arbitrary subspace MM. The key to using the old PP is simply to realize that every subspace is the column space of some matrix. More precisely, if
x1...xm x1 ... xm (19)
is a basis for MM then clearly if these xj xj are placed into the columns of a matrix called AA then A=M A M . For example, if MM is the line through 11T 11 then
P=111211=121111 P 1 1 12 11 12 1 1 1 1 (20)
is orthogonal projection onto MM.

Exercises

  1. Gilbert Strang was stretched on a rack to lengths =6 6 , 77, and 88 feet under applied forces of f=1 f1 , 22, and 44 tons. Assuming Hooke's law -L=cf L cf , find his compliance, cc, and original height, LL, by least squares.
  2. With regard to the example of § 3 note that, due to the the random generation of the noise that taints the displacements, one gets a different 'answer' every time the code is invoked.
    1. Write a loop that invokes the code a statistically significant number of times and submit bar plots of the average fiber stiffness and its standard deviation for each fiber, along with the associated M--file.
    2. Experiment with various noise levels with the goal of determining the level above which it becomes difficult to discern the stiff fiber. Carefully explain your findings.
  3. Find the matrix that projects 3 3 onto the line spanned by 101T 1 0 1 .
  4. Find the matrix that projects 3 3 onto the plane spanned by 101T 1 0 1 and 11-1T 1 1 -1 .
  5. If PP is the projection of mm onto a kk--dimensional subspace MM, what is the rank of PP and what is P P?

Comments, questions, feedback, criticisms?

Send feedback