Skip to content Skip to navigation

Connexions

You are here: Home » Content » LMS Algorithm Analysis

Navigation

Content Actions

  • Download module PDF
  • Add to ...
    Add the module to:
    • My Favorites
    • A lens
    • An external social bookmarking service
    • My Favorites (What is 'My Favorites'?)
      'My Favorites' is a special kind of lens which you can use to bookmark modules and collections directly in Connexions. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need a Connexions account to use 'My Favorites'.
    • A lens (What is a lens?)

      Definition of a lens

      Lenses

      A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

      What is in a lens?

      Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

      Who can create a lens?

      Any individual Connexions member, a community, or a respected organization.

    • External bookmarks
  • E-mail the authors

Recently Viewed

This feature requires Javascript to be enabled.

LMS Algorithm Analysis

Module by: Clayton Scott, Robert Nowak

Objective

Minimize instantaneous squared error

e k 2w= y k -xkTw2 e k w 2 y k x k w 2 (1)

LMS Algorithm

ŵk=ŵk-1+μxk e k w k w k 1 μ x k e k (2)
Where wk w k is the new weight vector, wk-1 w k 1 is the old weight vector, and μxk e k μ x k e k is a small step in the instantaneous error gradient direction.

Interpretation in Terms of Weight Error Vector

Define

vk=wk-wopt v k w k w opt (3)
Where wopt w opt is the optimal weight vector and
ε k = y k -xkTwopt ε k y k x k w opt (4)
where ε k ε k is the minimum error. The stochastic difference equation is:
vk=I-μxkxkTvk-1+μxk ε k v k I μ x k x k v k 1 μ x k ε k (5)

Convergence/Stability Analysis

Show that (tightness)

limBmax{PrvkB}=0 B v k B 0 (6)
With probability 1, the weight error vector is bounded for all kk.

Chebyshev's inequality is

PrvkBEvk2B2 v k B v k 2 B 2 (7)
and
PrvkB=1B2Evk2+σvk2 v k B 1 B 2 v k 2 v k (8)
where Evk2 v k 2 is the squared bias. If Evk2+σvk2 v k 2 v k is finite for all kk, then limBPrvkB=0 B v k B 0 for all kk.

Also,

σvk2=trEvkvkT v k tr v k v k (9)
Therefore σvk2 v k is finite if the diagonal elements of Γ k EvkvkT Γ k v k v k are bounded.

Convergence in Mean

Evk0 v k 0 as k k . Take expectation of Equation 5 using smoothing property to simplify the calculation. We have convergence in mean if

  1. Rxx R xx is positive definite (invertible).
  2. μ<2 λ max Rxx μ 2 λ max R xx .

Bounded Variance

Show that Γ k =EvkvkT Γ k v k v k , the weight vector error covariance is bounded for all kk.

note:

We could have Evk0 v k 0 , but σvk2 v k ; in which case the algorithm would not be stable.
Recall that it is fairly straightforward to show that the diagonal elements of the transformed covariance Ck=UΓkUT C k U Γ k U tend to zero if μ<1 λ max Rxx μ 1 λ max R xx ( U U is the eigenvector matrix of Rxx R xx ; Rxx=UDUT R xx U D U ). The diagonal elements of Ck C k were denoted by γ k , i i,i=1p γ k , i i i 1 p .

Note:

σvk2=trΓk=trUTCkU=trCk=i=1p γ k , i v k tr Γ k tr U C k U tr C k i 1 p γ k , i
Thus, to guarantee boundedness of σvk2 v k we need to show that the "steady-state" values γ k , i γ i < γ k , i γ i .

We showed that

γ i =μα+ σ ε 221-μ λ i γ i μ α σ ε 2 2 1 μ λ i (10)
where σ ε 2=E ε k 2 σ ε 2 ε k 2 , λ i λ i is the i th i th eigenvalue of Rxx R xx ( Rxx=U λ 1 00 λ p UT R xx U λ 1 0 0 λ p U ), and α=c σ ε 21-c α c σ ε 2 1 c .
0<c=12i=1pμ λ i 1-μ λ i <1 0 c 1 2 i 1 p μ λ i 1 μ λ i 1 (11)
We found a sufficient condition for μμ that guaranteed that the steady-state γ i γ i 's (and hence σvk2 v k ) are bounded: μ<23i=1p λ i μ 2 3 i 1 p λ i Where i=1p λ i =trRxx i 1 p λ i tr R xx is the input vector energy.

With this choice of μμ we have:

  1. convergence in mean
  2. bounded steady-state variance
This implies
limBmax{PrvkB}=0 B v k B 0 (12)
In other words, the LMS algorithm is stable about the optimum weight vector wopt w opt .

Learning Curve

Recall that

e k = y k -xkTwk-1 e k y k x k w k 1 (13)
and Equation 4. These imply
e k = ε k -xkTvk-1 e k ε k x k v k 1 (14)
where vk=wk-wopt v k w k w opt . So the MSE
E e k 2= σ ε 2+Evk-1TxkxkTvk-1= σ ε 2+EEvk-1TxkxkTvk-1| x n ε n n,n<k= σ ε 2+Evk-1TRxxvk-1= σ ε 2+EtrRxxvk-1vk-1T= σ ε 2+trRxxΓk-1 e k 2 σ ε 2 v k 1 x k x k v k 1 σ ε 2 x n ε n n n k v k 1 x k x k v k 1 σ ε 2 v k 1 R xx v k 1 σ ε 2 tr R xx v k 1 v k 1 σ ε 2 tr R xx Γ k 1 (15)
Where trRxxΓk-1 α k 1 α=c σ ε 21-c tr R xx Γ k 1 α k 1 α c σ ε 2 1 c . So the limiting MSE is
ε =limkE e k 2= σ ε 2+c σ ε 21-c= σ ε 21-c ε k e k 2 σ ε 2 c σ ε 2 1 c σ ε 2 1 c (16)
Since 0<c<1 0 c 1 was required for convergence, ε > σ ε 2 ε σ ε 2 so that we see noisy adaptation leads to an MSE larger than the optimal
E ε k 2=E y k -xkTwopt2= σ ε 2 ε k 2 y k x k w opt 2 σ ε 2 (17)
To quantify the increase in the MSE, define the so-called misadjustment:
M= ε - σ ε 2 σ ε 2= ε σ ε 2-1=α σ ε 2=c1-c M ε σ ε 2 σ ε 2 ε σ ε 2 1 α σ ε 2 c 1 c (18)
We would of course like to keep MM as small as possible.

Learning Speed and Misadjustment Trade-off

Fast adaptation and quick convergence require that we take steps as large as possible. In other words, learning speed is proportional to μμ; larger μμ means faster convergence. How does μμ affect the misadjustment?

To guarantee convergence/stability we require μ<23i=1p λ i Rxx μ 2 3 i 1 p λ i R xx Let's assume that in fact μ1i=1p λ i μ 1 i 1 p λ i so that there is no problem with convergence. This condition implies μ1 λ i μ 1 λ i or μ λ i 1 i,i=1p μ λ i 1 i i 1 p . From here we see that

c=12i=1pμ λ i 1-μ λ i 12μi=1p λ i 1 c 1 2 i 1 p μ λ i 1 μ λ i 1 2 μ i 1 p λ i 1 (19)
This misadjustment
M=c1-cc=12μi=1p λ i M c 1 c c 1 2 μ i 1 p λ i (20)
This shows that larger step size μμ leads to larger misadjustment.

Since we still have convergence in mean, this essentially means that with a larger step size we "converge" faster but have a larger variance (rattling) about wopt w opt .

Summary

small μμ implies

  • small misadjustment in steady-state
  • slow adaptation/tracking
large μμ implies
  • large misadjustment in steady-state
  • fast adaptation/tracking

Example 1

wopt=11 w opt 1 1 xk01001 x k 0 1 0 0 1 y k =xkTwopt+ ε k y k x k w opt ε k ε k 00.01 ε k 0 0.01

LMS Algorithm

initialization w0=00 w 0 0 0 and wk=wk-1+μxk e k k,k1 w k w k 1 μ x k e k k k 1 , where e k = y k -xkTwk-1 e k y k x k w k 1

Figure 1: μ=0.05 μ 0.05
Learning Curve
Learning Curve (.png)
Figure 2: μ=0.3 μ 0.3
LMS Learning Curve
LMS Learning Curve (.png)
Figure 3
Comparison of Learning Curves
Comparison of Learning Curves (.png)

Comments, questions, feedback, criticisms?

Send feedback