As a popular alternative to the Kalman filter, we will
investigate the so-called
The price we pay with LMS instead of a Kalman filter is that the
rate of convergence and adaptation to sudden changes is slower for
LMS than for the Kalman filter (with correct prior assumptions).
Most adaptive filtering alogrithms (LMS
included) are modifications of standard iterative procedures
for solving minimization problems in a real-time
or on-line fashion. Therefore, before deriving
the LMS algorithm we will look at iterative methods of
minimizing error criteria such as MSE.
Conider the following set-up:
x
k
:
observation
x
k
:
observation
y
k
:
signal to be estimated
y
k
:
signal to be estimated
y^k=
w
1
x
k
+
w
2
x
k−1
+…+
w
p
x
k−p+1
y
k
w
1
x
k
w
2
x
k
1
…
w
p
x
k
p
1
(1)
Impulse response of the filter:
…
,
0
,
0
,
w
1
,
w
2
,
…
w
p
,
0
,
0
,
…
…
,
0
,
0
,
w
1
,
w
2
,
…
w
p
,
0
,
0
,
…
y^k=
x
k
Tw
y
k
x
k
w
(2)
Where
x
k
=
x
k
x
k−1
⋮
x
k−p+1
x
k
x
k
x
k
1
⋮
x
k
p
1
and
w=
w
1
w
2
⋮
w
p
w
w
1
w
2
⋮
w
p
e
k
=
y
k
−y^k=
y
k
−
x
k
Tw
e
k
y
k
y
k
y
k
x
k
w
(3)
(
x
k
,
y
k
)
(
x
k
,
y
k
)
are jointly stationary with zero-mean.
E
e
k
2=E
y
k
−
x
k
Tw2=E
y
k
2−2wTE
x
k
y
k
+wTE
x
k
x
k
Tw=
R
yy
−2wT
R
xy
+wT
R
xx
w
e
k
2
y
k
x
k
w
2
y
k
2
2
w
x
k
y
k
w
x
k
x
k
w
R
yy
2
w
R
xy
w
R
xx
w
(4)
Where
R
yy
R
yy
is the variance of
y
k
2
y
k
2
,
R
xx
R
xx
is the covariance matrix of
x
k
x
k
, and
R
xy
=E
x
k
y
k
R
xy
x
k
y
k
is the cross-covariance between
x
k
x
k
and
y
k
y
k
The MSE is quadratic in
W W which implies the MSE
surface is "bowl" shaped with a unique minimum point (
Figure 8).
Minimize MSE:
(∂E
e
k
2∂
w
=2
R
xy
+2
R
xx
w=0)⇒(
w
opt
=
R
xx
-1
R
xy
)
w
e
k
2
-2
R
xy
2
R
xx
w
0
w
opt
R
xx
R
xy
(5)
Notice that we can re-write
Equation 5 as
E
x
k
x
k
Tw=E
x
k
y
k
x
k
x
k
w
x
k
y
k
(6)
or
E
x
k
(
y
k
−
x
k
Tw)=E
x
k
e
k
=0
x
k
y
k
x
k
w
x
k
e
k
0
(7)
Which shows that the error signal is orthogonal to the input
x
k
x
k
(by the
orthogonality principle of
minimum MSE estimator).
Although we can easily determine
w
opt
w
opt
by solving the system of equations
R
xx
w=
R
xy
R
xx
w
R
xy
(8)
Let's look at an iterative procedure for solving this
problem. This will set the stage for our adaptive filtering
algorithm.
We want to minimize the MSE. The idea is
simple. Starting at some initial weight vector
w
0
w
0
, iteratively adjust the values to decrease the
MSE (Figure 9).
We want to
move
w
0
w
0
towards the optimal vector
w
opt
w
opt
. In order to move in the correct direction, we
must move
downhill or in the direction opposite
to the gradient of the MSE surface at the point
w
0
w
0
. Thus, a natural and simple adjustment takes the form
w
1
=
w
0
−12μ∂E
e
k
2∂
w
|
w
=
w
0
w
1
w
0
1
2
μ
w
w
0
w
e
k
2
(9)
Where
μμ is the step size and
tells us how far to move in the negative gradient direction
(
Figure 10).
Generalizing this idea to an iterative strategy, we get
w
k
=
w
k−1
−12μ∂E
e
k
2∂
w
|
w
=
w
k−1
w
k
w
k
1
1
2
μ
w
w
k
1
w
e
k
2
(10)
and we can repeatedly update
w
w:
w
0
,
w
1
,
…
,
w
k
w
0
,
w
1
,
…
,
w
k
. Hopefully each subsequent
w
k
w
k
is closer to
w
opt
w
opt
. Does the procedure converge? Can we adapt it to
an on-line, real-time, dynamic situation in which the
signals may not be stationary?