Recall the Weiner filter problem
x
k
x
k
,
d
k
d
k
jointly wide sense stationary
Find WW minimizing
E
ε
k
2
ε
k
2
ε
k
=
d
k
-
y
k
=
d
k
-∑i=0M-1
w
i
x
k
-
i
=
d
k
-
X
k
T
W
k
ε
k
d
k
y
k
d
k
i
M
1
0
w
i
x
k
-
i
d
k
X
k
W
k
X
k
=
x
k
x
k
-
1
⋮
x
k
-
M
+
1
X
k
x
k
x
k
-
1
⋮
x
k
-
M
+
1
W
k
=
w
0
k
w
1
k
⋮
w
M
-
1
k
W
k
w
0
k
w
1
k
⋮
w
M
-
1
k
The superscript denotes absolute time, and the subscript denotes
time or a vector index.
the solution can be found by setting the gradient
=0
0
∇
k
=∂∂WE
ε
k
2=E2
ε
k
-
X
k
=E-2
d
k
-
X
k
T
W
k
X
k
=-2E
d
k
X
k
+E
X
k
TW=-2P+2RW
∇
k
W
ε
k
2
2
ε
k
X
k
-2
d
k
X
k
W
k
X
k
2
d
k
X
k
X
k
X
k
W
-2
P
2
R
W
(1)
⇒
W
opt
=R-1P
⇒
W
opt
R
P
Alternatively,
W
opt
W
opt
can be found iteratively using a gradient descent
technique
W
k
+
1
=
W
k
-μ
∇
k
W
k
+
1
W
k
μ
∇
k
In practice, we don't know
RR and
PP exactly, and in an adaptive
context they may be slowly varying with time.
To find the (approximate) Wiener filter, some approximations are
necessary. As always, the key is to make the
right approximations!
Good idea: Approximate
RR and
PP: ⇒ RLS methods,
as discussed last time.
Better idea: Approximate the gradient!
∇
k
=∂∂WE
ε
k
2
∇
k
W
ε
k
2
Note that
ε
k
2
ε
k
2
itself is a very noisy approximation to
E
ε
k
2
ε
k
2
. We can get a noisy approximation to the gradient by
finding the gradient of
ε
k
2
ε
k
2
! Widrow and Hoff first published the LMS algorithm,
based on this clever idea, in 1960.
∇
k
̂=∂∂W
ε
k
2=2
ε
k
∂∂W
d
k
-
W
k
T
X
k
=2
ε
k
-
X
k
=-2
ε
k
X
k
∇
k
W
ε
k
2
2
ε
k
W
d
k
W
k
X
k
2
ε
k
X
k
2
ε
k
X
k
This yields the LMS adaptive filter algorithm
Example 1: The LMS Adaptive Filter Algorithm -
y
k
=
W
k
T
X
k
=∑i=0M-1
w
i
k
x
k
-
i
y
k
W
k
X
k
i
0
M
1
w
i
k
x
k
-
i
-
ε
k
=
d
k
-
y
k
ε
k
d
k
y
k
-
W
k
+
1
=
W
k
-μ
∇
k
̂=
W
k
-μ-2
ε
k
X
k
=
W
k
+2μ
ε
k
X
k
W
k
+
1
W
k
μ
∇
k
W
k
μ
-2
ε
k
X
k
W
k
2
μ
ε
k
X
k
(
w
i
k
+
1
=
w
i
k
+2μ
ε
k
x
k
-
i
w
i
k
+
1
w
i
k
2
μ
ε
k
x
k
-
i
)
The LMS algorithm is often called a
stochastic
gradient algorithm, since
∇
k
̂
∇
k
is a noisy gradient. This is
by
far the most commonly used adaptive filtering
algorithm, because
- it was the first
- it is very simple
- in practice it works well (except that sometimes it
converges slowly)
- it requires relatively litle computation
- it updates the tap weights every sample, so it
continually adapts the filter
- it tracks slow changes in the signal statistics well
Computational Cost of LMS
| To Compute ⇒ |
y
k
y
k
|
ε
k
ε
k
|
W
k
+
1
W
k
+
1
|
= Total |
| multiplies |
MM
|
00
|
M+1
M
1
|
2M+1
2
M
1
|
| adds |
M-1
M
1
|
11
|
MM
|
2M
2
M
|
So the LMS algorithm is
OM
O
M
per sample. In fact, it is nicely balanced in that
the filter computation and the adaptation require the same
amount of computation.
Note that the parameter μμ
plays a very important role in the LMS algorithm. It can also
be varied with time, but usually a constant
μμ ("convergence weight
facor") is used, chosen after experimentation for a given
application.
Tradeoffs
large μμ: fast convergence,
fast adaptivity
small μμ: accurate
WW → less
misadjustment error, stability
"A good introduction in adaptive filters, a major DSP application."