Recall the Wiener filter problem: $x_k$ and $d_k$ are jointly wide-sense stationary.
Find $W$ minimizing $E\left[\varepsilon_k^2\right]$, where
$$\varepsilon_k = d_k - y_k = d_k - \sum_{i=0}^{M-1} w_i x_{k-i} = d_k - X_k^T W_k$$
$$X_k = \begin{pmatrix} x_k \\ x_{k-1} \\ \vdots \\ x_{k-M+1} \end{pmatrix}, \qquad W_k = \begin{pmatrix} w_0^k \\ w_1^k \\ \vdots \\ w_{M-1}^k \end{pmatrix}$$
The superscript denotes absolute time, and the subscript denotes
time or a vector index.
The solution can be found by setting the gradient to $0$:
$$\nabla_k = \frac{\partial E\left[\varepsilon_k^2\right]}{\partial W} = E\left[2\varepsilon_k\left(-X_k\right)\right] = -2E\left[\left(d_k - X_k^T W\right)X_k\right] = -\left(2E\left[d_k X_k\right]\right) + 2E\left[X_k X_k^T\right]W = -2P + 2RW \quad (1)$$
$$\Rightarrow\; W_{opt} = R^{-1}P$$
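As a concrete sketch (not from the notes; the statistics below are hypothetical), the normal equations $RW_{opt} = P$ can be solved directly for a 2-tap filter:

```python
# Illustrative sketch (not from the notes): solving the normal equations
# R * W_opt = P directly for a hypothetical 2-tap (M = 2) filter,
# using Cramer's rule for the 2x2 system.

def wiener_2tap(R, P):
    """Return W_opt solving R W = P for 2x2 R and length-2 P."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    w0 = (P[0] * R[1][1] - P[1] * R[0][1]) / det
    w1 = (R[0][0] * P[1] - R[1][0] * P[0]) / det
    return [w0, w1]

# Hypothetical statistics: R = E[X_k X_k^T], P = E[d_k X_k]
R = [[1.0, 0.5],
     [0.5, 1.0]]
P = [0.7, 0.4]
W_opt = wiener_2tap(R, P)  # W_opt = [2/3, 1/15]
```

For larger $M$ one would solve the $M \times M$ system with a general linear solver; the point here is only that $W_{opt}$ follows directly once $R$ and $P$ are known.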
Alternatively, $W_{opt}$ can be found iteratively using a gradient descent technique:
$$W_{k+1} = W_k - \mu\nabla_k$$
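A minimal sketch of this iteration, assuming $R$ and $P$ are known exactly (the values below are hypothetical) and using the exact gradient $\nabla_k = 2RW_k - 2P$ from Equation (1):

```python
# Sketch of the steepest-descent iteration W_{k+1} = W_k - mu * grad_k,
# using the exact gradient grad_k = 2 R W_k - 2 P from Equation (1).
# R, P, and mu below are hypothetical values chosen for illustration.

def steepest_descent(R, P, mu=0.1, iters=200):
    M = len(P)
    W = [0.0] * M  # start from the zero vector
    for _ in range(iters):
        grad = [2.0 * sum(R[i][j] * W[j] for j in range(M)) - 2.0 * P[i]
                for i in range(M)]
        W = [W[i] - mu * grad[i] for i in range(M)]
    return W

R = [[1.0, 0.5], [0.5, 1.0]]
P = [0.7, 0.4]
W = steepest_descent(R, P)  # converges toward R^{-1} P = [2/3, 1/15]
```

With these values the iteration contracts toward $R^{-1}P$; too large a $\mu$ would instead make it diverge.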
In practice, we don't know $R$ and $P$ exactly, and in an adaptive context they may be slowly varying with time.
To find the (approximate) Wiener filter, some approximations are
necessary. As always, the key is to make the
right approximations!
Approximate $R$ and $P$: $\Rightarrow$ RLS methods, as discussed last time.
Approximate the gradient!
$$\nabla_k = \frac{\partial E\left[\varepsilon_k^2\right]}{\partial W}$$
Note that $\varepsilon_k^2$ itself is a very noisy approximation to $E\left[\varepsilon_k^2\right]$. We can get a noisy approximation to the gradient by finding the gradient of $\varepsilon_k^2$! Widrow and Hoff first published the LMS algorithm, based on this clever idea, in 1960.
$$\hat{\nabla}_k = \frac{\partial \varepsilon_k^2}{\partial W} = 2\varepsilon_k \frac{\partial\left(d_k - W_k^T X_k\right)}{\partial W} = 2\varepsilon_k\left(-X_k\right) = -\left(2\varepsilon_k X_k\right)$$
This yields the LMS adaptive filter algorithm
$$y_k = W_k^T X_k = \sum_{i=0}^{M-1} w_i^k x_{k-i}$$
$$\varepsilon_k = d_k - y_k$$
$$W_{k+1} = W_k - \mu\hat{\nabla}_k = W_k - \mu\left(-2\varepsilon_k X_k\right) = W_k + 2\mu\varepsilon_k X_k$$
$$\left(w_i^{k+1} = w_i^k + 2\mu\varepsilon_k x_{k-i}\right)$$
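The three steps above can be sketched in pure Python (a hypothetical system-identification setup, not from the notes: $d_k$ is $x_k$ passed through a known 2-tap filter, which LMS should recover):

```python
# A pure-Python sketch of the LMS recursion above, applied to a
# hypothetical system-identification problem: d_k is x_k filtered by
# known taps [0.8, -0.3], which the adaptive filter should recover.
import random

def lms(x, d, M, mu):
    """Run LMS over the samples; return the final weight vector W."""
    W = [0.0] * M
    X = [0.0] * M  # delay line X_k = (x_k, x_{k-1}, ..., x_{k-M+1})
    for k in range(len(x)):
        X = [x[k]] + X[:-1]                     # shift in newest sample
        y = sum(W[i] * X[i] for i in range(M))  # y_k = W_k^T X_k
        err = d[k] - y                          # eps_k = d_k - y_k
        W = [W[i] + 2 * mu * err * X[i] for i in range(M)]
    return W

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
true_taps = [0.8, -0.3]  # the "unknown" system
d = [true_taps[0] * x[k] + (true_taps[1] * x[k - 1] if k > 0 else 0.0)
     for k in range(len(x))]
W = lms(x, d, M=2, mu=0.01)  # W approaches [0.8, -0.3]
```

Note that each sample costs one inner product for $y_k$, one subtraction for $\varepsilon_k$, and one scaled vector addition for the update, matching the operation counts tabulated below.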
The LMS algorithm is often called a stochastic gradient algorithm, since $\hat{\nabla}_k$ is a noisy gradient. This is by far the most commonly used adaptive filtering algorithm, because
- it was the first
- it is very simple
- in practice it works well (except that sometimes it converges slowly)
- it requires relatively little computation
- it updates the tap weights every sample, so it continually adapts the filter
- it tracks slow changes in the signal statistics well
Table 1

To Compute ⇒   y_k    ε_k    W_{k+1}   = Total
multiplies     M      0      M+1       2M+1
adds           M−1    1      M         2M
So the LMS algorithm is $O(M)$ per sample. In fact, it is nicely balanced in that the filter computation and the adaptation require the same amount of computation.
Note that the parameter $\mu$ plays a very important role in the LMS algorithm. It can also be varied with time, but usually a constant $\mu$ ("convergence weight factor") is used, chosen after experimentation for a given application.
large $\mu$: fast convergence, fast adaptivity

small $\mu$: accurate $W$ → less misadjustment error, stability
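For reference, a standard result from LMS analysis (stated here, not derived in these notes): under the usual independence assumptions, the weights converge in the mean only if $\mu$ is small enough relative to the input statistics.

```latex
% Taking expectations of the LMS update W_{k+1} = W_k + 2 mu eps_k X_k
% and using E[eps_k X_k] = P - R E[W_k] (independence assumption) gives
%   E[W_{k+1}] = (I - 2 mu R) E[W_k] + 2 mu P.
% Every mode contracts iff |1 - 2 mu lambda_i| < 1 for all eigenvalues
% lambda_i of R, i.e.
\[
  0 < \mu < \frac{1}{\lambda_{\max}(R)}
\]
```

This bound quantifies the "stability" side of the small-$\mu$ tradeoff noted above.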