Convergence of the mean (first-order analysis) is insufficient
to guarantee desirable behavior of the LMS algorithm; the variance
could still be infinite. It is important to show that the variance
of the filter coefficients is finite, and to determine how close
the average squared error comes to the minimum possible error of
an exact Wiener filter.
$$E\left[\epsilon_k^2\right] = E\left[\left(d_k - W_k^T X_k\right)^2\right] = E\left[d_k^2\right] - 2\,E\left[d_k X_k^T\right] W_k + W_k^T E\left[X_k X_k^T\right] W_k = r_{dd}(0) - 2 W_k^T P + W_k^T R W_k \qquad (1)$$
The minimum error is obtained using the Wiener filter
$$W_{opt} = R^{-1} P$$
$$\epsilon_{min}^2 = E\left[\epsilon^2\right] = r_{dd}(0) - 2 P^T R^{-1} P + P^T R^{-1} R R^{-1} P = r_{dd}(0) - P^T R^{-1} P \qquad (2)$$
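As a numerical sanity check of Equation 2, the Wiener solution can be computed from sample estimates of $R$, $P$, and $r_{dd}(0)$. The setup below (filter length, system coefficients `g`, and noise level) is purely illustrative, not from the text:

```python
import numpy as np

# Illustrative (assumed) setup: d_k is a 4-tap FIR system g driving white
# input x_k, plus observation noise of variance 0.01.
rng = np.random.default_rng(0)
N = 4
g = np.array([0.5, -0.3, 0.2, 0.1])
K = 200_000
x = rng.standard_normal(K + N - 1)
X = np.lib.stride_tricks.sliding_window_view(x, N)[:, ::-1]  # rows are X_k
d = X @ g + 0.1 * rng.standard_normal(len(X))

# Sample estimates of R = E[X_k X_k^T], P = E[d_k X_k], r_dd(0) = E[d_k^2]
R = X.T @ X / len(X)
P = X.T @ d / len(X)
r_dd0 = np.mean(d ** 2)

W_opt = np.linalg.solve(R, P)      # W_opt = R^{-1} P
eps_min2 = r_dd0 - P @ W_opt       # eps_min^2 = r_dd(0) - P^T R^{-1} P

print(W_opt)     # close to g
print(eps_min2)  # close to the noise variance 0.01
```

With enough data, `W_opt` recovers the assumed system and `eps_min2` approaches the irreducible noise power, as Equation 2 predicts.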
To analyze the average error in LMS, rewrite Equation 1 in terms of $V'_k = Q\left(W_k - W_{opt}\right)$, where $Q^{-1}\Lambda Q = R$ is the eigendecomposition of the autocorrelation matrix $R$.
$$E\left[\epsilon_k^2\right] = r_{dd}(0) - 2 W_k^T P + W_k^T R W_k - W_k^T R W_{opt} - W_{opt}^T R W_k + W_{opt}^T R W_{opt} + W_k^T R W_{opt} + W_{opt}^T R W_k - W_{opt}^T R W_{opt}$$
$$= r_{dd}(0) + V_k^T R V_k - P^T R^{-1} P = \epsilon_{min}^2 + V_k^T R V_k = \epsilon_{min}^2 + V_k^T Q^{-1} Q R Q^{-1} Q V_k = \epsilon_{min}^2 + {V'}_k^T \Lambda V'_k \qquad (3)$$
where $V_k = W_k - W_{opt}$.
$$E\left[\epsilon_k^2\right] = \epsilon_{min}^2 + \sum_{j=0}^{N-1} \lambda_j E\left[{v'}_{j,k}^2\right]$$
So we need to know $E\left[{v'}_{j,k}^2\right]$, which are the diagonal elements of the covariance
matrix of $V'_k$, namely $E\left[V'_k {V'}_k^T\right]$.
From the LMS update equation
$$W_{k+1} = W_k + 2\mu\,\epsilon_k X_k$$
we get
$$V'_{k+1} = V'_k + 2\mu\,\epsilon_k Q X_k$$
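The coefficient update above can be sketched directly in code. The example system `g`, signal length, and step size are illustrative assumptions, not from the text:

```python
import numpy as np

def lms(x, d, N, mu):
    """Minimal LMS sketch: W_{k+1} = W_k + 2*mu*eps_k*X_k."""
    W = np.zeros(N)
    X = np.zeros(N)                           # X_k = [x_k, ..., x_{k-N+1}]
    eps = np.empty(len(x))
    for k in range(len(x)):
        X = np.concatenate(([x[k]], X[:-1]))  # shift newest sample in
        eps[k] = d[k] - W @ X                 # a-priori error eps_k
        W = W + 2 * mu * eps[k] * X           # LMS coefficient update
    return W, eps

# Illustrative use: identify a hypothetical noise-free 3-tap system.
rng = np.random.default_rng(1)
g = np.array([1.0, -0.5, 0.25])
x = rng.standard_normal(20_000)
d = np.convolve(x, g)[:len(x)]                # d_k = g^T X_k
W, eps = lms(x, d, N=3, mu=0.01)
print(W)   # converges toward g
```

In this noise-free case $\epsilon_{min}^2 = 0$, so the coefficients converge essentially exactly; with noisy data they fluctuate about $W_{opt}$ with the variance analyzed below.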
$$\mathcal{V}_{k+1} \equiv E\left[V'_{k+1} {V'}_{k+1}^T\right] = \mathcal{V}_k + 2\mu\,E\left[\epsilon_k Q X_k {V'}_k^T\right] + 2\mu\,E\left[\epsilon_k V'_k X_k^T Q^T\right] + 4\mu^2\,E\left[\epsilon_k^2\, Q X_k X_k^T Q^T\right] \qquad (4)$$
Note that
$$\epsilon_k = d_k - W_k^T X_k = d_k - \left(W_{opt}^T + {V'}_k^T Q\right) X_k$$
so
$$E\left[\epsilon_k Q X_k {V'}_k^T\right] = E\left[d_k Q X_k {V'}_k^T\right] - E\left[\left(W_{opt}^T X_k\right) Q X_k {V'}_k^T\right] - E\left[\left({V'}_k^T Q X_k\right) Q X_k {V'}_k^T\right]$$
$$= 0 + 0 - Q\,E\left[X_k X_k^T\right] Q^T\, E\left[V'_k {V'}_k^T\right] = -\left(\Lambda \mathcal{V}_k\right) \qquad (5)$$
Note that the Patently False independence Assumption was invoked here.
To analyze $E\left[\epsilon_k^2\, Q X_k X_k^T Q^T\right]$, we make yet another obviously false assumption: that
$\epsilon_k^2$ and $X_k$ are statistically independent. This is obviously false, since
$\epsilon_k = d_k - W_k^T X_k$. Without this assumption, we get fourth-order terms in
$X_k$ in the product. These can be
dealt with, at the expense of a more complicated analysis, if a
particular type of distribution (such as Gaussian) is
assumed; see, for example, Gardner.
A questionable justification for this assumption is that as
$W_k \simeq W_{opt}$, $W_k$ becomes uncorrelated with $X_k$
(if we invoke the original independence assumption),
which tends to randomize the error signal relative to $X_k$. With this assumption,
$$E\left[\epsilon_k^2\, Q X_k X_k^T Q^T\right] = E\left[\epsilon_k^2\right] E\left[Q X_k X_k^T Q^T\right] = E\left[\epsilon_k^2\right] \Lambda$$
Now
$$E\left[\epsilon_k^2\right] = \epsilon_{min}^2 + E\left[{V'}_k^T \Lambda V'_k\right] = \epsilon_{min}^2 + \sum_j \lambda_j E\left[{v'}_{j,k}^2\right] = \epsilon_{min}^2 + \sum_j \lambda_j \mathcal{V}_{jj,k} \qquad (6)$$
Thus,
Equation 4 becomes
$$\mathcal{V}_{k+1} = \left(I - 4\mu\Lambda\right)\mathcal{V}_k + 4\mu^2\left(\sum_j \lambda_j \mathcal{V}_{jj,k}\right)\Lambda + 4\mu^2\,\epsilon_{min}^2\,\Lambda \qquad (7)$$
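Equation 7 can be iterated numerically to check both its convergence and the diagonal steady state derived next. The eigenvalues, $\mu$, and $\epsilon_{min}^2$ below are arbitrary assumed values for illustration:

```python
import numpy as np

# Assumed parameters (illustrative only)
lam = np.array([1.0, 0.5, 0.25])   # eigenvalues of R (diagonal of Lambda)
eps_min2 = 0.01
mu = 0.1                           # well inside mu < 1/sum(lam) ~= 0.571
L = np.diag(lam)
I = np.eye(len(lam))

V = np.full((3, 3), 0.5)           # arbitrary initial covariance
for _ in range(10_000):            # iterate Equation 7
    V = (I - 4 * mu * L) @ V \
        + 4 * mu**2 * (lam @ np.diag(V)) * L \
        + 4 * mu**2 * eps_min2 * L

# Predicted steady state: mu * eps_min^2 / (1 - mu * sum(lam)) on the diagonal
v_pred = mu * eps_min2 / (1 - mu * lam.sum())
print(np.round(V, 8), v_pred)
```

The iterate settles to a diagonal matrix with equal diagonal entries, matching the fixed-point analysis that follows.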
Now,
if this system is stable and converges,
it converges to $\mathcal{V}_\infty = \mathcal{V}_{\infty+1}$:
$$\Rightarrow\; 4\mu\Lambda\mathcal{V}_\infty = 4\mu^2\left(\sum_j \lambda_j \mathcal{V}_{jj,\infty} + \epsilon_{min}^2\right)\Lambda$$
$$\Rightarrow\; \mathcal{V}_\infty = \mu\left(\sum_j \lambda_j \mathcal{V}_{jj,\infty} + \epsilon_{min}^2\right) I$$
So $\mathcal{V}_\infty$ is a diagonal matrix with all elements on the diagonal equal. Then
$$\mathcal{V}_{ii,\infty} = \mu\left(\mathcal{V}_{ii,\infty}\sum_j \lambda_j + \epsilon_{min}^2\right)$$
$$\mathcal{V}_{ii,\infty}\left(1 - \mu\sum_j \lambda_j\right) = \mu\,\epsilon_{min}^2$$
$$\mathcal{V}_{ii,\infty} = \frac{\mu\,\epsilon_{min}^2}{1 - \mu\sum_j \lambda_j}$$
Thus the error of the LMS adaptive filter after convergence is
$$E\left[\epsilon_\infty^2\right] = \epsilon_{min}^2 + E\left[{V'}_\infty^T \Lambda V'_\infty\right] = \epsilon_{min}^2 + \frac{\mu\,\epsilon_{min}^2 \sum_j \lambda_j}{1 - \mu\sum_j \lambda_j} = \epsilon_{min}^2\,\frac{1}{1 - \mu\sum_j \lambda_j} = \epsilon_{min}^2\,\frac{1}{1 - \mu\,\mathrm{tr}(R)} = \epsilon_{min}^2\,\frac{1}{1 - \mu N r_{xx}(0)} \qquad (8)$$
$$E\left[\epsilon_\infty^2\right] = \epsilon_{min}^2\,\frac{1}{1 - \mu N \sigma_x^2} \qquad (9)$$
The factor $\frac{1}{1 - \mu N \sigma_x^2}$ is called the
misadjustment
factor. Often, one chooses
$\mu$ to obtain a desired
misadjustment factor, such as an error 10% higher than the
Wiener filter error.
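This design rule can be sketched directly: solving $\frac{1}{1 - \mu N \sigma_x^2} = M$ for $\mu$ gives the step size for a target misadjustment $M$. The filter length and input power below are hypothetical:

```python
def step_size_for_misadjustment(M, N, sigma_x2):
    # Invert the misadjustment factor 1/(1 - mu*N*sigma_x^2) = M for mu
    return (1.0 - 1.0 / M) / (N * sigma_x2)

# Example: 10% excess error (M = 1.1) for a 16-tap filter with unit input power
mu = step_size_for_misadjustment(M=1.10, N=16, sigma_x2=1.0)
print(mu)   # ~= 0.00568
```

Smaller target misadjustment means smaller $\mu$, and hence slower convergence: the usual accuracy-versus-speed tradeoff.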
To determine the range of $\mu$
for which Equation 7 converges, we must determine
the $\mu$ for which the matrix
difference equation
$$\mathcal{V}_{k+1} = \left(I - 4\mu\Lambda\right)\mathcal{V}_k + 4\mu^2\left(\sum_j \lambda_j \mathcal{V}_{jj,k}\right)\Lambda + 4\mu^2\,\epsilon_{min}^2\,\Lambda$$
converges.
The off-diagonal elements each evolve independently according to
$$\mathcal{V}_{ij,k+1} = \left(1 - 4\mu\lambda_i\right)\mathcal{V}_{ij,k}$$
These terms will decay to zero if $\forall i\colon\ 4\mu\lambda_i < 2$, or
$$\mu < \frac{1}{2\lambda_{max}}$$
The diagonal terms evolve according to
$$\mathcal{V}_{ii,k+1} = \left(1 - 4\mu\lambda_i\right)\mathcal{V}_{ii,k} + 4\mu^2\lambda_i\sum_j \lambda_j \mathcal{V}_{jj,k} + 4\mu^2\,\epsilon_{min}^2\,\lambda_i$$
For the homogeneous equation
$$\mathcal{V}_{ii,k+1} = \left(1 - 4\mu\lambda_i\right)\mathcal{V}_{ii,k} + 4\mu^2\lambda_i\sum_j \lambda_j \mathcal{V}_{jj,k}$$
with $1 - 4\mu\lambda_i$ positive,
$$\mathcal{V}_{ii,k+1} \le \left(1 - 4\mu\lambda_i\right)\mathcal{V}_{max,k} + 4\mu^2\lambda_i\sum_j \lambda_j\,\mathcal{V}_{max,k} = \left(1 - 4\mu\lambda_i + 4\mu^2\lambda_i\sum_j \lambda_j\right)\mathcal{V}_{max,k} \qquad (10)$$
where $\mathcal{V}_{max,k} = \max_j \mathcal{V}_{jj,k}$ is the largest diagonal element of $\mathcal{V}_k$.
$\mathcal{V}_{ii,k+1}$ will be strictly less than the largest diagonal element $\mathcal{V}_{max,k}$ for
$$1 - 4\mu\lambda_i + 4\mu^2\lambda_i\sum_j \lambda_j < 1$$
or
$$4\mu^2\lambda_i\sum_j \lambda_j < 4\mu\lambda_i$$
or
$$\mu < \frac{1}{\sum_j \lambda_j} = \frac{1}{\mathrm{tr}(R)} = \frac{1}{N r_{xx}(0)} = \frac{1}{N \sigma_x^2} \qquad (11)$$
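For concreteness, the step-size conditions can be compared for an assumed white input (all eigenvalues equal), where Equation 11 is tighter than the off-diagonal condition $\mu < \frac{1}{2\lambda_{max}}$ because $\sum_j \lambda_j = N\lambda_{max}$:

```python
import numpy as np

lam = np.full(8, 1.0)                    # assumed: white input, sigma_x^2 = 1, N = 8
mu_offdiag  = 1.0 / (2.0 * lam.max())    # off-diagonal decay: mu < 1/(2*lam_max)
mu_variance = 1.0 / lam.sum()            # Equation 11: mu < 1/(N*sigma_x^2)
mu_design   = mu_variance / 3.0          # "a few times smaller" rule of thumb
print(mu_offdiag, mu_variance, mu_design)   # 0.5 0.125 0.041666...
```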
This is a more rigorous bound than the first-order
bounds. Often, engineers choose
$\mu$ a few times smaller than
this, since more rigorous analyses yield a slightly smaller
bound; for example,
$$\mu = \frac{1}{3 N \sigma_x^2}$$
is derived in some analyses assuming Gaussian
$x_k$, $d_k$.

W. A. Gardner, "Learning Characteristics of Stochastic-Gradient-Descent Algorithms: A General Study, Analysis, and Critique," Signal Processing, vol. 6, pp. 113-133, 1984.