Stochastic $L_2$ optimal (least squares) FIR filter design problem: Given a wide-sense stationary (WSS) input signal $x_k$ and desired signal $d_k$ (WSS $\Leftrightarrow$ $E[y_k] = E[y_{k+d}]$ and $r_{yz}(l) = E[y_k z_{k+l}]$ for all $k$, $l$, with $r_{yy}(0) < \infty$), the Wiener filter is the linear, time-invariant filter minimizing $E[\varepsilon^2]$, the variance of the error $\varepsilon_k = d_k - y_k$ between the desired signal and the filter output $y_k$.
As posed, this problem seems slightly silly, since $d_k$ is already available! However, this idea is useful in a wide variety of applications.
Active suspension system design: the optimal system may change with different road conditions or mass in the car, so an adaptive system might be desirable.
System identification (radar, nondestructive testing, adaptive control systems).
Usually one desires that the input signal $x_k$ be "persistently exciting," which, among other things, implies nonzero energy in all frequency bands. Why is this desirable?
For convenience, we will analyze only the causal, real-data case; extensions are straightforward.
$$y_k = \sum_{l=0}^{M-1} w_l x_{k-l}$$
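The filter equation above can be sketched in a few lines of NumPy. This is only an illustration: the function name `fir_output`, the tap values, the input samples, and the choice of zero initial conditions ($x_k = 0$ for $k < 0$) are all assumptions made for the example, not part of the original text.

```python
import numpy as np

def fir_output(w, x):
    """Return y_k = sum_{l=0}^{M-1} w_l x_{k-l}, assuming x_k = 0 for k < 0."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    # The full convolution has len(x) + len(w) - 1 samples; keeping the first
    # len(x) aligns y[k] with x[k] for a causal filter with zero initial state.
    return np.convolve(x, w)[: len(x)]

w = [0.5, 0.25]            # hypothetical tap weights, M = 2
x = [1.0, 2.0, 3.0, 4.0]   # hypothetical input samples
y = fir_output(w, x)
print(y)
```

For instance, the second output sample is $y_1 = w_0 x_1 + w_1 x_0 = 0.5 \cdot 2 + 0.25 \cdot 1 = 1.25$.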
We seek $\operatorname{argmin}_{w_l} E[\varepsilon^2]$, where
$$E[\varepsilon^2] = E\left[(d_k - y_k)^2\right] = E\left[\left(d_k - \sum_{l=0}^{M-1} w_l x_{k-l}\right)^2\right] = E[d_k^2] - 2\sum_{l=0}^{M-1} w_l E[d_k x_{k-l}] + \sum_{l=0}^{M-1}\sum_{m=0}^{M-1} w_l w_m E[x_{k-l} x_{k-m}]$$
$$E[\varepsilon^2] = r_{dd}(0) - 2\sum_{l=0}^{M-1} w_l r_{dx}(l) + \sum_{l=0}^{M-1}\sum_{m=0}^{M-1} w_l w_m r_{xx}(l-m)$$
where
$$r_{dd}(0) = E[d_k^2]$$
$$r_{dx}(l) = E[d_k x_{k-l}]$$
$$r_{xx}(l-m) = E[x_k x_{k+l-m}]$$
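In practice these expectations are usually unknown and have to be estimated from data. The sketch below replaces each expectation by a time average over a finite record, which is reasonable only under the additional assumption that the signals are ergodic. The function name `sample_correlations` and the test signals (unit-variance white noise for $x_k$ and a one-sample delay of it for $d_k$) are hypothetical choices for illustration.

```python
import numpy as np

def sample_correlations(x, d, M):
    """Estimate r_xx(l) and r_dx(l) for l = 0..M-1 by time averages."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    N = len(x)
    r_xx = np.empty(M)
    r_dx = np.empty(M)
    for l in range(M):
        # x[l:] holds x_k for k = l..N-1 and x[:N-l] holds the matching x_{k-l}.
        r_xx[l] = np.mean(x[l:] * x[: N - l])
        r_dx[l] = np.mean(d[l:] * x[: N - l])
    return r_xx, r_dx

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)          # hypothetical white input, variance 1
d = np.concatenate(([0.0], x[:-1]))       # hypothetical desired signal d_k = x_{k-1}
r_xx, r_dx = sample_correlations(x, d, M=3)
print(r_xx, r_dx)
```

With this choice of signals the estimates should come out close to $r_{xx} \approx (1, 0, 0)$ and $r_{dx} \approx (0, 1, 0)$, up to sampling error.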
This can be written in matrix form as
$$E[\varepsilon^2] = r_{dd}(0) - 2P^T W + W^T R W$$
where
$$P = \begin{pmatrix} r_{dx}(0) \\ r_{dx}(1) \\ \vdots \\ r_{dx}(M-1) \end{pmatrix}$$
$$R = \begin{pmatrix}
r_{xx}(0) & r_{xx}(1) & \cdots & \cdots & r_{xx}(M-1) \\
r_{xx}(1) & r_{xx}(0) & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & r_{xx}(0) & r_{xx}(1) \\
r_{xx}(M-1) & \cdots & \cdots & r_{xx}(1) & r_{xx}(0)
\end{pmatrix}$$
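Because entry $(l, m)$ of $R$ depends only on $l - m$, the matrix is symmetric Toeplitz and is fully determined by the $M$ values $r_{xx}(0), \ldots, r_{xx}(M-1)$. A minimal sketch of assembling $R$ and $P$ from such values follows; the helper name `build_R_and_P` and the numerical correlation values are made up for the example.

```python
import numpy as np

def build_R_and_P(r_xx, r_dx):
    """Assemble the M x M Toeplitz matrix R and the length-M vector P."""
    r_xx = np.asarray(r_xx, dtype=float)
    P = np.asarray(r_dx, dtype=float)
    M = len(r_xx)
    # Entry (l, m) of R is r_xx(l - m); for real data r_xx is an even
    # function of the lag, so indexing by |l - m| is sufficient.
    lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
    R = r_xx[lags]
    return R, P

# Hypothetical correlation values, chosen only to show the layout.
R, P = build_R_and_P([2.0, 1.0, 0.5], [1.0, 0.5, 0.25])
print(R)
```

The first row of the result is $(2, 1, 0.5)$, and each later row is the previous one shifted right by one position with the symmetric entry filled in.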
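To solve for the optimum filter, compute the gradient with respect to the tap weights vector $W = (w_0, w_1, \ldots, w_{M-1})^T$:
$$\nabla \doteq \begin{pmatrix} \dfrac{\partial E[\varepsilon^2]}{\partial w_0} \\ \dfrac{\partial E[\varepsilon^2]}{\partial w_1} \\ \vdots \\ \dfrac{\partial E[\varepsilon^2]}{\partial w_{M-1}} \end{pmatrix}$$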
$$\nabla = -2P + 2RW$$
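The gradient expression above can be sanity-checked numerically by comparing it with central finite differences of the quadratic cost. The values of $R$, $P$, $r_{dd}(0)$, and $W$ below are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import numpy as np

def mse(W, r_dd0, P, R):
    """Quadratic cost r_dd(0) - 2 P^T W + W^T R W."""
    return r_dd0 - 2.0 * P @ W + W @ R @ W

def mse_gradient(W, P, R):
    """Closed-form gradient -2P + 2RW (valid for symmetric R)."""
    return -2.0 * P + 2.0 * R @ W

# Hypothetical values, chosen only for the check.
R = np.array([[2.0, 1.0], [1.0, 2.0]])
P = np.array([1.0, 0.5])
r_dd0 = 1.0
W = np.array([0.3, -0.2])

h = 1e-6
# Central difference along each coordinate direction e.
numeric = np.array([
    (mse(W + h * e, r_dd0, P, R) - mse(W - h * e, r_dd0, P, R)) / (2.0 * h)
    for e in np.eye(len(W))
])
analytic = mse_gradient(W, P, R)
print(analytic, numeric)
```

Since the cost is quadratic in $W$, the central difference agrees with the closed form up to rounding error.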
(recall $\frac{d}{dW}\left(A^T W\right) = A$ and $\frac{d}{dW}\left(W^T M W\right) = 2MW$ for symmetric $M$). Setting the gradient equal to zero gives
$$R W_{opt} = P \Rightarrow W_{opt} = R^{-1} P$$
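As a rough illustration of the system identification application mentioned earlier, the sketch below estimates $R$ and $P$ from data and solves $R W_{opt} = P$. Everything specific here is an assumption for the example: the "unknown" system `h_true`, the white Gaussian input, the small additive measurement noise, and the use of time averages in place of expectations. The linear system is solved with `np.linalg.solve` rather than by forming $R^{-1}$ explicitly, which is generally the numerically preferable route.

```python
import numpy as np

rng = np.random.default_rng(0)
h_true = np.array([1.0, -0.5, 0.25])   # hypothetical unknown FIR system
M = len(h_true)
N = 50_000

x = rng.standard_normal(N)                                        # white input
d = np.convolve(x, h_true)[:N] + 0.01 * rng.standard_normal(N)    # noisy system output

# Row k of X holds [x_k, x_{k-1}, ..., x_{k-M+1}], with zeros before k = 0.
X = np.column_stack(
    [np.concatenate((np.zeros(l), x[: N - l])) for l in range(M)]
)

R = X.T @ X / N      # time-average estimate of the autocorrelation matrix
P = X.T @ d / N      # time-average estimate of the cross-correlation vector
W_opt = np.linalg.solve(R, P)
print(W_opt)
```

With a white input and little measurement noise, the recovered weights should lie close to `h_true`.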
Since $R$ is a correlation matrix, it must be nonnegative definite, so this is a minimizer. For $R$ positive definite, the minimizer is unique.
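The earlier question about persistent excitation ties into this uniqueness condition. As one hedged illustration, consider a single real sinusoid of unit amplitude with uniformly random phase, for which $r_{xx}(l) = \frac{1}{2}\cos(\omega l)$, and compare it with unit-variance white noise, for which $R$ is the identity. The frequency $\omega$ and filter length $M$ below are arbitrary choices.

```python
import numpy as np

M = 4
# lags[l, m] = |l - m|, the lag index for entry (l, m) of R.
lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))

omega = 0.3 * np.pi                      # hypothetical sinusoid frequency
R_sine = 0.5 * np.cos(omega * lags)      # single sinusoid with random phase
R_white = np.eye(M)                      # unit-variance white noise

# An explicit tolerance keeps rounding-level singular values from counting.
rank_sine = np.linalg.matrix_rank(R_sine, tol=1e-10)
rank_white = np.linalg.matrix_rank(R_white, tol=1e-10)
print(rank_sine, rank_white)
```

The sinusoid excites only one frequency, so its $R$ has rank 2 regardless of $M$. For $M > 2$ the matrix is singular, and $R W_{opt} = P$ no longer determines the weights uniquely. The white-noise input, with energy at all frequencies, gives a full-rank $R$.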