There are problems where the peak error or Chebyshev error is important.
This can be minimized directly using the Remez exchange algorithm but, in
many cases, is better controlled by use of a peak error constraint on the
basic least squared error formulation of the problem
[2], [6], [66], [5]. An efficient algorithm for minimizing the
constrained least squared error uses Lagrange multipliers
[41], [40] and the Kuhn-Tucker conditions [66], [65].
Similar to the Chebyshev design problem, there are two formulations of the
problem: one where there is a well defined transition band separating the
desired signal spectrum (passband) from the noise or interfering signal
spectrum (stopband) and the second where there is a well defined frequency
that separates the pass and stopband but no well defined transition band.
The first case would include situations with signals residing in specified
bands separated by “guard bands" such as commercial radio and TV
transmissions. It also includes cases where due to multirate sampling,
certain well defined bands are aliased into other well defined bands. The
Parks-McClellan and Shpak-Antoniou Chebyshev designs address this case for
the Chebyshev error.
Adams' method [2], [1], [3], [6], [61], [5] described below
applies to the constrained least squares design with a specified transition band.
The second case would include signals with known spectral support with
additive white or broad-band noise. In these cases there is no obvious
transition band or “don't care" band. The Hoffstetter-Oppenheim-Siegel
and the method of (Reference) address this case for a Chebyshev design.
The method in section below applies to the constrained least squares
design [66] without a specified transition band.
To pose the constrained least squared error optimization problem, we use
a Lagrange multiplier formulation. First define the Lagrangian as
L
=
P
∫
0
π
(
A
(
ω
)
-
A
d
(
ω
)
)
2
d
ω
+
∑
i
μ
i
A
(
ω
i
)
-
[
A
d
(
ω
i
)
±
T
(
ω
i
)
]
L
=
P
∫
0
π
(
A
(
ω
)
-
A
d
(
ω
)
)
2
d
ω
+
∑
i
μ
i
A
(
ω
i
)
-
[
A
d
(
ω
i
)
±
T
(
ω
i
)
]
(1)
where the μiμi are the necessary number of Langrange multipliers and
PP is a scale factor that can be chosen for simplicity later. The
first term in Equation 1 is the integral squared error of the frequency
response to be minimized and the second term will be zero when the equality
constraints are satisfied at the frequencies, ωiωi.
The function T(ω)T(ω) is the constraint function in that A(ω)A(ω)
must satisfy
A
d
(
ω
)
+
T
(
ω
)
≥
A
(
ω
)
≥
A
d
(
ω
)
-
T
(
ω
)
.
A
d
(
ω
)
+
T
(
ω
)
≥
A
(
ω
)
≥
A
d
(
ω
)
-
T
(
ω
)
.
(2)
Necessary
conditions for the minimization of the integral squared error are that the
derivative of the Lagrangian with respect to the filter parameters a(n)a(n)
defined in (Reference)
and to the Lagrange multipliers μiμi be zero [69].
The derivatives of the Lagrangian with respect to a(n)a(n) are
d
L
d
a
(
n
)
=
P
∫
0
π
2
(
A
(
ω
)
-
A
d
(
ω
)
)
d
A
d
a
d
ω
+
∑
i
μ
i
d
A
d
a
ω
i
d
L
d
a
(
n
)
=
P
∫
0
π
2
(
A
(
ω
)
-
A
d
(
ω
)
)
d
A
d
a
d
ω
+
∑
i
μ
i
d
A
d
a
ω
i
(3)
where from (Reference) we have for n=1,2,⋯,Mn=1,2,⋯,M
d
A
(
ω
)
d
a
(
n
)
=
cos
(
ω
n
)
d
A
(
ω
)
d
a
(
n
)
=
cos
(
ω
n
)
(4)
and for n=0n=0
d
A
(
ω
)
d
a
(
0
)
=
K
.
d
A
(
ω
)
d
a
(
0
)
=
K
.
(5)
For n=1,2,⋯,Mn=1,2,⋯,M this gives
d
L
d
a
(
n
)
=
2
P
∫
A
(
ω
)
cos
(
ω
n
)
d
ω
-
∫
A
d
(
ω
)
cos
(
ω
n
)
d
ω
+
∑
i
μ
i
cos
(
ω
i
n
)
d
L
d
a
(
n
)
=
2
P
∫
A
(
ω
)
cos
(
ω
n
)
d
ω
-
∫
A
d
(
ω
)
cos
(
ω
n
)
d
ω
+
∑
i
μ
i
cos
(
ω
i
n
)
(6)
and for n=0n=0 gives
d
L
d
a
(
0
)
=
2
P
K
∫
A
(
ω
)
d
ω
-
∫
A
d
(
ω
)
d
ω
+
∑
i
μ
i
K
.
d
L
d
a
(
0
)
=
2
P
K
∫
A
(
ω
)
d
ω
-
∫
A
d
(
ω
)
d
ω
+
∑
i
μ
i
K
.
(7)
Using (Reference) for n=1,2,⋯,Mn=1,2,⋯,M, we have
d
L
d
a
(
n
)
=
π
P
a
(
n
)
-
a
d
(
n
)
+
∑
i
μ
i
cos
(
ω
i
n
)
=
0
d
L
d
a
(
n
)
=
π
P
a
(
n
)
-
a
d
(
n
)
+
∑
i
μ
i
cos
(
ω
i
n
)
=
0
(8)
and for n=0n=0
d
L
d
a
(
0
)
=
2
π
P
K
2
a
(
0
)
-
a
d
(
0
)
+
K
∑
i
μ
i
=
0
.
d
L
d
a
(
0
)
=
2
π
P
K
2
a
(
0
)
-
a
d
(
0
)
+
K
∑
i
μ
i
=
0
.
(9)
Choosing P=1/πP=1/π gives
a
(
n
)
=
a
d
(
n
)
-
∑
i
μ
i
cos
(
ω
i
n
)
a
(
n
)
=
a
d
(
n
)
-
∑
i
μ
i
cos
(
ω
i
n
)
(10)
and
a
(
0
)
=
a
d
(
0
)
-
1
2
K
∑
i
μ
i
a
(
0
)
=
a
d
(
0
)
-
1
2
K
∑
i
μ
i
(11)
Writing Equation 10 and Equation 11 in matrix form gives
a
=
a
d
-
H
μ
.
a
=
a
d
-
H
μ
.
(12)
where HH is a matrix with elements
h
(
n
,
i
)
=
cos
(
ω
i
n
)
h
(
n
,
i
)
=
cos
(
ω
i
n
)
(13)
except for the first row which is
h
(
0
,
i
)
=
1
2
K
h
(
0
,
i
)
=
1
2
K
(14)
because of the normalization of the a(0)a(0) term. The ad(n)ad(n) are the
cosine coefficients for the unconstrained approximation to the ideal
filter which result from truncating the inverse DTFT of Ad(ω)Ad(ω).
The derivative of the Lagrangian in Equation 1 with respect to the Lagrange
multipliers μiμi, when set to zero, gives
A
(
ω
i
)
=
A
d
(
ω
i
)
±
T
(
ω
i
)
=
A
c
(
ω
i
)
A
(
ω
i
)
=
A
d
(
ω
i
)
±
T
(
ω
i
)
=
A
c
(
ω
i
)
(15)
which is simply a statement of the equality constraints.
In terms of the filter's cosine coefficients a(n)a(n), from (Reference), this
can be written
A
c
(
ω
i
)
=
∑
n
a
(
n
)
cos
(
ω
i
n
)
+
K
a
(
0
)
A
c
(
ω
i
)
=
∑
n
a
(
n
)
cos
(
ω
i
n
)
+
K
a
(
0
)
(16)
and as matrices
A
c
=
G
a
A
c
=
G
a
(17)
where AcAc is the vector of frequency response values which are the
desired response plus or minus the constraints evaluated at the
frequencies in the constraint set. The frequency response must
interpolate these values. The matrix GG is
g
(
i
,
n
)
=
cos
(
ω
i
n
)
g
(
i
,
n
)
=
cos
(
ω
i
n
)
(18)
except for the first column which is
g
(
i
,
0
)
=
K
.
g
(
i
,
0
)
=
K
.
(19)
Notice that if K=1/2K=1/2, the first rows and columns are such that we
have GT=HGT=H.
The two equations Equation 12 and Equation 17 that must be satisfied can be
written as a single matrix equation of the form
I
H
G
0
a
μ
=
a
d
A
c
I
H
G
0
a
μ
=
a
d
A
c
(20)
or, if K=1/2K=1/2, as
I
G
T
G
0
a
μ
=
a
d
A
c
I
G
T
G
0
a
μ
=
a
d
A
c
(21)
which have as solutions
μ
=
(
G
H
)
-
1
(
G
a
d
-
A
c
)
a
=
a
d
-
H
μ
μ
=
(
G
H
)
-
1
(
G
a
d
-
A
c
)
a
=
a
d
-
H
μ
(22)
The filter corresponding to the cosine coefficients a(n)a(n) minimize the
L2L2 error norm subject the equality conditions in Equation 17.
Notice that the term in Equation 22 of the form GadGad is the
frequency response of the optimal unconstrained filter evaluated at the
constraint set frequencies. Equation Equation 22 could, therefore, be
written
μ
=
(
G
H
)
-
1
(
A
u
-
A
c
)
μ
=
(
G
H
)
-
1
(
A
u
-
A
c
)
(23)
Combining the weighted least squared error formulation with the
constrained least squared error gives the general formulation of this
class of problems.
We now modify the Lagrangian in Equation 1 to allow a weighted squared error
giving
L
=
1
π
∫
0
π
W
(
ω
)
(
A
(
ω
)
-
A
d
(
ω
)
)
2
d
ω
+
∑
i
μ
i
A
(
ω
i
)
-
A
d
(
ω
i
)
±
T
(
ω
i
)
L
=
1
π
∫
0
π
W
(
ω
)
(
A
(
ω
)
-
A
d
(
ω
)
)
2
d
ω
+
∑
i
μ
i
A
(
ω
i
)
-
A
d
(
ω
i
)
±
T
(
ω
i
)
(24)
with a corresponding derivative of
d
L
d
a
(
n
)
=
2
π
∫
W
(
ω
)
(
A
(
ω
)
-
A
d
(
ω
)
d
A
d
a
d
ω
+
∑
i
μ
i
d
A
d
a
ω
i
d
L
d
a
(
n
)
=
2
π
∫
W
(
ω
)
(
A
(
ω
)
-
A
d
(
ω
)
d
A
d
a
d
ω
+
∑
i
μ
i
d
A
d
a
ω
i
(25)
The integral cannot be carried out analytically for a general weighting
function, but if the weight function is constant over each subband,
(Reference) can be written
d
L
d
a
(
n
)
=
2
π
∑
k
∫
ω
k
ω
k
+
1
W
k
(
∑
m
=
1
M
a
(
m
)
cos
(
ω
m
)
+
K
a
(
0
)
-
A
d
(
ω
)
)
cos
(
ω
n
)
d
ω
+
∑
i
μ
i
d
A
d
a
ω
i
d
L
d
a
(
n
)
=
2
π
∑
k
∫
ω
k
ω
k
+
1
W
k
(
∑
m
=
1
M
a
(
m
)
cos
(
ω
m
)
+
K
a
(
0
)
-
A
d
(
ω
)
)
cos
(
ω
n
)
d
ω
+
∑
i
μ
i
d
A
d
a
ω
i
(26)
which after rearranging is
=
∑
m
=
1
M
2
π
∑
k
W
k
∫
ω
k
ω
k
+
1
(
cos
(
ω
m
)
cos
(
ω
n
)
)
d
ω
a
(
m
)
=
∑
m
=
1
M
2
π
∑
k
W
k
∫
ω
k
ω
k
+
1
(
cos
(
ω
m
)
cos
(
ω
n
)
)
d
ω
a
(
m
)
(27)
-
2
π
∑
k
W
k
∫
ω
k
ω
k
+
1
A
d
(
ω
)
cos
(
ω
n
)
d
ω
+
∑
i
μ
i
cos
(
ω
i
n
)
=
0
-
2
π
∑
k
W
k
∫
ω
k
ω
k
+
1
A
d
(
ω
)
cos
(
ω
n
)
d
ω
+
∑
i
μ
i
cos
(
ω
i
n
)
=
0
(28)
where the integral in the first term can now be done analytically. In matrix
notation (Reference) is
R
a
-
a
d
w
+
H
μ
=
0
R
a
-
a
d
w
+
H
μ
=
0
(29)
This is a similar form to that in the multiband paper where the matrix
RR gives the effects of weighting with elements
r
(
n
,
m
)
=
2
π
∑
k
W
k
∫
ω
k
ω
k
+
1
(
cos
(
ω
m
)
cos
(
ω
n
)
)
d
ω
r
(
n
,
m
)
=
2
π
∑
k
W
k
∫
ω
k
ω
k
+
1
(
cos
(
ω
m
)
cos
(
ω
n
)
)
d
ω
(30)
except for the first row which should be divided by 2K2K because of the
normalizing of the a(0)a(0) term in (Reference) and Equation 14 and the first
column which should be multiplied by KK because of (Reference) and
Equation 19. The matrix RR is a sum of a Toeplitz matrix and a
Hankel matrix and this fact might be used to advantage and adwadw is the vector of modified filter parameters with elements
a
d
w
(
n
)
=
2
π
∑
W
k
∫
ω
k
ω
k
+
1
A
d
(
ω
)
cos
(
ω
n
)
d
ω
a
d
w
(
n
)
=
2
π
∑
W
k
∫
ω
k
ω
k
+
1
A
d
(
ω
)
cos
(
ω
n
)
d
ω
(31)
and the matrix HH is the same as used in Equation 12 and defined in
Equation 13.
Equations (Reference) and Equation 17 can be written together as a matrix equation
R
H
G
0
a
μ
=
a
d
w
A
c
R
H
G
0
a
μ
=
a
d
w
A
c
(32)
The solutions to (Reference) and Equation 17 or to Equation 32 are
μ
=
(
G
R
-
1
H
)
-
1
(
GR
-
1
a
d
w
-
A
c
)
μ
=
(
G
R
-
1
H
)
-
1
(
GR
-
1
a
d
w
-
A
c
)
(33)
a
=
R
-
1
(
a
d
w
-
H
μ
)
a
=
R
-
1
(
a
d
w
-
H
μ
)
(34)
which are ideally suited to a language like Matlab and are implemented in
the programs at the end of this book.
Since the solution of Rau=adwRau=adw is the optimal
unconstrained weighted least squares filter, we can write Equation 33 and
Equation 34 in the form
μ
=
(
G
R
-
1
H
)
-
1
(
G
a
u
-
A
c
)
=
(
G
R
-
1
H
)
-
1
(
A
u
-
A
c
)
a
=
a
u
-
R
-
1
H
μ
μ
=
(
G
R
-
1
H
)
-
1
(
G
a
u
-
A
c
)
=
(
G
R
-
1
H
)
-
1
(
A
u
-
A
c
)
a
=
a
u
-
R
-
1
H
μ
(35)
a
=
a
u
-
R
-
1
H
μ
a
=
a
u
-
R
-
1
H
μ
(36)
This Lagrange multiplier formulation together with applying the Kuhn-Tucker
conditions are used in an iterative multiple exchange algorithm similar to
the Remez exchange algorithm to give the complete design method.
One version of this exchange algorithm applies to the problem posed by
Adams with specified pass and stopband edges and with zero error weighting
in the transition band. This problem has the structure of a quadratic
programming problem and could be solved using general QP methods but the
multiple exchange algorithm suggested here is probably faster.
The second version of this exchange algorithm applies to the problem where
there is no explicitly specified transition band. This problem is not
strictly a quadratic programming problem and our exchange algorithm has no
proof of convergence (the HOS algorithm also has no proof of convergence).
However, in practice, this program has proven to be robust and
converges for a wide variety of lengths, constraints, weights, and
band edges. The performance is completely independent of the normalizing
parameter KK. Notice that the inversion of the RR matrix is done
once and does not have to be done each iteration. The details of the
program are included in the filter design paper and in the Matlab program
at the end of this book.
At mentioned earlier, this design problem might be addressed by general
constrained quadratic programming methods
[23], [31], [47], [51], [49], [50], [52], [76], [75], [79].
Here we show that the CLS FIR filter design approach is probably the best
general FIR filter design method. For example, a length-31 linear phase lowpass FIR
filter is designed for a band edge of 0.3 and the constraint that the response in the stop cannot be
greater than 0.03 is illustrated in Figure 2.
This filter was designed using the Matlab command: 'fircls1()' function.
We now consider the general LpLp approximation which contains the least
squares L2L2 and the Chebyshev L∞L∞ cases.
This approach is described in [8], [11], [12]
using the iterative reweighted least squared (IRLS) alorithm and looks
attractive in that it can use different pp in different frequency bands.
This would allow, for example, a least squared error approximation in the
passband and a Chebyshev approximation in the stopband. The IRLS method
can also be used for complex Chebyshev approximations [73] and
constrained L2L2 approximatin.
There are cases where it is desirable to design an FIR filter that will
minimize the LpLp error norm. The error is defined by
q
=
∫
Ω
|
A
(
ω
)
-
A
d
(
ω
)
|
p
d
ω
q
=
∫
Ω
|
A
(
ω
)
-
A
d
(
ω
)
|
p
d
ω
(37)
but we usually work with Q2Q2.
For large pp, the results are essentially the same as the Chebyshev
filter and this gives a continuum of design between L2L2 and L∞L∞.
It also allows the very interesting important possibility of allowing
p(ω)p(ω) to be a function of frequency. This means one could have
different error criteria in different frequency bands. It can be modified
to give the same effects as a constraint. This approach is discussed in
[12]. It can be applied to complex approximation and to
two-dimensional filter design [8], [9].
The least squared error and the minimum Chebyshev error criteria are the
two most commonly used linear-phase FIR filter design methods [46].
There are many situations where better total performance would be obtained
with a mixture of these two error measures or some compromise design that
would give a trade-off between the two. We show how to design a filter
with an L2L2 approximation in the passband and a Chebyshev approximation
in the stopband. We also show that by formulating the LpLp problem we
can solve the constrained L2L2 approximation problem [2].
This section first explores the minimization of the pthpth power of the
error as a general approximation method and then shows how this allows
L2L2 and L∞L∞ approximations to be used simultaneous in different
frequency bands of one filter and how the method can be used to impose
constraints on the frequency response. There are no analytical methods to
find this approximation, therefore, an iterative method is used over
samples of the error in the frequency domain. The methods developed here
[8], [11] are based on what is called an iterative reweighted
least squared (IRLS) error algorithm [70], [58], [14] and
they can solve certain FIR filter design problems that neither the Remez
exchange algorithm nor analytical L2L2 methods can.
The idea of using an IRLS algorithm to achieve a Chebyshev or L∞L∞
approximation seems to have been first developed by Lawson [32] and extended to
LpLp by Rice and Usow [59], [57]. The basic IRLS method for
LpLp was given by Karlovitz [30] and extended by Chalmers,
et. al. [16], Bani and Chalmers [13], and Watson
[78]. Independently, Fletcher, Grant and Hebden [24]
developed a similar form of IRLS but based on Newton's method and Kahng
[29] did likewise as an extension of Lawson's algorithm. Others
analyzed and extended this work [22], [44], [14], [78]. Special
analysis has been made for 1≤p<21≤p<2 by
[74], [77], [58], [37], [44], [70], [80] and for p=∞p=∞ by
[24], [13], [58], [42], [4], [38]. Relations to the
Remez exchange algorithm [17], [48] were suggested by
[13], to homotopy [64] by [60], and to
Karmarkar's linear programming algorithm [69] by
[58], [68]. Applications of Lawson's algorithm to complex
Chebyshev approximation in FIR filter design have been made in
[25], [18], [21], [73] and to 2-D filter design in [15].
Reference [72] indicates further results may be forthcoming.
Application to array design can be found in [71] and to
statistics in [14].
This paper unifies and extends the IRLS techniques and applies them to the
design of FIR digital filters. It develops a framework that relates all
of the above referenced work and shows them to be variations of a basic
IRLS method modified so as to control convergence. In particular, we
generalize the work of Rice and Usow on Lawson's algorithm and explain why
its asymptotic convergence is slow.
The main contribution here is a new robust IRLS method
[8], [11] that combines an improved convergence acceleration scheme
and a Newton based method. This gives a very efficient and versatile
filter design algorithm that performs significantly better than the
Rice-Usow-Lawson algorithm or any of the other IRLS schemes. Both the
initial and asymptotic convergence behavior of the new algorithm is
examined and the reason for occasional slow convergence of this and all
other IRLS methods is discovered.
We then show that the new IRLS method allows the use of pp as a function
of frequency to achieve different error criteria in the pass and stopbands
of a filter. Therefore, this algorithm can be applied to solve the
constrained LpLp approximation problem. Initial results of applications
to the complex and two-dimensional filter design problem are presented.
Although the traditional IRLS methods were sometimes slower than competing
approaches, the results of this paper and the availability of fast modern
desktop computers make them practical now and allow exploitation of their
greater flexibility and generality.
Various approximation methods can be developed by considering different
definitions of norm or error measure. Commonly used definitions are
L1L1, L2L2, and Chebyshev or L∞L∞. Using the L2L2 norm, gives
the scalar error to minimize
q
=
∑
k
=
0
L
-
1
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
2
q
=
∑
k
=
0
L
-
1
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
2
(38)
or in matrix notation using Equation 38, the error or residual vector is
defined by
q
=
C
a
-
A
d
q
=
C
a
-
A
d
(39)
giving the scalar error of Equation 38 as
q
=
ϵ
T
ϵ
.
q
=
ϵ
T
ϵ
.
(40)
This can be minimized by solution of the normal equations
[36], [26], [69]
C
T
C
a
=
C
T
A
d
.
C
T
C
a
=
C
T
A
d
.
(41)
The weighted squared error defined by
q
=
∑
k
=
0
L
-
1
w
k
2
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
2
.
q
=
∑
k
=
0
L
-
1
w
k
2
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
2
.
(42)
or, in matrix notation using Equation 39 and Equation 40 causes Equation 42
to become
q
=
ϵ
T
W
T
W
ϵ
q
=
ϵ
T
W
T
W
ϵ
(43)
which can be minimized by solving
W
C
a
=
W
A
d
W
C
a
=
W
A
d
(44)
with the normal equations
C
T
W
T
W
C
a
=
C
T
W
T
W
A
d
C
T
W
T
W
C
a
=
C
T
W
T
W
A
d
(45)
where WW is an LL by LL diagonal matrix with the weights wkwk from
Equation 42 along the diagonal. A more general formulation of the
approximation simply requires WTWWTW to be positive definite. Some
authors define the weighted error in Equation 42 using wkwk rather than
wk2wk2. We use the latter to be consistent with the least squared error
algorithms in Matlab [43].
Solving Equation 45 is a direct method of designing an FIR filter using a
weighted least squared error approximation. To minimize the sum of the
squared error and get approximately the same result as minimizing the
integral of the squared error, one must choose LL to be 3 to 10 or more
times the length LL of the filter being designed.
There is no simple direct method for finding the optimal approximation for
any error power other than two. However, if the weighting coefficients
wkwk as elements of WW in Equation 45 could be set equal to the elements
in |A-Ad||A-Ad|, minimizing Equation 42 would minimize the fourth power of
|A-Ad||A-Ad|. This cannot be done in one step because we need the solution
to find the weights! We can, however, pose an iterative algorithm which
will first solve the problem in Equation 41 with no weights, then calculate
the error vector ϵϵ from Equation 39 which will then be used to
calculate the weights in Equation 45. At each stage of the iteration, the
weights are updated from the previous error and the problem solved again.
This process of successive approximations is called the iterative
reweighted least squared error algorithm (IRLS).
The basic IRLS equations can also be derived by simply taking the gradient
of the pp-error with respect to the filter coefficients hh or aa and
setting it equal to zero [24], [29]. These equations form the
basis for the iterative algorithm.
If the algorithm is a contraction mapping [39], the successive
approximations will converge and the limit is the solution of the minimum
L4L4 approximation problem. If a general problem can be posed
[67], [27], [45] as the solution of an equation in the form
x
=
f
(
x
)
,
x
=
f
(
x
)
,
(46)
a successive approximation algorithm can be proposed which iteratively
calculates xx using
x
m
+
1
=
f
(
x
m
)
x
m
+
1
=
f
(
x
m
)
(47)
starting with some x0x0. The function f(·)f(·) maps xmxm into
xm+1xm+1 and, if
limm→∞xm=x0limm→∞xm=x0 where x0=f(x0)x0=f(x0), x0x0 is the fixed point of the mapping and a solution to
Equation 46. The trick is to find a mapping that solves the desired
problem, converges, and converges fast.
By setting the weights in Equation 42 equal to
w
(
k
)
=
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
(
p
-
2
)
/
2
,
w
(
k
)
=
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
(
p
-
2
)
/
2
,
(48)
the fixed point of a convergent algorithm minimizes
q
=
∑
k
=
0
L
-
1
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
p
.
q
=
∑
k
=
0
L
-
1
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
p
.
(49)
It has been shown [59] that weights always exist such that
minimizing Equation 42 also minimizes Equation 49. The problem is to find
those weights efficiently.
The basic IRLS algorithm is started by initializing the weight matrix
defined in Equation 42 and Equation 43 for unit weights with W0=IW0=I.
Using these weights to start, the mthmth iteration solves Equation 45 for
the filter coefficients with
a
m
=
[
C
T
W
m
T
W
m
C
]
-
1
C
T
W
m
T
W
m
A
d
a
m
=
[
C
T
W
m
T
W
m
C
]
-
1
C
T
W
m
T
W
m
A
d
(50)
This is a formal statement of the operation. In practice one should not
invert a matrix, one should use a sophisticated numerical method
[20] to solve the overdetermined equations in Equation 38 The
error or residual vector Equation 39 for the mthmth iteration is found by
ϵ
m
=
C
a
m
-
A
d
ϵ
m
=
C
a
m
-
A
d
(51)
A new weighting vector is created from this error vector using Equation 48 by
w
m
+
1
=
|
ϵ
m
|
(
p
-
2
)
/
2
w
m
+
1
=
|
ϵ
m
|
(
p
-
2
)
/
2
(52)
whose elements are the diagonal elements of the new weight matrix
W
m
+
1
=
d
i
a
g
[
w
m
+
1
]
.
W
m
+
1
=
d
i
a
g
[
w
m
+
1
]
.
(53)
Using this weight matrix, we solve for the next vector of filter
coefficients by going back to Equation 50 and this defines the basic
iterative process of the IRLS algorithm.
It can easily be shown that the aa that minimizes Equation 49 is a fixed
point of this iterative map. Unfortunately, applied directly, this basic
IRLS algorithm does not converge and/or it has numerical problems for most
practical cases [14]. There are three aspects that must be
addressed. First, the IRLS algorithm must theoretically converge. Second,
the solution of Equation 50 must be numerically stable. Finally, even if
the algorithm converges and is numerically stable, it must converge fast
enough to be practical.
Both theory and experience indicate there are different convergence
problems connected with several different ranges and values of pp. In
the range 2≤p<32≤p<3, virtually all methods converge
[24], [14], [44]. In the range 3≤p<∞3≤p<∞, the algorithm
diverges and the various methods discussed in this paper must be used.
As pp becomes large compared to 2, the weights carry a larger
contribution to the total minimization than the underlying least squared
error minimization, the improvement at each iteration becomes smaller, and
the likelihood of divergence becomes larger. For p=∞p=∞ we can use
to advantage the fact that the optimal approximation solution to
Equation 49 is unique but the weights in Equation 42 that give that solution
are not. In other words, different matrices WW give the same solution to
Equation 50 but will have different convergence properties. This allows
certain alteration to the weights to improve convergence without harming
the optimality of the results [38]. In the range 1<p<21<p<2,
both convergence and numerical problems exist as, in contrast to p>2p>2,
the IRLS iterations are undoing what the underlying least squares is
doing. In particular, the weights near frequencies with small errors
become very large. Indeed, if the error happens to be zero, the weight
becomes infinite because of the negative exponent in Equation 52. For p=1p=1 the solution to the optimization problem is not even unique. The
various algorithms that are presented below are based on schemes to
address these problems.
In order to achieve convergence, a second order update is used which only
partially changes the filter coefficients amam in Equation 50 each
iteration. This is done by first calculating the unweighted L2L2
approximation filter coefficients using (Reference) as
a
0
=
[
C
T
C
]
-
1
C
T
A
d
.
a
0
=
[
C
T
C
]
-
1
C
T
A
d
.
(54)
The error or residual vector (Reference) for the mthmth iteration is
found as (Reference) by
ϵ
m
=
C
a
m
-
A
d
ϵ
m
=
C
a
m
-
A
d
(55)
and the new weighting vector is created from this error vector using
Equation 48 by
w
m
+
1
=
|
ϵ
m
|
(
p
-
2
)
/
2
w
m
+
1
=
|
ϵ
m
|
(
p
-
2
)
/
2
(56)
whose elements are the diagonal elements of the new weight matrix
W
m
+
1
=
d
i
a
g
[
w
m
+
1
]
.
W
m
+
1
=
d
i
a
g
[
w
m
+
1
]
.
(57)
This weight matrix is then used to calculate a temporary filter coefficient
vector by
a
^
m
+
1
=
[
C
T
W
m
+
1
T
W
m
+
1
C
]
-
1
C
T
W
m
+
1
T
W
m
+
1
A
d
.
a
^
m
+
1
=
[
C
T
W
m
+
1
T
W
m
+
1
C
]
-
1
C
T
W
m
+
1
T
W
m
+
1
A
d
.
(58)
The vector of filter coefficients that is actually used is only partially
updated using a form of adjustable step size in the following second order
linearly weighted sum
a
m
+
1
=
λ
a
^
m
+
1
+
(
1
-
λ
)
a
m
a
m
+
1
=
λ
a
^
m
+
1
+
(
1
-
λ
)
a
m
(59)
Using this filter coefficient vector, we solve for the next error vector
by going back to Equation 55 and this defines Karlovitz's IRLS algorithm
[30].
In this algorithm, λλ is a convergence parameter that takes values
0<λ≤10<λ≤1. Karlovitz showed that for the proper λλ, the
IRLS algorithm using Equation 58 always converges to the globally optimal
LpLp approximation for pp an even integer in the range 4≤p<∞4≤p<∞. At each iteration the LpLp error has to be minimized over
λλ which requires a line search. In other words, the full
Karlovitz method requires a multi-dimensional weighted least squares
minimization and a one-dimensional pthpth power error minimization at
each iteration. Extensions of Karlovitz's work [78] show the
one-dimensional minimization is not necessary but practice shows the
number of required iterations increases considerably and robustness in
lost.
Fletcher et al. [24] and later Kahng [29] independently
derive the same second order iterative algorithm by applying Newton's
method. That approach gives a formula for λλ as a function of pp
and is discussed later in this paper. Although the iteration count for
convergence of the Karlovitz method is good, indeed, perhaps the best of
all, the minimization of λλ at each iteration causes the algorithm
to be very slow in execution.
Both the new method in section 4.3 and Lawson's method use a second order
updating of the weights to obtain convergence of the basic IRLS algorithm.
Fletcher et al. [24] and Kahng [29] use a linear
summation for the updating similar in form to Equation 59 but
apply it to the filter coefficients in the manner of Karlovitz rather than
the weights as Lawson did. Indeed, using our development of Karlovitz's
method in (Reference), we see that Kahng's method and Fletcher, Grant, and
Hebden's method are simply a particular choice of λλ as a function
of pp in Karlovitz's method. They derive
λ
=
1
p
-
1
λ
=
1
p
-
1
(60)
by using Newton's method to minimize εε in Equation 49 to give
for Equation 59
a
m
=
(
a
^
m
+
(
p
-
2
)
a
m
-
1
)
/
(
p
-
1
)
.
a
m
=
(
a
^
m
+
(
p
-
2
)
a
m
-
1
)
/
(
p
-
1
)
.
(61)
This defines Kahng's method which he says always converges
[45]. He also notes that the summation methods in section
4.2, 4.3 and 4.5 do not have the possible restarting problem that Lawson's
method theoretically does. Because Kahng's algorithm is a form of
Newton's method, its asymptotic convergence is very good but the initial
convergence is poor and very sensitive to starting values.
A modification and generalization of an acceleration method suggested
independently by Ekblom [22] and by Kahng [29] is
developed here and combined with the Newton's method of Fletcher, Grant,
and Hebden and of Kahng to give a robust, fast, and accurate IRLS
algorithm [8], [11]. It overcomes the poor initial performance of
the Newton's methods and the poor final performance of the RUL algorithms.
Rather than starting the iterations of the IRLS algorithms with the actual
desired value of pp, after the initial L2L2 approximation, the new
algorithm starts with p=K*2p=K*2 where KK is a parameter between one and
approximately two, chosen for the particular problem specifications.
After the first iteration, the value of pp is increased to p=K2*2p=K2*2. It
is increased by a factor of KK at each iteration until it reaches the
actual desired value. This keeps the value of pp being approximated just
ahead of the value achieved. This is similar to a homotopy where we vary
the value of pp from 2 to its final value. A small value of KK gives
very reliable convergence because the approximation is achieved at each
iteration but requires a large number of iterations for pp to reach its
final value. A large value of KK gives faster convergence for most
filter specifications but fails for some. The rule that is used to choose
pmpm at the mthmth iteration is
p
m
=
min
(
p
,
K
p
m
-
1
)
.
p
m
=
min
(
p
,
K
p
m
-
1
)
.
(62)
Each iteration of our new variable pp method is implemented by the basic
algorithm described as Karlovitz's method but using the Newton's method
based value of λλ from Fletcher or Kahng in Equation 60. Both
Ekblom and Kahng only used K=2K=2 which is too large in almost all cases.
We also tried the generalized acceleration scheme with the basic Karlovitz
method and the RUL algorithm. Although it improved the initial
performance of the Karlovitz method, the slowness of each iteration still
made this method unattractive. Use with the RUL algorithm gave only a
minor improvement of initial performance and no improvement of the poor
final convergence.
Our new algorithm uses three distinct concepts:
- The basic IRLS which is a straight forward algorithm with linear
convergence [14] when it converges.
- The second order or Newton's modification which increases the number
of cases where initial convergence occurs and gives quadratic asymptotic
convergence [24], [29].
- The controlled increasing of pp from one iteration to the next
is a modification which gives excellent initial convergence and allows
adaptation for “difficult" cases.
The best total algorithm, therefore, combines the increasing of pp given
in Equation 62 the updating the filter coefficients using Equation 59, and
the Newton's choice of λλ in Equation 60. By slowly increasing pp,
the error surface slowly changes from the parabolic shape of L2L2 which
Newton's method is based on, to the more complicated surface of LpLp.
The question is how fast to change and, from experience with many
examples, we have learned that this depends on the filter design
specifications.
A Matlab program that implements this basic IRLS algorithm is given in the
appendix of this paper. It uses an updating of A(ωk)A(ωk) in the
frequency domain rather than of a(n)a(n) in the time domain to allow
modifications necessary for using different pp in different bands as
will be developed later in this paper.
An example design for a length L=31L=31, passband edge fp=0.4fp=0.4, stopband edge
fs=0.44fs=0.44, and p=2p=2 the program does not have to iterate and give the response
in Figure 3.
For the same specifications except p=4p=4 we get
Figure 4
and for p=100p=100 we get Figure 5
Probably the most important use of the LpLp approximation problem posed
here is its use for designing filters with different error criteria in
different frequency bands. This is possible because the IRLS algorithm
allows an error power that is a function of frequency p(ω)p(ω) which can
allow an L2L2 measure in the passband and a Chebyshev error measure in
the stopband or any other form. This is important if an L2L2
approximation is needed in the passband because Parseval's theorem shows
that the time domain properties of the filtered signal will be well
preserved but, because of unknown properties of the noise or interference,
the stopband attenuation must be less than some specified valued.
The new algorithm described in "A New Robust IRLS Method" was modified so that the
iterative updating is done to A(ω)A(ω) rather than to a(n)a(n).
Because the Fourier transform is linear, the updating of
Equation 59 can also be achieved by
A
m
+
1
(
ω
)
=
λ
A
^
m
+
1
(
ω
)
+
(
1
-
λ
)
A
m
(
ω
)
.
A
m
+
1
(
ω
)
=
λ
A
^
m
+
1
(
ω
)
+
(
1
-
λ
)
A
m
(
ω
)
.
(63)
The Matlab program listed in the appendix uses this form. This type of
updating in the frequency domain allows different pp to be used in
different bands of A(ω)A(ω) and different update parameters λλ
to be used in the appropriate bands. In addition, it allows a different
constant KK weighting to be used for the different bands. The error for
this problem is changed from Equation 38 to be
q
=
∑
k
=
0
k
0
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
2
+
K
∑
k
=
k
0
+
1
L
-
1
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
p
q
=
∑
k
=
0
k
0
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
2
+
K
∑
k
=
k
0
+
1
L
-
1
|
A
(
ω
k
)
-
A
d
(
ω
k
)
|
p
(64)
Figure 6 shows the frequency response of a filter designed with a
passband p=2p=2, a stopband p=4p=4, and a stopband weight of K=1K=1.
Figure 7 gives the frequency response for the same specifications but with
p=100p=100 and Figure 8 adds a constant weight to the stopband.
In some design situations, neither a pure L2L2 nor a L∞L∞ or
Chebyshev approximation is appropriate. If one evaluates both the squared
error and the Chebyshev error of a particular filter, it is easily seen
that for an optimal least squares solution, a considerable reduction of
the Chebyshev error can be obtained by allowing a small increase in the
squared error. For the optimal Chebyshev solution the opposite is true.
A considerable reduction of the squared error can be obtained by allowing
a small increase in the Chebyshev error. This suggests a better filter
might be obtained by some combination of L2L2 and L∞L∞
approximation. This problem is stated and addressed by Adams [2]
and by Lang [33], [34].
We have applied the IRLS method to the constrained least squares problem
by adding an error based weighting function to unity in the stopband only
in the frequency range where the response in the previous iteration
exceeds the constraint. The frequency response of an example is the that
was illustrated in (Reference) as obtained using the CLS algorithm.
The IRLS approach to this problem is currently being evaluated
and compared to the approach used by Adams. The initial results are
encouraging.
Although described above in terms of a one dimensional linear phase FIR
filter, the method can just as easily be applied to the complex
approximation problem and to the multidimensional filter design problem.
We have obtained encouraging initial results from applications of our new
IRLS algorithm to the optimal design of FIR filters with a nonlinear phase
response. By using a large pp we are able to design essentially
Chebyshev filters where the Remez algorithm is difficult to
apply reliably.
Our new IRLS design algorithm was applied to the two examples considered
by Chen and Parks [19] and by Schulist [62], [63] and
Preuss [54], [53]. One is a lowpass filter and the other a
bandpass filter, both approximating a constant group delay over their
passbands. Examination of magnitude frequency response plots, imaginary
vs. real part plots, and group delay frequency response plots for the
filters designed by the IRLS method showed close agreement with published
results [9]. The use of an LpLp approximation may give more
desirable results than a true Chebyshev approximation. Our results on the
complex approximation problem are preliminary and we are doing further
investigations on convergence properties of the algorithm and on the
characteristics of LpLp approximations in this context.
Application of the new IRLS method to the design of 2D FIR filters has
also given encouraging results. Here again, it is difficult to apply the
Remez exchange algorithm directly to the multi-dimensional approximation
problem. Application of the IRLS to this problem is currently being
investigated.
We designed 5×55×5, 7×77×7, 9×99×9, 41×4141×41, and
71×7171×71 filters to specifications used in
[35], [28], [15], [7]. Our preliminary observations from these
examples indicate the new IRLS method is faster and/or gives lower
Chebyshev errors than any of the other methods [10]. Values of KK
in the 1.1 to 1.2 range were required for convergence. As for the complex
approximation problem, further research is being done on convergence
properties of the algorithm and on the characteristics of LpLp
approximations in this context.
We have proposed applying the iterative reweighted least squared error
approach to the FIR digital filter design problem. We have shown how a
large number of existing methods can be cast as variations on one basic
successive approximation algorithm called Iterative Reweighted Least
Squares. From this formulation we were able to understand the convergence
characteristics of all of them and see why Lawson's method has
experimentally been found to have slow convergence.
We have created a new IRLS algorithm by combining an improved acceleration
scheme with Fletcher's and Kahng's Newton type methods to give a very good
design method with good initial and final convergence properties. It is a
significant improvement over the Rice-Usow-Lawson method.
The main contribution of the paper was showing how to use these algorithms
with different pp in different frequency bands to give a filter with
different pass and stopband characteristics, how to solve the constrained
LpLp problem, and how the approach is used in complex approximation and
in 2D filter design.