Coefficient quantization is an important concern with IIR filters,
since straightforward quantization often yields poor results, and because
quantization can produce unstable filters.
Sensitivity analysis
The performance and stability of an IIR filter depend
on the pole locations, so it is important to know how
quantization of the filter coefficients $a_k$
affects the pole locations $p_j$. The denominator polynomial is
$$D(z) = 1 + \sum_{k=1}^{N} a_k z^{-k} = \prod_{i=1}^{N} \left(1 - p_i z^{-1}\right)$$
We wish to know $\frac{\partial p_i}{\partial a_k}$, which, for
small deviations, will tell us that a $\delta$ change in $a_k$
yields an $\epsilon = \delta \frac{\partial p_i}{\partial a_k}$
change in the pole location. $\frac{\partial p_i}{\partial a_k}$
is the sensitivity of the pole location to quantization of $a_k$.
We can find $\frac{\partial p_i}{\partial a_k}$ using the chain rule.
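Treating $D(z)$ as a function of both the coefficient $a_k$ and the
pole $p_i$, and evaluating at $z = p_i$:
$$\left.\frac{\partial D(z)}{\partial a_k}\right|_{z=p_i} = \left.\frac{\partial D(z)}{\partial p_i}\,\frac{\partial p_i}{\partial a_k}\right|_{z=p_i}$$
$$\Downarrow$$
$$\frac{\partial p_i}{\partial a_k} = \frac{\left.\dfrac{\partial D(z)}{\partial a_k}\right|_{z=p_i}}{\left.\dfrac{\partial D(z)}{\partial p_i}\right|_{z=p_i}}$$
which is
$$\frac{\partial p_i}{\partial a_k} = \left.\frac{z^{-k}}{-z^{-1}\prod_{j=1,\, j\neq i}^{N}\left(1 - p_j z^{-1}\right)}\right|_{z=p_i} = \frac{-p_i^{N-k}}{\prod_{j=1,\, j\neq i}^{N}\left(p_i - p_j\right)} \tag{1}$$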
Note that as the poles get closer together, the sensitivity
increases greatly. So as the filter order increases and more poles
get stuffed closer together inside the unit circle, the error
introduced by coefficient quantization in the pole locations
grows rapidly.
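As a rough numerical illustration, the sketch below evaluates the first-order sensitivity $-p_i^{N-k} / \prod_{j \neq i}(p_i - p_j)$ for the convention $D(z) = 1 + \sum_k a_k z^{-k}$ and compares it with a finite-difference estimate obtained by perturbing one coefficient and re-solving for the roots. The function names, the example pole sets, and the perturbation size are illustrative assumptions, not part of the original text.

```python
import numpy as np

def pole_sensitivity(poles, i, k):
    """First-order sensitivity dp_i/da_k for D(z) = 1 + sum_k a_k z^-k."""
    poles = np.asarray(poles, dtype=complex)
    n = len(poles)
    others = np.delete(poles, i)
    return -poles[i] ** (n - k) / np.prod(poles[i] - others)

def finite_difference_sensitivity(poles, i, k, delta=1e-7):
    """Perturb a_k by delta and measure how far pole i moves."""
    poles = np.asarray(poles, dtype=complex)
    a = np.real(np.poly(poles))      # coefficients [1, a_1, ..., a_N]
    a_pert = a.copy()
    a_pert[k] += delta
    new_roots = np.roots(a_pert)
    moved = new_roots[np.argmin(np.abs(new_roots - poles[i]))]
    return (moved - poles[i]) / delta

def pole_pairs(r, angles):
    """Complex-conjugate pole pairs at radius r (hypothetical helper)."""
    upper = r * np.exp(1j * np.asarray(angles))
    return np.concatenate([upper, upper.conj()])

wide = pole_pairs(0.9, [0.5, 0.7])     # pairs 0.2 rad apart
tight = pole_pairs(0.9, [0.5, 0.55])   # pairs 0.05 rad apart

s_wide = pole_sensitivity(wide, 0, 4)
s_tight = pole_sensitivity(tight, 0, 4)
fd_wide = finite_difference_sensitivity(wide, 0, 4)

print("analytic  :", s_wide)
print("finite diff:", fd_wide)
print("|sensitivity| wide vs tight:", abs(s_wide), abs(s_tight))
```

Moving the second pole pair closer to the first shrinks the product in the denominator, so the magnitude of the sensitivity grows, which is the effect described above.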
How can we reduce this high sensitivity to IIR filter coefficient
quantization?
Solution
Cascade or parallel form implementations! The numerator and
denominator polynomials can be factored off-line at very high
precision and grouped into second-order sections, which are then
quantized section by section. The sensitivity of the quantization
is thus that of second-order, rather than $N$-th order,
polynomials. This yields major improvements in the frequency
response of the overall filter, and is almost always done in
practice.
Note that the numerator polynomial faces the same
sensitivity issues; the cascade form
also improves the sensitivity of the zeros, because they are
also factored into second-order terms. However, in the
parallel form, the zeros are globally
distributed across the sections, so they suffer from
quantization of all the blocks. Thus the
cascade form preserves zero locations
much better than the parallel form, which typically means
that the stopband behavior is better in the cascade
form, so it is most often used in practice.
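The benefit of quantizing section by section can be sketched numerically. The example below places four closely spaced pole pairs (a hypothetical choice), quantizes the coefficients once as a single eighth-order polynomial and once as four second-order sections, and measures how far the poles move in each case. The rounding quantizer with 12 fractional bits and the error metric are assumptions made only for illustration.

```python
import numpy as np

def quantize(c, frac_bits):
    """Round coefficients to a uniform grid with step 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return np.round(np.asarray(c) / step) * step

def max_pole_error(true_poles, test_poles):
    """Largest distance from a true pole to its nearest quantized pole."""
    return max(np.min(np.abs(test_poles - p)) for p in true_poles)

# Hypothetical example: four pole pairs clustered near the unit circle.
r = 0.95
angles = np.array([0.3, 0.35, 0.4, 0.45])
upper = r * np.exp(1j * angles)
poles = np.concatenate([upper, upper.conj()])

frac_bits = 12

# Direct form: quantize the coefficients of the full 8th-order polynomial.
a_direct = np.real(np.poly(poles))
direct_poles = np.roots(quantize(a_direct, frac_bits))

# Cascade form: quantize each second-order section separately.
cascade_poles = []
for p in upper:
    section = np.array([1.0, -2.0 * p.real, abs(p) ** 2])
    cascade_poles.extend(np.roots(quantize(section, frac_bits)))
cascade_poles = np.array(cascade_poles)

direct_err = max_pole_error(poles, direct_poles)
cascade_err = max_pole_error(poles, cascade_poles)

print("max pole error, direct form :", direct_err)
print("max pole error, cascade form:", cascade_err)
```

In each second-order section the only other pole is the complex conjugate, so the product in the sensitivity denominator has a single, comparatively large factor; in the direct form every other pole in the cluster contributes a small factor.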
Note on FIR Filters:
On the basis of the preceding analysis, it would seem
important to use cascade structures in FIR filter
implementations. However, most FIR filters are linear-phase and
thus symmetric or anti-symmetric. As long as the quantization is
implemented such that the filter coefficients retain
symmetry, the filter retains linear phase. Furthermore, since all
zeros off the unit circle must appear in groups of four for
symmetric linear-phase filters, zero
pairs can leave the unit circle only by joining with another
pair. This requires relatively severe quantizations (enough to
completely remove or change the sign of a ripple in the
amplitude response). This "reluctance" of zero pairs to leave the
unit circle tends to keep quantization from damaging the
frequency response as much as might be expected, enough so
that cascade structures are rarely used for FIR filters.
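The symmetry argument can be checked on a small example. The sketch below builds a symmetric windowed-sinc lowpass filter (a hypothetical design chosen only for illustration), rounds its taps to 8 fractional bits, and verifies that the quantized taps are still symmetric and that the zeros still occur in reciprocal pairs, which is the structural property behind linear phase.

```python
import numpy as np

def quantize(c, frac_bits):
    """Round coefficients to a uniform grid with step 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return np.round(np.asarray(c) / step) * step

# Hypothetical 21-tap symmetric lowpass filter (windowed sinc).
n = np.arange(21) - 10
h = 0.35 * np.sinc(0.35 * n) * np.hamming(21)
h = 0.5 * (h + h[::-1])   # enforce exact symmetry against round-off

hq = quantize(h, 8)       # the same quantizer is applied to every tap

# Symmetric taps in, symmetric taps out: linear phase is retained.
is_symmetric = np.array_equal(hq, hq[::-1])

# For a symmetric real filter, if z is a zero then so is 1/z.
zeros = np.roots(hq)
reciprocal_gap = max(np.min(np.abs(zeros - 1.0 / z)) for z in zeros)

print("quantized taps symmetric:", is_symmetric)
print("largest distance from 1/z to the nearest zero:", reciprocal_gap)
```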
Problem 1
What is the worst-case pole pair in an IIR digital filter?
Solution 1
The pole pair closest to the real axis in the z-plane, since the
complex-conjugate poles will be closest together and thus have the
highest sensitivity to quantization.
Quantized Pole Locations
In a direct-form or transpose-form implementation of a second-order
section, the filter coefficients are quantized versions of the
polynomial coefficients.
$$D(z) = z^2 + a_1 z + a_2 = (z - p)(z - \bar{p})$$
$$p = \frac{-a_1 \pm \sqrt{a_1^2 - 4 a_2}}{2}$$
$$p = r e^{i\theta}$$
$$D(z) = z^2 - 2 r \cos(\theta)\, z + r^2$$
So
$$a_1 = -2 r \cos\theta$$
$$a_2 = r^2$$
Thus the quantization of $a_1$ and $a_2$ to $B$ bits restricts
the radius $r$ to $r = \sqrt{k \Delta_B}$, and
$a_1 = -2\Re(p) = k \Delta_B$, where $k$ is an integer and
$\Delta_B$ denotes the $B$-bit quantization step size.
The following figure shows all stable pole locations after
four-bit two's-complement quantization.
Note the nonuniform distribution of possible pole
locations. This might be good for poles near $r = 1$,
$\theta = \frac{\pi}{2}$, but not so good for poles near the
origin or the Nyquist frequency.
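The nonuniform grid can be reproduced with a short enumeration. The sketch below lists the complex pole locations reachable when $a_1$ and $a_2$ are restricted to multiples of a step $\Delta = 2^{-(B-1)}$. The exact number format is an assumption: $a_2$ is taken in $[-1, 1)$ and $a_1$ is given one extra integer bit so that it can span $(-2, 2)$ with the same step.

```python
import numpy as np

def direct_form_pole_grid(bits=4):
    """Upper-half-plane complex poles reachable with quantized a1, a2.

    Assumed format: step = 2**-(bits-1); a2 = k2*step in [-1, 1);
    a1 = k1*step in (-2, 2), i.e. one extra integer bit for a1.
    """
    step = 2.0 ** -(bits - 1)
    k2_levels = np.arange(-2 ** (bits - 1), 2 ** (bits - 1))
    k1_levels = np.arange(-2 ** bits + 1, 2 ** bits)
    poles = []
    for k1 in k1_levels:
        a1 = k1 * step
        for k2 in k2_levels:
            a2 = k2 * step
            disc = a1 * a1 - 4.0 * a2
            if disc >= 0:
                continue          # real poles: skipped in this sketch
            p = complex(-a1 / 2.0, np.sqrt(-disc) / 2.0)
            if abs(p) < 1.0:      # keep stable poles only
                poles.append(p)
    return np.array(poles), step

poles, step = direct_form_pole_grid(bits=4)

# Real parts lie on a uniform grid, but radii follow sqrt(k * step).
radii = np.unique(np.round(np.abs(poles), 12))
radius_gaps = np.diff(radii)

print("number of complex pole locations:", len(poles))
print("allowed radii:", radii)
print("gaps between radii:", radius_gaps)
```

The gaps between allowed radii shrink toward the unit circle, so the grid is dense near $r = 1$ and sparse near the origin.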
In the "normal-form" structures, a state-variable based
realization, the possible quantized pole locations are uniformly
spaced. This can only be accomplished if the coefficients to be
quantized equal the real and imaginary parts of the pole
location; that is,
$$\alpha_1 = r\cos\theta = \Re(p)$$
$$\alpha_2 = r\sin\theta = \Im(p)$$
This is the case for a 2nd-order system with the
state matrix
$$A = \begin{pmatrix} \alpha_1 & \alpha_2 \\ -\alpha_2 & \alpha_1 \end{pmatrix}$$
The denominator polynomial is
$$\det(zI - A) = (z - \alpha_1)^2 + \alpha_2^2 = z^2 - 2\alpha_1 z + \alpha_1^2 + \alpha_2^2 = z^2 - 2r\cos(\theta)\, z + r^2\left(\cos^2\theta + \sin^2\theta\right) = z^2 - 2r\cos(\theta)\, z + r^2 \tag{2}$$
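A quick numerical check of this structure is sketched below. It builds the state matrix with $\alpha_1$ and $\alpha_2$ on the diagonal and off-diagonal, confirms that its characteristic polynomial is $z^2 - 2\alpha_1 z + \alpha_1^2 + \alpha_2^2$, and enumerates the pole grid that results when $\alpha_1$ and $\alpha_2$ are quantized directly. The step size $2^{-(B-1)}$ and the example values are assumptions for illustration.

```python
import numpy as np

def normal_form_matrix(alpha1, alpha2):
    """State matrix whose eigenvalues are alpha1 +/- j*alpha2."""
    return np.array([[alpha1, alpha2],
                     [-alpha2, alpha1]])

def normal_form_pole_grid(bits=4):
    """Upper-half-plane poles reachable with quantized alpha1, alpha2."""
    step = 2.0 ** -(bits - 1)   # assumed quantization step
    levels = np.arange(-2 ** (bits - 1), 2 ** (bits - 1)) * step
    grid = [complex(a1, a2) for a1 in levels for a2 in levels
            if a2 > 0 and a1 * a1 + a2 * a2 < 1.0]
    return np.array(grid), step

A = normal_form_matrix(0.5, 0.25)
char_poly = np.poly(A)          # coefficients of det(zI - A)
eigs = np.linalg.eigvals(A)

grid, step = normal_form_pole_grid(bits=4)

print("characteristic polynomial:", char_poly)
print("eigenvalues:", eigs)
print("number of grid points:", len(grid))
```

Because the quantized quantities are the real and imaginary parts themselves, the reachable poles form a uniform rectangular grid inside the unit circle.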
Given any second-order filter coefficient set, we can write it
as a state-space system, find a transformation matrix $T$ such
that $\hat{A} = T^{-1} A T$ is in normal form, and then implement
the second-order section using a structure corresponding to
the state equations.
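One way to construct such a transformation, sketched here under the assumption that the section has complex-conjugate poles, is to start from a companion-form state matrix and take the real and imaginary parts of one eigenvector as the columns of $T$. The companion form and the function name are illustrative choices; other state-space realizations of the same section would work equally well.

```python
import numpy as np

def to_normal_form(a1, a2):
    """Transform the companion matrix of z^2 + a1*z + a2 to normal form.

    Assumes complex-conjugate poles (a1**2 < 4*a2).
    Returns (A_hat, T) with A_hat = inv(T) @ A @ T.
    """
    A = np.array([[-a1, -a2],
                  [1.0, 0.0]])            # companion-form state matrix
    eigvals, eigvecs = np.linalg.eig(A)
    idx = np.argmax(eigvals.imag)         # eigenvalue with positive imaginary part
    v = eigvecs[:, idx]
    # Columns of T are the real and imaginary parts of the eigenvector.
    T = np.column_stack([v.real, v.imag])
    A_hat = np.linalg.solve(T, A @ T)     # T^{-1} A T without forming inv(T)
    return A_hat, T

r, theta = 0.9, 0.6
A_hat, T = to_normal_form(-2.0 * r * np.cos(theta), r ** 2)
expected = np.array([[r * np.cos(theta), r * np.sin(theta)],
                     [-r * np.sin(theta), r * np.cos(theta)]])

print(A_hat)
```

For a pole at $r e^{i\theta}$ the result has $r\cos\theta$ on the diagonal and $\pm r\sin\theta$ off the diagonal, matching the normal-form state matrix.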
The normal form has a number of other advantages; both
eigenvalues have the same magnitude $r$ (the state matrix is $r$
times a rotation matrix), so it minimizes the norm of $Ax$,
which makes overflow less likely, and it minimizes
the output variance due to quantization of the state
values. It is sometimes used when minimization of finite-precision
effects is critical.
Problem 2
What is the disadvantage of the normal form?
Solution 2