Connexions

IIR Coefficient Quantization Analysis

Module by: Douglas L. Jones

Summary: Proper coefficient quantization is essential for IIR filters. Sensitivity analysis shows that second-order cascade-form implementations have much lower sensitivity than higher-order direct-form or transpose-form structures. The normal form is even less sensitive, but requires more computation.

Coefficient quantization is an important concern with IIR filters: straightforward quantization often yields poor results, and it can even produce unstable filters.

Sensitivity analysis

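The performance and stability of an IIR filter depend on the pole locations, so it is important to know how quantization of the filter coefficients $a_k$ affects the pole locations $p_j$. The denominator polynomial is
$$D(z) = 1 + \sum_{k=1}^{N} a_k z^{-k} = \prod_{i=1}^{N} \left(1 - p_i z^{-1}\right)$$
We wish to know $\partial p_i / \partial a_k$, which, for small deviations, tells us that a $\delta$ change in $a_k$ yields an $\epsilon = \delta \, \frac{\partial p_i}{\partial a_k}$ change in the pole location. $\frac{\partial p_i}{\partial a_k}$ is the sensitivity of the pole location to quantization of $a_k$. We can find it using the chain rule. Because $D(p_i) = 0$ must continue to hold as $a_k$ varies, the total derivative with respect to $a_k$ vanishes at $z = p_i$:
$$\left.\frac{\partial D(z)}{\partial a_k}\right|_{z=p_i} + \left.\frac{\partial D(z)}{\partial z}\right|_{z=p_i} \frac{\partial p_i}{\partial a_k} = 0 \quad\Longrightarrow\quad \frac{\partial p_i}{\partial a_k} = -\frac{\left.\partial D(z)/\partial a_k\right|_{z=p_i}}{\left.\partial D(z)/\partial z\right|_{z=p_i}}$$
which is
$$\frac{\partial p_i}{\partial a_k} = \left. -\frac{z^{-k}}{z^{-1} \prod_{j=1,\, j \neq i}^{N} \left(1 - p_j z^{-1}\right)} \right|_{z=p_i} = -\frac{p_i^{N-k}}{\prod_{j=1,\, j \neq i}^{N} \left(p_i - p_j\right)} \qquad (1)$$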
Note that as the poles get closer together, the sensitivity increases greatly. So as the filter order increases and more poles get stuffed closer together inside the unit circle, the error introduced by coefficient quantization in the pole locations grows rapidly.
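The growth in sensitivity can be checked numerically. The following is a minimal sketch, assuming NumPy and the sign convention $D(z) = 1 + \sum_k a_k z^{-k}$, for which the closed-form sensitivity is $-p_i^{N-k} / \prod_{j \neq i}(p_i - p_j)$. The helper names (`conjugate_pairs`, `pole_sensitivity`, `numerical_sensitivity`) and the two fourth-order pole sets are hypothetical, chosen only for illustration.

```python
import numpy as np

def conjugate_pairs(radius, angles):
    """Return poles [p0, conj(p0), p1, conj(p1), ...] at the given radius and angles."""
    upper = radius * np.exp(1j * np.asarray(angles))
    return np.column_stack([upper, upper.conj()]).ravel()

def pole_sensitivity(poles, i, k):
    """Closed-form d p_i / d a_k for D(z) = 1 + sum_k a_k z^-k (k = 1..N)."""
    poles = np.asarray(poles, dtype=complex)
    n = len(poles)
    others = np.delete(poles, i)
    return -poles[i] ** (n - k) / np.prod(poles[i] - others)

def numerical_sensitivity(poles, i, k, delta=1e-7):
    """Finite-difference estimate: perturb a_k and follow the pole nearest p_i."""
    poles = np.asarray(poles, dtype=complex)
    a = np.array(np.poly(poles), dtype=complex)   # [1, a_1, ..., a_N]
    a[k] += delta
    new_poles = np.roots(a)
    nearest = new_poles[np.argmin(np.abs(new_poles - poles[i]))]
    return (nearest - poles[i]) / delta

# Two illustrative fourth-order pole sets: well separated vs. tightly clustered.
wide = conjugate_pairs(0.9, [1.0, 2.0])
tight = conjugate_pairs(0.9, [1.0, 1.1])

for name, poles in (("wide", wide), ("tight", tight)):
    exact = pole_sensitivity(poles, 0, 1)
    approx = numerical_sensitivity(poles, 0, 1)
    print(f"{name}: |dp_0/da_1| = {abs(exact):.3f} (finite difference {abs(approx):.3f})")
```

Moving the second pole pair from an angle of 2.0 rad to 1.1 rad shrinks one factor in the denominator product, so the reported sensitivity of the first pole rises accordingly.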
How can we reduce this high sensitivity to IIR filter coefficient quantization?

Solution

Cascade or parallel form implementations! The numerator and denominator polynomials can be factored off-line at very high precision and grouped into second-order sections, which are then quantized section by section. The sensitivity of the quantization is thus that of second-order, rather than $N$th-order, polynomials. This yields major improvements in the frequency response of the overall filter, and is almost always done in practice.
Note that the numerator polynomial faces the same sensitivity issues; the cascade form also improves the sensitivity of the zeros, because they are also factored into second-order terms. However, in the parallel form, the zeros are globally distributed across the sections, so they suffer from quantization of all the blocks. Thus the cascade form preserves zero locations much better than the parallel form, which typically means that the stopband behavior is better in the cascade form, so it is most often used in practice.
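The benefit of quantizing section by section can be illustrated with a small experiment. This is a sketch under stated assumptions, not a design procedure: the sixth-order pole set, the 10 fractional bits, and the helpers `quantize` and `max_pole_error` are all hypothetical, and overflow of the larger direct-form coefficients is ignored.

```python
import numpy as np

def quantize(x, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (no overflow handling)."""
    step = 2.0 ** -frac_bits
    return np.round(np.asarray(x) / step) * step

def max_pole_error(true_poles, test_poles):
    """Largest distance from any true pole to its nearest quantized pole."""
    return max(np.min(np.abs(test_poles - p)) for p in true_poles)

# Illustrative sixth-order filter: three conjugate pole pairs clustered together.
angles = np.array([0.20, 0.25, 0.30])
radii = np.array([0.95, 0.93, 0.95])
upper = radii * np.exp(1j * angles)
poles = np.concatenate([upper, upper.conj()])

frac_bits = 10

# Direct form: quantize the coefficients of the full sixth-order polynomial.
a_direct = quantize(np.poly(poles).real, frac_bits)
direct_poles = np.roots(a_direct)

# Cascade form: quantize each second-order section [1, -2 r cos(theta), r^2] separately.
cascade_poles = []
for p in upper:
    section = np.poly([p, p.conj()]).real
    cascade_poles.extend(np.roots(quantize(section, frac_bits)))
cascade_poles = np.array(cascade_poles)

direct_err = max_pole_error(poles, direct_poles)
cascade_err = max_pole_error(poles, cascade_poles)
print(f"direct-form max pole error:  {direct_err:.4f}")
print(f"cascade-form max pole error: {cascade_err:.4f}")
print("direct form stable:", bool(np.all(np.abs(direct_poles) < 1)))
```

In the cascade case each pole moves only in response to the two coefficients of its own section, so the clustering of the other pole pairs does not enter its sensitivity.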
Note on FIR Filters: On the basis of the preceding analysis, it would seem important to use cascade structures in FIR filter implementations. However, most FIR filters are linear-phase and thus symmetric or anti-symmetric. As long as the quantization is implemented such that the filter coefficients retain symmetry, the filter retains linear phase. Furthermore, since all zeros off the unit circle must appear in groups of four for symmetric linear-phase filters, zero pairs can leave the unit circle only by joining with another pair. This requires relatively severe quantization (enough to completely remove or change the sign of a ripple in the amplitude response). This "reluctance" of zero pairs to leave the unit circle tends to keep quantization from damaging the frequency response as much as might be expected, enough so that cascade structures are rarely used for FIR filters.
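The claim that symmetric quantization preserves linear phase can be checked directly. The sketch below assumes a hypothetical length-11 Hamming-windowed sinc lowpass and 6 fractional bits; the filter is built by mirroring its first half so that the symmetry is exact before rounding.

```python
import numpy as np

def quantize(x, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (no overflow handling)."""
    step = 2.0 ** -frac_bits
    return np.round(np.asarray(x) / step) * step

# Illustrative lowpass of length 2M + 1, mirrored so that h[n] == h[2M - n] exactly.
M = 5
n = np.arange(M + 1)
half = 0.4 * np.sinc(0.4 * (n - M)) * np.hamming(2 * M + 1)[: M + 1]
h = np.concatenate([half, half[-2::-1]])

# Rounding acts identically on equal values, so the symmetry survives quantization.
hq = quantize(h, frac_bits=6)

# Remove the linear-phase factor e^{-j w M}; a linear-phase filter leaves a real amplitude.
w = np.linspace(0, np.pi, 256)
k = np.arange(2 * M + 1)
amplitude = (hq * np.exp(-1j * np.outer(w, k - M))).sum(axis=1)

print("symmetric after quantization:", bool(np.array_equal(hq, hq[::-1])))
print("max |imaginary part| of amplitude:", np.max(np.abs(amplitude.imag)))
```

The amplitude response itself does change with quantization; only the phase linearity is guaranteed by the retained symmetry.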
Problem 1
What is the worst-case pole pair in an IIR digital filter?
Solution 1
The pole pair closest to the real axis in the z-plane, since the complex-conjugate poles will be closest together and thus have the highest sensitivity to quantization.

Quantized Pole Locations

In a direct-form or transpose-form implementation of a second-order section, the filter coefficients are quantized versions of the polynomial coefficients.
$$D(z) = z^2 + a_1 z + a_2 = (z - p)(z - \bar{p})$$
$$p = \frac{-a_1 \pm \sqrt{a_1^2 - 4 a_2}}{2}$$
$$p = r e^{j\theta}$$
$$D(z) = z^2 - 2 r \cos(\theta)\, z + r^2$$
So
$$a_1 = -2 r \cos\theta, \qquad a_2 = r^2$$
Thus the quantization of $a_1$ and $a_2$ to $B$ bits restricts the radius $r$ to $r = \sqrt{k \Delta_B}$ and the real part to $a_1 = -2\,\Re(p) = k \Delta_B$, where $k$ is an integer and $\Delta_B$ is the quantization step size. The following figure shows all stable pole locations after four-bit two's-complement quantization.
figdfpolelocs.png
Figure 1
Note the nonuniform distribution of possible pole locations. This might be good for poles near $r = 1$, $\theta = \pi/2$, but not so good for poles near the origin or the Nyquist frequency.
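The grid of reachable pole locations can be enumerated in a few lines. This is a sketch under an assumed number format, which may differ from the one used for the figure: $a_2$ is stored as a four-bit two's-complement fraction in $[-1, 1)$, and $a_1/2$ is stored the same way so that $a_1$ covers $[-2, 2)$. The function name `direct_form_pole_grid` is hypothetical, and only complex pole pairs (upper half-plane member) are listed.

```python
import numpy as np

def direct_form_pole_grid(bits=4):
    """Upper-half-plane poles of z^2 + a1 z + a2 for all quantized (a1, a2) pairs."""
    levels = np.arange(-2 ** (bits - 1), 2 ** (bits - 1))   # two's-complement integers
    step = 2.0 ** -(bits - 1)
    poles = []
    for k1 in levels:
        for k2 in levels:
            a1 = 2.0 * k1 * step     # assumption: a1 / 2 is the stored value
            a2 = k2 * step
            disc = a1 * a1 - 4.0 * a2
            if disc < 0 and a2 < 1.0:   # complex pair, stable because r^2 = a2 < 1
                poles.append(complex(-a1 / 2.0, np.sqrt(-disc) / 2.0))
    return np.array(poles)

grid = direct_form_pole_grid(4)

# Distinct radii are sqrt(k * step), so they bunch together toward the unit circle.
radii = np.unique(np.round(np.abs(grid), 12))
print("number of complex pole locations:", len(grid))
print("available radii:", radii)
print("spacing between radii:", np.diff(radii))
```

The real parts fall on a uniform grid while the radii follow a square-root law, which is the source of the sparse coverage near the origin.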
In the "normal-form" structure, a state-variable-based realization, the possible quantized pole locations are uniformly spaced.
fignfpolelocs.png
Figure 2
This can only be accomplished if the coefficients to be quantized equal the real and imaginary parts of the pole location; that is,
$$\alpha_1 = r \cos\theta = \Re(p), \qquad \alpha_2 = r \sin\theta = \Im(p)$$
This is the case for a second-order system with the state matrix
$$A = \begin{pmatrix} \alpha_1 & \alpha_2 \\ -\alpha_2 & \alpha_1 \end{pmatrix}$$
The denominator polynomial is
$$\det(zI - A) = (z - \alpha_1)^2 + \alpha_2^2 = z^2 - 2\alpha_1 z + \alpha_1^2 + \alpha_2^2 = z^2 - 2 r \cos(\theta)\, z + r^2 \left(\cos^2\theta + \sin^2\theta\right) = z^2 - 2 r \cos(\theta)\, z + r^2 \qquad (2)$$
Given any second-order filter coefficient set, we can write it as a state-space system, find a transformation matrix $T$ such that $\hat{A} = T^{-1} A T$ is in normal form, and then implement the second-order section using a structure corresponding to the state equations.
The normal form has a number of other advantages: both eigenvalues have equal magnitude, so it minimizes the norm of $Ax$, which makes overflow less likely, and it minimizes the output variance due to quantization of the state values. It is sometimes used when minimization of finite-precision effects is critical.
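One way to find such a transformation is sketched below. It assumes a companion-form state matrix for $z^2 + a_1 z + a_2$ with complex poles, and builds $T$ from the real and imaginary parts of an eigenvector; this is one possible construction, not necessarily the one intended by the text, and the function names and example values of $r$ and $\theta$ are hypothetical.

```python
import numpy as np

def normal_form_matrix(alpha1, alpha2):
    """State matrix whose eigenvalues are alpha1 +/- j*alpha2."""
    return np.array([[alpha1, alpha2], [-alpha2, alpha1]])

def to_normal_form(a1, a2):
    """Return (T, A_hat) with A_hat = T^-1 A T in normal form.

    A is the companion matrix of z^2 + a1 z + a2; complex poles (a1^2 < 4 a2) are assumed.
    """
    A = np.array([[-a1, -a2], [1.0, 0.0]])
    eigvals, eigvecs = np.linalg.eig(A)
    idx = np.argmax(eigvals.imag)            # pole in the upper half-plane
    v = eigvecs[:, idx]
    T = np.column_stack([v.real, v.imag])    # columns: Re(v), Im(v)
    A_hat = np.linalg.solve(T, A @ T)        # T^-1 A T without forming the inverse
    return T, A_hat

# Illustrative pole pair at radius r and angle theta.
r, theta = 0.9, 0.7
a1, a2 = -2 * r * np.cos(theta), r ** 2
T, A_hat = to_normal_form(a1, a2)

# The normal-form matrix is a rotation scaled by r, so it shrinks every state by exactly r.
x = np.array([0.3, -1.2])
print("A_hat =\n", A_hat)
print("characteristic polynomial:", np.poly(A_hat))
print("norm ratio |A_hat x| / |x|:", np.linalg.norm(A_hat @ x) / np.linalg.norm(x))
```

The entries of `A_hat` are the real and imaginary parts of the pole, so quantizing them places the pole on a uniform rectangular grid.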
Problem 2
What is the disadvantage of the normal form?
Solution 2
It requires more computation. The general state-variable equation requires nine multiplies, rather than the five used by the Direct-Form II or Transpose-Form structures.
